Re: [ceph-users] Cephfs write fail when node goes down
On Mon, May 14, 2018 at 5:37 PM, Josef Zelenka wrote:
> Hi everyone, we've encountered an unusual thing in our setup (4 nodes, 48
> OSDs, 3 monitors - ceph Jewel, Ubuntu 16.04 with kernel 4.4.0). Yesterday,
> we were doing a HW upgrade of the nodes, so they went down one by one - the
> cluster was in good shape during the upgrade, as we've done this numerous
> times and we're quite sure that the redundancy wasn't screwed up while doing
> this. However, during this upgrade one of the clients that does backups to
> cephfs (mounted via the kernel driver) failed to write the backup file
> correctly to the cluster with the following trace after we turned off one of
> the nodes:
>
> [2585732.529412] 8800baa279a8 813fb2df 880236230e00 8802339c
> [2585732.529414] 8800baa28000 88023fc96e00 7fff 8800baa27b20
> [2585732.529415] 81840ed0 8800baa279c0 818406d5
> [2585732.529417] Call Trace:
> [2585732.529505] [] ? cpumask_next_and+0x2f/0x40
> [2585732.529558] [] ? bit_wait+0x60/0x60
> [2585732.529560] [] schedule+0x35/0x80
> [2585732.529562] [] schedule_timeout+0x1b5/0x270
> [2585732.529607] [] ? kvm_clock_get_cycles+0x1e/0x20
> [2585732.529609] [] ? bit_wait+0x60/0x60
> [2585732.529611] [] io_schedule_timeout+0xa4/0x110
> [2585732.529613] [] bit_wait_io+0x1b/0x70
> [2585732.529614] [] __wait_on_bit_lock+0x4e/0xb0
> [2585732.529652] [] __lock_page+0xbb/0xe0
> [2585732.529674] [] ? autoremove_wake_function+0x40/0x40
> [2585732.529676] [] pagecache_get_page+0x17d/0x1c0
> [2585732.529730] [] ? ceph_pool_perm_check+0x48/0x700 [ceph]
> [2585732.529732] [] grab_cache_page_write_begin+0x26/0x40
> [2585732.529738] [] ceph_write_begin+0x48/0xe0 [ceph]
> [2585732.529739] [] generic_perform_write+0xce/0x1c0
> [2585732.529763] [] ? file_update_time+0xc9/0x110
> [2585732.529769] [] ceph_write_iter+0xf89/0x1040 [ceph]
> [2585732.529792] [] ? __alloc_pages_nodemask+0x159/0x2a0
> [2585732.529808] [] new_sync_write+0x9b/0xe0
> [2585732.529811] [] __vfs_write+0x26/0x40
> [2585732.529812] [] vfs_write+0xa9/0x1a0
> [2585732.529814] [] SyS_write+0x55/0xc0
> [2585732.529817] [] entry_SYSCALL_64_fastpath+0x16/0x71

Is there any hung osd request in /sys/kernel/debug/ceph//osdc?

> I have encountered this behavior on Luminous, but not on Jewel. Anyone who
> has a clue why the write fails? As far as I'm concerned, it should always
> work if all the PGs are available. Thanks
> Josef

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
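For reference, Zheng's suggestion can be checked directly on the client: the kernel client exposes its in-flight OSD requests through debugfs. A minimal illustrative sketch (assuming debugfs is mounted at /sys/kernel/debug and root privileges; the osdc file lives under a per-client fsid directory):

```shell
# Dump outstanding OSD requests for every in-kernel ceph client instance.
# A request that stays listed here while a node is down is a hung write.
found=0
for f in /sys/kernel/debug/ceph/*/osdc; do
    [ -e "$f" ] || continue        # glob did not expand: no kernel clients
    found=1
    echo "== $f =="
    cat "$f"                       # one line per in-flight OSD request
done
if [ "$found" -eq 0 ]; then
    echo "no ceph kernel clients found"
fi
```

If requests stay pinned to OSDs on the powered-off node, the client is blocked waiting on a PG whose acting set has not caught up yet, which matches the page-lock wait in the trace above.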
Re: [ceph-users] Cephfs write fail when node goes down
Which kernel version are you using? If it's an older kernel: consider using the ceph-fuse client instead.

Paul

2018-05-14 11:37 GMT+02:00 Josef Zelenka:
> Hi everyone, we've encountered an unusual thing in our setup (4 nodes, 48
> OSDs, 3 monitors - ceph Jewel, Ubuntu 16.04 with kernel 4.4.0). Yesterday,
> we were doing a HW upgrade of the nodes, so they went down one by one - the
> cluster was in good shape during the upgrade, as we've done this numerous
> times and we're quite sure that the redundancy wasn't screwed up while
> doing this. However, during this upgrade one of the clients that does
> backups to cephfs (mounted via the kernel driver) failed to write the backup
> file correctly to the cluster with the following trace after we turned off
> one of the nodes:
>
> [kernel call trace snipped; it is quoted in full in the original message above]
>
> I have encountered this behavior on Luminous, but not on Jewel. Anyone who
> has a clue why the write fails? As far as I'm concerned, it should always
> work if all the PGs are available. Thanks
> Josef

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
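Paul's suggestion amounts to remounting the backup target with the userspace FUSE client. A hedged sketch (the mountpoint /mnt/cephfs and the client id admin are placeholders, not taken from the thread; assumes a keyring in /etc/ceph):

```shell
# Replace the kernel-client cephfs mount with ceph-fuse. Guarded so it
# is a no-op on machines where ceph-fuse is not installed.
command -v ceph-fuse >/dev/null 2>&1 || { echo "ceph-fuse not installed, dry run only"; exit 0; }
umount /mnt/cephfs                    # drop the kernel-client mount
ceph-fuse --id admin /mnt/cephfs      # mount the same tree via FUSE
# or persistently via /etc/fstab:
#   none  /mnt/cephfs  fuse.ceph  ceph.id=admin,_netdev,defaults  0  0
```

Because ceph-fuse runs in userspace, a client stuck waiting on an unavailable OSD can at least be killed and remounted, instead of hanging in the kernel page-lock path shown in the trace.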
[ceph-users] nfs-ganesha 2.6 deb packages
I see that luminous RPM packages are up at download.ceph.com for ganesha-ceph 2.6, but there is nothing in the Deb area. Any estimates on when we might see those packages?

http://download.ceph.com/nfs-ganesha/deb-V2.6-stable/luminous/

thanks,
Ben
Re: [ceph-users] a big cluster or several small
Hi,

don't do multiple clusters on the same server without containers; support for the cluster name stuff is deprecated and will probably be removed: https://github.com/ceph/ceph-deploy/pull/441

Also, I wouldn't split your cluster (yet?), ~300 OSDs is still quite small. But it depends on the exact circumstances...

Paul

2018-05-14 18:49 GMT+02:00 Marc Boisis:
> Hi,
>
> Currently we have a 294 OSD (21 hosts/3 racks) cluster with RBD clients
> only, 1 single pool (size=3).
>
> We want to divide this cluster into several to minimize the risk in case
> of failure/crash.
>
> [rest of Marc's original message snipped]

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
Re: [ceph-users] a big cluster or several small
Hello Marc,

In my belief that's exactly the main reason why people use Ceph: it gets more reliable the more nodes we put in the cluster. You should take a look at the documentation and try to make use of placement rules, erasure codes or whatever fits your needs.

I'm yet new in Ceph (been using it for about 1 year) and I strongly tell you that your idea just *may be* good, but may be a little overkill too =D

Regards,

On Mon, May 14, 2018 at 2:26 PM Michael Kuriger wrote:
> The more servers you have in your cluster, the less impact a failure
> causes to the cluster. Monitor your systems and keep them up to date. You
> can also isolate data with clever crush rules and creating multiple zones.
>
> Mike Kuriger
>
> [quoted copy of Marc's original message snipped]

--
João Paulo Bastos
DevOps Engineer at Mav Tecnologia
Belo Horizonte - Brazil
+55 31 99279-7092
Re: [ceph-users] a big cluster or several small
The more servers you have in your cluster, the less impact a failure causes to the cluster. Monitor your systems and keep them up to date. You can also isolate data with clever crush rules and creating multiple zones.

Mike Kuriger

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Marc Boisis
Sent: Monday, May 14, 2018 9:50 AM
To: ceph-users
Subject: [ceph-users] a big cluster or several small

[quoted copy of Marc's original message snipped]
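Isolating data with CRUSH rules, as Mike suggests, keeps one cluster but gives each workload its own pool and placement policy. A hedged sketch for Luminous (the pool/rule names, PG counts, and the rack failure domain are assumptions, not taken from the thread; guarded so it is a dry run without a cluster):

```shell
# Carve out a dedicated pool for one workload (here "mail") with its own
# CRUSH rule, instead of building a separate cluster for it.
command -v ceph >/dev/null 2>&1 || { echo "ceph CLI not found, dry run only"; exit 0; }
# replicated rule rooted at "default", spreading copies across racks
ceph osd crush rule create-replicated mail-rule default rack
# pool using that rule (PG counts sized for the workload, placeholders here)
ceph osd pool create mail 1024 1024 replicated mail-rule
ceph osd pool application enable mail rbd
```

The same pattern repeated per workload gives most of the isolation benefit of separate clusters while keeping one set of hardware and one failure-handling story.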
Re: [ceph-users] a big cluster or several small
Well, I currently manage 27 nodes over 9 clusters. There is some burden that you should consider.

The easiest is: "what do we do when two small clusters, which grow slowly, need more space?"
With one cluster: buy a node, add it, done.
With two clusters: buy two nodes, add them, done.

This can be an issue. If you can move the data between clusters transparently and painlessly, then it's OK: most of our data is used via Proxmox clusters, which allow us to move from one Ceph cluster to another, so we can "rebalance" the whole stuff.

However, we also have some Cephfs stuff, and this is not the same deal: moving part of a cephfs between clusters is a pain (youhou, rsync & friends).

Considering all of this, splitting your cluster may be a sane idea, or maybe not. I however recommend against over-splitting: it is not worth it.

On 05/14/2018 06:49 PM, Marc Boisis wrote:
> Hi,
>
> Currently we have a 294 OSD (21 hosts/3 racks) cluster with RBD clients only,
> 1 single pool (size=3).
>
> We want to divide this cluster into several to minimize the risk in case of
> failure/crash.
> For example, a cluster for the mail, another for the file servers, a test
> cluster ...
> Do you think it's a good idea?
[ceph-users] a big cluster or several small
Hi,

Currently we have a 294 OSD (21 hosts/3 racks) cluster with RBD clients only, 1 single pool (size=3).

We want to divide this cluster into several to minimize the risk in case of failure/crash.
For example, a cluster for the mail, another for the file servers, a test cluster ...
Do you think it's a good idea?

Do you have experience feedback on multiple clusters in production on the same hardware:
- containers (LXD or Docker)
- multiple clusters on the same host without virtualization (with ceph-deploy ... --cluster ...)
- multiple pools
...

Do you have any advice?
Re: [ceph-users] slow requests are blocked
Hello David!

2. I set it up 10/10.
3. Thanks, my problem was that I did it on a host where there was no osd.15 daemon.

Could you please help to read the osd logs? Here is a part from ceph.log:

2018-05-14 13:46:32.644323 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553895 : cluster [INF] Cluster is now healthy
2018-05-14 13:46:43.741921 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553896 : cluster [WRN] Health check failed: 21 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-05-14 13:46:49.746994 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553897 : cluster [WRN] Health check update: 23 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-05-14 13:46:55.752314 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553900 : cluster [WRN] Health check update: 3 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-05-14 13:47:01.030686 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553901 : cluster [WRN] Health check update: 4 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-05-14 13:47:07.764236 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553903 : cluster [WRN] Health check update: 32 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-05-14 13:47:13.770833 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553904 : cluster [WRN] Health check update: 21 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-05-14 13:47:17.774530 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553905 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 12 slow requests are blocked > 32 sec)
2018-05-14 13:47:17.774582 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553906 : cluster [INF] Cluster is now healthy

At 13:47 I had a problem with osd.21:

1. Ceph Health (storage-ru1-osd1.voximplant.com:ceph.health): HEALTH_WARN {u'REQUEST_SLOW': {u'severity': u'HEALTH_WARN', u'summary': {u'message': u'4 slow requests are blocked > 32 sec'}}}

HEALTH_WARN 4 slow requests are blocked > 32 sec
REQUEST_SLOW 4 slow requests are blocked > 32 sec
    2 ops are blocked > 65.536 sec
    2 ops are blocked > 32.768 sec
    osd.21 has blocked requests > 65.536 sec

Here is a part from ceph-osd.21.log:

2018-05-14 13:47:06.891399 7fb806dd6700 10 osd.21 pg_epoch: 236 pg[2.0( v 236'297 (0'0,236'297] local-lis/les=223/224 n=1 ec=119/119 lis/c 223/223 les/c/f 224/224/0 223/223/212) [21,29,15] r=0 lpr=223 crt=236'297 lcod 236'296 mlcod 236'296 active+clean] dropping ondisk_read_lock
2018-05-14 13:47:06.891435 7fb806dd6700 10 osd.21 236 dequeue_op 0x56453b753f80 finish
2018-05-14 13:47:07.111388 7fb8185f9700 10 osd.21 236 tick
2018-05-14 13:47:07.111398 7fb8185f9700 10 osd.21 236 do_waiters -- start
2018-05-14 13:47:07.111401 7fb8185f9700 10 osd.21 236 do_waiters -- finish
2018-05-14 13:47:07.800421 7fb817df8700 10 osd.21 236 tick_without_osd_lock
2018-05-14 13:47:07.800444 7fb817df8700 10 osd.21 236 promote_throttle_recalibrate 0 attempts, promoted 0 objects and 0 bytes; target 25 obj/sec or 5120 k bytes/sec
2018-05-14 13:47:07.800449 7fb817df8700 10 osd.21 236 promote_throttle_recalibrate actual 0, actual/prob ratio 1, adjusted new_prob 1000, prob 1000 -> 1000
2018-05-14 13:47:08.111470 7fb8185f9700 10 osd.21 236 tick
2018-05-14 13:47:08.111483 7fb8185f9700 10 osd.21 236 do_waiters -- start
2018-05-14 13:47:08.111485 7fb8185f9700 10 osd.21 236 do_waiters -- finish
2018-05-14 13:47:08.181070 7fb8055d3700 10 osd.21 236 dequeue_op 0x564539651000 prio 63 cost 0 latency 0.000143 osd_op(client.2597258.0:213844298 6.1d4 6.4079fd4 (undecoded) ondisk+read+known_if_redirected e236) v8 pg pg[6.1d4( v 236'20882 (236'19289,236'20882] local-lis/les=223/224 n=20791 ec=145/132 lis/c 223/223 les/c/f 224/224/0 223/223/212) [21,29,17] r=0 lpr=223 crt=236'20882 lcod 236'20881 mlcod 236'20881 active+clean]
2018-05-14 13:47:08.181112 7fb8055d3700 10 osd.21 pg_epoch: 236 pg[6.1d4( v 236'20882 (236'19289,236'20882] local-lis/les=223/224 n=20791 ec=145/132 lis/c 223/223 les/c/f 224/224/0 223/223/212) [21,29,17] r=0 lpr=223 crt=236'20882 lcod 236'20881 mlcod 236'20881 active+clean] _handle_message: 0x564539651000
2018-05-14 13:47:08.181141 7fb8055d3700 10 osd.21 pg_epoch: 236 pg[6.1d4( v 236'20882 (236'19289,236'20882] local-lis/les=223/224 n=20791 ec=145/132 lis/c 223/223 les/c/f 224/224/0 223/223/212) [21,29,17] r=0 lpr=223 crt=236'20882 lcod 236'20881 mlcod 236'20881 active+clean] do_op osd_op(client.2597258.0:213844298 6.1d4 6:2bf9e020:::eb359f44-3316-4cd3-9006-d416c21e0745.1204464.6_2018%2f05%2f14%2fYWRjNmZmNzQzODI2ZGQzOTc3ZjFiNGMxZjIxOGZlYzQvaHR0cDovL3d3dy1sdS0wMS0zNi52b3hpbXBsYW50LmNvbS9yZWNvcmRzLzIwMTgvMDUvMTQvOTRlNjYxY2JiZjU3MTk4NS4xNTI2MjkwMzQ0Ljk2NjQ5MS5tcDM-:head [getxattrs,stat,read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected e236) v8 may_read -> read-ordered flags ondisk+read+known_if_redirected
2018-05-14 13:47:08.181179 7fb8055d3700 10 osd.21 pg_epoch: 236 pg[6.1d4( v 236'20882 (236'19289,236'20882] local-lis/les=223/224 n=20791 ec=145/132 lis/c 223/223 les/c/f 224/224/0 223/223/212)
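As an aside, the mon log excerpt above reduces nicely to time/count pairs, which makes the rise and fall of the REQUEST_SLOW backlog easy to see. A small sketch; the heredoc carries three sample lines from the excerpt, and in practice you would point the awk at your real /var/log/ceph/ceph.log:

```shell
# Reduce REQUEST_SLOW cluster-log lines to "<time> <blocked-request-count>".
log=/tmp/ceph-excerpt.log
cat <<'EOF' > "$log"
2018-05-14 13:46:43.741921 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553896 : cluster [WRN] Health check failed: 21 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-05-14 13:46:49.746994 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553897 : cluster [WRN] Health check update: 23 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-05-14 13:47:07.764236 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553903 : cluster [WRN] Health check update: 32 slow requests are blocked > 32 sec (REQUEST_SLOW)
EOF
# the count is always the field immediately before the word "slow"
awk '/REQUEST_SLOW/ { for (i = 2; i < NF; i++) if ($(i+1) == "slow") { print $2, $i; break } }' "$log"
```

A backlog that climbs and clears on one OSD only (as with osd.21 here) points at that OSD or its disk rather than a cluster-wide problem.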
Re: [ceph-users] PG show inconsistent active+clean+inconsistent
Just for clarification, the PG state is not the cause of the scrub errors. Something happened in your cluster that caused inconsistencies between copies of the data, and the scrub noticed them; the scrub errors are why the PG is flagged inconsistent, which does put the cluster in HEALTH_ERR. Anyway, just semantics from your original assessment of the situation.

Disabling scrubs is a bad idea here. While you have a lot of scrub errors, you only know of 1 PG that has those errors. You may have multiple PGs with the same problem. Perhaps a single disk is having problems and every PG on that disk has scrub errors. There are a lot of other scenarios that could be happening as well.

I would start by issuing `ceph osd scrub $osd` to scrub all PGs on the currently known OSDs used by this PG. If that doesn't find anything, then try `ceph osd deep-scrub $osd`. Those commands are a shortcut to schedule a scrub/deep-scrub for every PG that is primary on the given OSD. If you don't find any more scrub errors, then you may need to check the rest of the PGs in your cluster, definitely the ones inside of the same pool #2 along with the currently inconsistent PG.

Now, while that's diagnosing and getting us more information... what happened to your cluster? Anything where OSDs were flapping up and down? You added new storage? Lost a drive? Upgraded versions? What is your version? What has happened in the past few weeks in your cluster?

Likely, the fix is going to start with issuing a repair of your PG. I like to diagnose the full scope of the problem before trying to repair things. Also, if I can't figure out what's going on, I try to back up the PG copies I'm repairing before doing so, just in case something doesn't repair properly.

On Sat, May 12, 2018 at 2:38 AM Faizal Latif wrote:
> Hi Guys,
>
> I need some help. I can see currently my ceph storage showing
> "active+clean+inconsistent", which results in a HEALTH_ERR state and causes
> scrubbing errors. Below is a sample output:
>
> HEALTH_ERR 1 pgs inconsistent; 11685 scrub errors; noscrub,nodeep-scrub flag(s) set
> pg 2.2c0 is active+clean+inconsistent, acting [28,17,37]
> 11685 scrub errors
> noscrub,nodeep-scrub flag(s) set
>
> I have disabled scrubbing since I can see there are scrub errors. I have
> also tried to use the rados command to see object status, and below are the
> results:
>
> rados list-inconsistent-obj 2.2c0 --format=json-pretty
> {
>     "epoch": 57580,
>     "inconsistents": [
>         {
>             "object": {
>                 "name": "rbd_data.10815ea2ae8944a.0385",
>                 "nspace": "",
>                 "locator": "",
>                 "snap": 55,
>                 "version": 0
>             },
>             "errors": [],
>             "union_shard_errors": [
>                 "missing",
>                 "oi_attr_missing"
>             ],
>             "shards": [
>                 {
>                     "osd": 10,
>                     "errors": [
>                         "oi_attr_missing"
>                     ],
>                     "size": 4194304,
>                     "omap_digest": "0x",
>                     "data_digest": "0x32133b39"
>                 },
>                 {
>                     "osd": 28,
>                     "errors": [
>                         "missing"
>                     ]
>                 },
>                 {
>                     "osd": 37,
>                     "errors": [
>                         "missing"
>                     ]
>                 }
>             ]
>         },
>         {
>             "object": {
>                 "name": "rbd_data.10815ea2ae8944a.0730",
>                 "nspace": "",
>                 "locator": "",
>                 "snap": 55,
>                 "version": 0
>             },
>             "errors": [],
>             "union_shard_errors": [
>                 "missing",
>                 "oi_attr_missing"
>             ],
>             "shards": [
>                 {
>                     "osd": 10,
>                     "errors": [
>                         "oi_attr_missing"
>                     ],
>                     "size": 4194304,
>                     "omap_digest": "0x",
>                     "data_digest": "0x0f843f64"
>                 },
>                 {
>                     "osd": 28,
>                     "errors": [
>                         "missing"
>                     ]
>                 },
>                 {
>                     "osd": 37,
>                     "errors": [
>                         "missing"
>                     ]
>                 }
>             ]
>         },
>
> I can see most of the objects show oi_attr_missing. Is there any way
> that I can solve this? I believe this is the reason why scrubbing keeps
> failing for this pg group.
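The scrub-everything step David describes can be scripted over the OSD ids visible above (the acting set [28,17,37] plus osd.10 from the shard list). A hedged sketch, guarded so it is a dry run without a ceph CLI; note that the noscrub/nodeep-scrub flags set earlier must be cleared first or the scheduled scrubs will not run:

```shell
# Scrub every PG that is primary on each OSD involved with the
# inconsistent PG, then (only once the scope is understood and copies
# are backed up) repair the PG itself.
command -v ceph >/dev/null 2>&1 || { echo "ceph CLI not found, dry run only"; exit 0; }
ceph osd unset noscrub
ceph osd unset nodeep-scrub
for osd in 10 28 17 37; do
    ceph osd scrub "$osd"        # escalate to: ceph osd deep-scrub "$osd"
done
# last step, per the advice above, after diagnosis and backup:
ceph pg repair 2.2c0
```

The repair line is deliberately last: as David says, diagnose the full scope before repairing, since repair copies data between shards and can propagate the wrong copy if the primary is the damaged one.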
Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs
Hi Wido,

Are you trying this setting?

/sys/devices/system/cpu/intel_pstate/min_perf_pct

-Original Message-
From: ceph-users On Behalf Of Wido den Hollander
Sent: 14 May 2018 14:14
To: n...@fisk.me.uk; 'Blair Bethwaite'
Cc: 'ceph-users'
Subject: Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

On 05/01/2018 10:19 PM, Nick Fisk wrote:
> 4.16 required?
> https://www.phoronix.com/scan.php?page=news_item=Skylake-X-P-State-Linux-4.16

I've been trying with the 4.16 kernel for the last few days, but still, it's not working.

The CPUs keep clocking down to 800Mhz.

I've set scaling_min_freq=scaling_max_freq in /sys, but that doesn't change a thing. The CPUs keep scaling down.

Still not close to the 1ms latency with these CPUs :(

Wido

> [rest of the quoted thread snipped]
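The knobs being discussed in this thread, gathered into one guarded sketch. This is illustrative only (on Wido's hardware these exact settings did not stop the downclocking, so treat it as a starting point rather than a fix), and it degrades to a message on machines without intel_pstate or cpufreq:

```shell
# Force the performance governor everywhere and raise intel_pstate's
# floor to 100% of the available P-state range.
pstate=/sys/devices/system/cpu/intel_pstate/min_perf_pct
if [ -w "$pstate" ]; then
    echo 100 > "$pstate"
else
    echo "intel_pstate min_perf_pct not writable, skipping"
fi
changed=0
for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    [ -w "$gov" ] || continue
    echo performance > "$gov"
    changed=$((changed + 1))
done
echo "governor set on $changed CPUs"
```

The C-state boot parameters from the original message (processor.max_cstate=1 intel_idle.max_cstate=0) are complementary: the governor controls P-states, while those parameters keep cores out of deep idle states.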
Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs
Wido,

I am going to put my rather large foot in it here. I am sure it is understood that Turbo mode will not keep all cores at the maximum frequency at any given time. There is a thermal envelope for the chip, and the chip works to keep the power dissipation within that envelope. From what I gather, there is a range of thermal limits even within a given processor SKU, so every chip will exhibit different Turbo mode behaviour. And I am sure we all know that when AVX comes into use the Turbo limit is lower.

I guess what I am saying is that to have reproducible behaviour, if you care about it for timings etc., Turbo can be switched off. Before you say it, in this case you want to achieve the minimum latency, and reproducibility at the MHz level is not important.

Also worth saying that cooling is important when Turbo Boost comes into play. I heard a paper at an HPC Advisory Council where a Russian setup by Lenovo got significantly more performance at the HPC acceptance testing stage when cooling was turned up.

I guess my rambling has not added much to this debate, sorry. Cue a friendly Intel engineer to wander in and tell us exactly what is going on.

On 14 May 2018 at 15:13, Wido den Hollander wrote:
> On 05/01/2018 10:19 PM, Nick Fisk wrote:
> > 4.16 required?
> > https://www.phoronix.com/scan.php?page=news_item=Skylake-X-P-State-Linux-4.16
>
> I've been trying with the 4.16 kernel for the last few days, but still,
> it's not working.
>
> The CPUs keep clocking down to 800Mhz.
>
> I've set scaling_min_freq=scaling_max_freq in /sys, but that doesn't
> change a thing. The CPUs keep scaling down.
>
> Still not close to the 1ms latency with these CPUs :(
>
> Wido
>
> [rest of the quoted thread snipped]
Re: [ceph-users] ceph mds memory usage 20GB : is it normal ?
On Sat, May 12, 2018 at 3:11 AM Alexandre DERUMIER wrote:
> The documentation (luminous) says:
>
>> mds cache size
>>
>> Description: The number of inodes to cache. A value of 0 indicates an
>> unlimited number. It is recommended to use mds_cache_memory_limit to limit
>> the amount of memory the MDS cache uses.
>> Type: 32-bit Integer
>> Default: 0
>
> and, my mds_cache_memory_limit is currently at 5GB.

Yeah, I have only suggested that because the high memory usage seemed to trouble you and it might be a bug, so it's more of a workaround.

Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
Belo Horizonte - Brasil
IRC NICK - WebertRLZ
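For anyone following along, the limit Alexandre refers to can be adjusted at runtime on Luminous and persisted in ceph.conf. A hedged sketch (the daemon name mds.a and the 5 GiB value are placeholders; guarded so it is a dry run without a cluster):

```shell
# Raise the MDS cache memory target at runtime (value in bytes, 5 GiB here).
command -v ceph >/dev/null 2>&1 || { echo "ceph CLI not found, dry run only"; exit 0; }
ceph tell mds.a injectargs '--mds_cache_memory_limit=5368709120'
# persist across restarts in /etc/ceph/ceph.conf:
#   [mds]
#   mds_cache_memory_limit = 5368709120
```

Worth remembering that the MDS treats this as a target rather than a hard ceiling, so actual RSS can sit noticeably above it, which is consistent with the 20GB usage discussed in this thread.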
Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs
On 05/01/2018 10:19 PM, Nick Fisk wrote:
> 4.16 required?
> https://www.phoronix.com/scan.php?page=news_item=Skylake-X-P-State-Linux-4.16

I've been trying with the 4.16 kernel for the last few days, but still, it's not working. The CPUs keep clocking down to 800 MHz.

I've set scaling_min_freq=scaling_max_freq in /sys, but that doesn't change a thing. The CPUs keep scaling down.

Still not close to the 1 ms latency with these CPUs :(

Wido

> -----Original Message-----
> From: ceph-users On Behalf Of Blair Bethwaite
> Sent: 01 May 2018 16:46
> To: Wido den Hollander
> Cc: ceph-users ; Nick Fisk
> Subject: Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs
>
> Also curious about this over here. We've got a rack's worth of R740XDs with Xeon 4114s running RHEL 7.4, and intel_pstate isn't even active on them, though I don't believe they are any different at the OS level to our Broadwell nodes (where it is loaded).
>
> Have you tried poking the kernel's PM QoS interface for your use case?
>
> On 2 May 2018 at 01:07, Wido den Hollander wrote:
>> Hi,
>>
>> I've been trying to get the lowest latency possible out of the new Xeon Scalable CPUs, and so far I got down to 1.3 ms with the help of Nick.
>>
>> However, I can't seem to pin the CPUs to always run at their maximum frequency.
>>
>> If I disable power saving in the BIOS they stay at 2.1 GHz (Silver 4110), but that disables the boost.
>>
>> With power saving enabled in the BIOS and the OS given full control, for some reason the CPUs keep scaling down.
>>
>> $ echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
>>
>> cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
>> Report errors and bugs to cpuf...@vger.kernel.org, please.
>> analyzing CPU 0:
>>   driver: intel_pstate
>>   CPUs which run at the same hardware frequency: 0
>>   CPUs which need to have their frequency coordinated by software: 0
>>   maximum transition latency: 0.97 ms.
>>   hardware limits: 800 MHz - 3.00 GHz
>>   available cpufreq governors: performance, powersave
>>   current policy: frequency should be within 800 MHz and 3.00 GHz.
>>                   The governor "performance" may decide which speed to use within this range.
>>   current CPU frequency is 800 MHz.
>>
>> I do see the CPUs scale up to 2.1 GHz, but they quickly scale down again to 800 MHz, and that hurts latency (a 50% difference!).
>>
>> With the CPUs scaling down to 800 MHz my latency jumps from 1.3 ms to 2.4 ms on average. With turbo enabled I hope to get down to 1.1~1.2 ms on average.
>>
>> $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>> performance
>>
>> Everything seems to be OK and I would expect the CPUs to stay at 2.10 GHz, but they aren't.
>>
>> C-states are also pinned via kernel boot parameters:
>>
>> processor.max_cstate=1 intel_idle.max_cstate=0
>>
>> Running Ubuntu 16.04.4 with the 4.13 HWE kernel from Ubuntu.
>>
>> Has anybody tried this yet with the recent Intel Xeon Scalable CPUs?
>>
>> Thanks,
>>
>> Wido

> --
> Cheers,
> ~Blairo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
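A quick way to spot the behaviour Wido describes across all cores is to compare each CPU's `scaling_cur_freq` against its `scaling_max_freq` in sysfs. The sketch below is not from the thread; it assumes the standard `/sys/devices/system/cpu/cpuN/cpufreq` layout and takes the sysfs root as a parameter so it can be demonstrated against a fake tree.

```python
# Sketch: flag CPUs that sit well below their scaling_max_freq even though
# the "performance" governor is active. Assumes the usual cpufreq sysfs
# layout; the root is a parameter so the function can be tested offline.
import os

def throttled_cpus(sysfs_root="/sys/devices/system/cpu"):
    """Return [(cpu, cur_khz, max_khz)] for CPUs below 90% of their max."""
    flagged = []
    for entry in sorted(os.listdir(sysfs_root)):
        cpufreq = os.path.join(sysfs_root, entry, "cpufreq")
        if not (entry.startswith("cpu") and entry[3:].isdigit()
                and os.path.isdir(cpufreq)):
            continue
        def read(name):
            with open(os.path.join(cpufreq, name)) as f:
                return f.read().strip()
        if read("scaling_governor") != "performance":
            continue  # only check CPUs that should be pinned high
        cur, mx = int(read("scaling_cur_freq")), int(read("scaling_max_freq"))
        if cur < 0.9 * mx:
            flagged.append((entry, cur, mx))
    return flagged

if __name__ == "__main__":
    # Demo against a fake tree: cpu0 pinned high, cpu1 throttled to 800 MHz.
    import tempfile
    root = tempfile.mkdtemp()
    for cpu, cur in (("cpu0", 3000000), ("cpu1", 800000)):
        d = os.path.join(root, cpu, "cpufreq")
        os.makedirs(d)
        for name, val in (("scaling_governor", "performance"),
                          ("scaling_cur_freq", str(cur)),
                          ("scaling_max_freq", "3000000")):
            with open(os.path.join(d, name), "w") as f:
                f.write(val + "\n")
    print(throttled_cpus(root))  # [('cpu1', 800000, 3000000)]
```

Run periodically, this would show whether the 800 MHz dips are constant or transient; the 90% threshold is an arbitrary choice for the sketch.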
Re: [ceph-users] RBD Cache and rbd-nbd
On Mon, May 14, 2018 at 12:15 AM, Marc Schöchlin wrote:
> Hello Jason,
>
> many thanks for your informative response!
>
> Am 11.05.2018 um 17:02 schrieb Jason Dillaman:
>> I cannot speak for Xen, but in general IO to a block device will hit the pagecache unless the IO operation is flagged as direct (e.g. O_DIRECT) to bypass the pagecache and send it directly to the block device.
> Sure, but it seems that XenServer just forwards IO from virtual machines (vm: blkfront, dom-0: blkback) to the nbd device in dom-0.
>>> Sorry, my question was a bit imprecise: I was searching for usage statistics of the rbd cache.
>>> Is there also a possibility to gather rbd_cache usage statistics as a source of verification for optimizing the cache settings?
>> You can run "perf dump" instead of "config show" to dump out the current performance counters. There are some stats from the in-memory cache included in there.
> Great, I was not aware of that.
> There are really a lot of statistics which might be useful for analyzing what's going on, or whether the optimizations improve the performance of our systems.
>>> Can you provide some hints about adequate cache settings for a write-intensive environment (70% write, 30% read)?
>>> Is it a good idea to specify a huge rbd cache of 1 GB with a max dirty age of 10 seconds?
>> Depends on your workload and your testing results. I suspect a database on top of RBD is going to do its own read caching and will be issuing lots of flush calls to the block device, potentially negating the need for a large cache.
>
> Sure, reducing flushes while accepting a degraded level of reliability seems to be one important key to improved performance.
>
>>> Our typical workload originates over 70 percent in database write operations in the virtual machines.
>>> Therefore collecting write operations in the rbd cache and writing them to Ceph in chunks might be a good thing.
>>> A higher limit for "rbd cache max dirty" might be adequate here.
>>> On the other side, our read workload typically reads huge files in a sequential manner.
>>>
>>> Therefore it might be useful to start with a configuration like this:
>>>
>>> rbd cache size = 64MB
>>> rbd cache max dirty = 48MB
>>> rbd cache target dirty = 32MB
>>> rbd cache max dirty age = 10
>>>
>>> What is the strategy of librbd for writing data to the storage from rbd_cache if "rbd cache max dirty = 48MB" is reached?
>>> Is there a reduction of IO operations (merging of IOs) compared to the granularity of writes of my virtual machines?
>> If the cache is full, incoming IO will be stalled while the dirty bits are written back to the backing RBD image to make room for the new IO request.
> Sure, I will have a look at the statistics and the throughput.
> Is there any consolidation of write requests in the rbd cache?
>
> Example:
> If a VM writes small IO requests to the nbd device which belong to the same RADOS object, does librbd consolidate these requests into a single Ceph IO?
> What strategies does librbd use for that?

The librbd cache will consolidate sequential dirty extents within the same object, but it does not consolidate all dirty extents within the same object into the same write request.

> Regards
> Marc

--
Jason
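The "perf dump" output Jason mentions (fetched via the client admin socket, e.g. `ceph --admin-daemon <client>.asok perf dump`) is JSON, so it is easy to post-process. The sketch below is illustrative only: the section and counter names used here are placeholders, not guaranteed librbd counter names, which vary by release; check your own dump for the real fields before relying on it.

```python
# Sketch: compute a cache hit ratio from a "perf dump" JSON blob.
# SAMPLE uses placeholder section/counter names -- inspect your own
# "perf dump" output for the real names in your Ceph release.
import json

SAMPLE = """{
  "objectcacher-librbd": {"data_read": 1048576, "data_written": 3145728,
                          "cache_bytes_hit": 786432, "cache_bytes_miss": 262144}
}"""

def hit_ratio(dump, section):
    """Fraction of cached bytes served from the in-memory cache."""
    s = json.loads(dump)[section]
    hit, miss = s["cache_bytes_hit"], s["cache_bytes_miss"]
    return hit / (hit + miss) if hit + miss else 0.0

print(round(hit_ratio(SAMPLE, "objectcacher-librbd"), 2))  # 0.75
```

Sampling this before and after changing the `rbd cache ...` settings would give a concrete number to compare tunings against, rather than eyeballing raw counters.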
[ceph-users] Cephfs write fail when node goes down
Hi everyone, we've encountered an unusual thing in our setup (4 nodes, 48 OSDs, 3 monitors - Ceph Jewel, Ubuntu 16.04 with kernel 4.4.0). Yesterday we were doing a HW upgrade of the nodes, so they went down one by one - the cluster was in good shape during the upgrade, as we've done this numerous times and we're quite sure that the redundancy wasn't screwed up while doing this. However, during this upgrade one of the clients that does backups to CephFS (mounted via the kernel driver) failed to write the backup file correctly to the cluster, with the following trace after we turned off one of the nodes:

[2585732.529412] 8800baa279a8 813fb2df 880236230e00 8802339c
[2585732.529414] 8800baa28000 88023fc96e00 7fff 8800baa27b20
[2585732.529415] 81840ed0 8800baa279c0 818406d5
[2585732.529417] Call Trace:
[2585732.529505] [] ? cpumask_next_and+0x2f/0x40
[2585732.529558] [] ? bit_wait+0x60/0x60
[2585732.529560] [] schedule+0x35/0x80
[2585732.529562] [] schedule_timeout+0x1b5/0x270
[2585732.529607] [] ? kvm_clock_get_cycles+0x1e/0x20
[2585732.529609] [] ? bit_wait+0x60/0x60
[2585732.529611] [] io_schedule_timeout+0xa4/0x110
[2585732.529613] [] bit_wait_io+0x1b/0x70
[2585732.529614] [] __wait_on_bit_lock+0x4e/0xb0
[2585732.529652] [] __lock_page+0xbb/0xe0
[2585732.529674] [] ? autoremove_wake_function+0x40/0x40
[2585732.529676] [] pagecache_get_page+0x17d/0x1c0
[2585732.529730] [] ? ceph_pool_perm_check+0x48/0x700 [ceph]
[2585732.529732] [] grab_cache_page_write_begin+0x26/0x40
[2585732.529738] [] ceph_write_begin+0x48/0xe0 [ceph]
[2585732.529739] [] generic_perform_write+0xce/0x1c0
[2585732.529763] [] ? file_update_time+0xc9/0x110
[2585732.529769] [] ceph_write_iter+0xf89/0x1040 [ceph]
[2585732.529792] [] ? __alloc_pages_nodemask+0x159/0x2a0
[2585732.529808] [] new_sync_write+0x9b/0xe0
[2585732.529811] [] __vfs_write+0x26/0x40
[2585732.529812] [] vfs_write+0xa9/0x1a0
[2585732.529814] [] SyS_write+0x55/0xc0
[2585732.529817] [] entry_SYSCALL_64_fastpath+0x16/0x71

I have encountered this behavior on Luminous, but not on Jewel. Anyone who has a clue why the write fails? As far as I'm concerned, it should always work if all the PGs are available. Thanks

Josef
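A stall in `__lock_page` like the one above usually means the kernel client is waiting on writeback, so the usual next step is to check for hung in-flight OSD requests in `/sys/kernel/debug/ceph/<cluster-id.client-id>/osdc`. The sketch below summarises such a dump per OSD; the whitespace-separated layout assumed here (tid, osd, pg, object, ...) is an assumption and varies by kernel version, so treat it as an illustration only.

```python
# Sketch: count in-flight OSD requests per OSD from an osdc-style debug
# dump. SAMPLE_OSDC uses an assumed field layout (tid, osd, pg, object, op);
# real output differs between kernel versions.
from collections import Counter

SAMPLE_OSDC = """\
1387  osd12  1.2a4  10000001f3c.00000000  write
1388  osd12  1.7b0  10000001f3d.00000000  write
1389  osd3   1.11c  10000001f3e.00000000  write
"""

def requests_per_osd(text):
    """Return {osd: in-flight request count} from an osdc dump."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return dict(counts)

print(requests_per_osd(SAMPLE_OSDC))  # {'osd12': 2, 'osd3': 1}
```

If the requests all pile up against the OSD(s) on the powered-off node, that points at the client not re-targeting the requests after the OSD map change, rather than a PG availability problem.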
Re: [ceph-users] RBD Cache and rbd-nbd
Hello Jason,

many thanks for your informative response!

Am 11.05.2018 um 17:02 schrieb Jason Dillaman:
> I cannot speak for Xen, but in general IO to a block device will hit the pagecache unless the IO operation is flagged as direct (e.g. O_DIRECT) to bypass the pagecache and send it directly to the block device.

Sure, but it seems that XenServer just forwards IO from virtual machines (vm: blkfront, dom-0: blkback) to the nbd device in dom-0.

>> Sorry, my question was a bit imprecise: I was searching for usage statistics of the rbd cache.
>> Is there also a possibility to gather rbd_cache usage statistics as a source of verification for optimizing the cache settings?
> You can run "perf dump" instead of "config show" to dump out the current performance counters. There are some stats from the in-memory cache included in there.

Great, I was not aware of that.
There are really a lot of statistics which might be useful for analyzing what's going on, or whether the optimizations improve the performance of our systems.

>> Can you provide some hints about adequate cache settings for a write-intensive environment (70% write, 30% read)?
>> Is it a good idea to specify a huge rbd cache of 1 GB with a max dirty age of 10 seconds?
> Depends on your workload and your testing results. I suspect a database on top of RBD is going to do its own read caching and will be issuing lots of flush calls to the block device, potentially negating the need for a large cache.

Sure, reducing flushes while accepting a degraded level of reliability seems to be one important key to improved performance.

>> Our typical workload originates over 70 percent in database write operations in the virtual machines.
>> Therefore collecting write operations in the rbd cache and writing them to Ceph in chunks might be a good thing.
>> A higher limit for "rbd cache max dirty" might be adequate here.
>> On the other side, our read workload typically reads huge files in a sequential manner.
>>
>> Therefore it might be useful to start with a configuration like this:
>>
>> rbd cache size = 64MB
>> rbd cache max dirty = 48MB
>> rbd cache target dirty = 32MB
>> rbd cache max dirty age = 10
>>
>> What is the strategy of librbd for writing data to the storage from rbd_cache if "rbd cache max dirty = 48MB" is reached?
>> Is there a reduction of IO operations (merging of IOs) compared to the granularity of writes of my virtual machines?
> If the cache is full, incoming IO will be stalled while the dirty bits are written back to the backing RBD image to make room for the new IO request.

Sure, I will have a look at the statistics and the throughput.
Is there any consolidation of write requests in the rbd cache?

Example:
If a VM writes small IO requests to the nbd device which belong to the same RADOS object, does librbd consolidate these requests into a single Ceph IO?
What strategies does librbd use for that?

Regards
Marc
Re: [ceph-users] jewel to luminous upgrade, chooseleaf_vary_r and chooseleaf_stable
Hi Adrian,

Is there a strict reason why you *must* upgrade the tunables? It is normally OK to run with old (e.g. hammer) tunables on a luminous cluster. The crush placement won't be state of the art, but that's not a huge problem.

We have a lot of data in a jewel cluster with hammer tunables. We'll upgrade that to luminous soon, but don't plan to set chooseleaf_stable until there's a less disruptive procedure, e.g. [1].

Cheers, Dan

[1] One idea I had to make this much less disruptive would be to script something that uses upmaps to lock all PGs into their current placement, then set chooseleaf_stable, then gradually remove the upmaps. There are some details to work out, and it requires all clients to be running luminous, but I think something like this could help...

On Mon, May 14, 2018 at 9:01 AM, Adrian wrote:
> Hi all,
>
> We recently upgraded our old ceph cluster to jewel (5x mon, 21x storage hosts with 9x 6TB filestore OSDs and 3x SSDs with 3 journals on each) - mostly used for OpenStack compute/cinder.
>
> In order to get there we had to go with chooseleaf_vary_r = 4 in order to minimize client impact and save time. We now need to get to luminous (on a deadline, and time is limited).
>
> Current tunables are:
> {
>     "choose_local_tries": 0,
>     "choose_local_fallback_tries": 0,
>     "choose_total_tries": 50,
>     "chooseleaf_descend_once": 1,
>     "chooseleaf_vary_r": 4,
>     "chooseleaf_stable": 0,
>     "straw_calc_version": 1,
>     "allowed_bucket_algs": 22,
>     "profile": "unknown",
>     "optimal_tunables": 0,
>     "legacy_tunables": 0,
>     "minimum_required_version": "firefly",
>     "require_feature_tunables": 1,
>     "require_feature_tunables2": 1,
>     "has_v2_rules": 0,
>     "require_feature_tunables3": 1,
>     "has_v3_rules": 0,
>     "has_v4_buckets": 0,
>     "require_feature_tunables5": 0,
>     "has_v5_rules": 0
> }
>
> Setting chooseleaf_stable to 1, the crush compare tool says:
>     Replacing the crushmap specified with --origin with the crushmap specified with --destination will move 8774 PGs (59.08417508417509% of the total) from one item to another.
>
> Current tunings we have in ceph.conf are:
>
> #THROTTLING CEPH
> osd_max_backfills = 1
> osd_recovery_max_active = 1
> osd_recovery_op_priority = 1
> osd_client_op_priority = 63
>
> #PERFORMANCE TUNING
> osd_op_threads = 6
> filestore_op_threads = 10
> filestore_max_sync_interval = 30
>
> I was wondering if anyone has any advice as to anything else we can do, balancing client impact against speed of recovery, or war stories of other things to consider.
>
> I'm also wondering about the interplay between chooseleaf_vary_r and chooseleaf_stable. Are we better off:
> 1) sticking with chooseleaf_vary_r = 4, setting chooseleaf_stable = 1, upgrading, and then setting chooseleaf_vary_r incrementally to 1 when more time is available, or
> 2) setting chooseleaf_vary_r incrementally first, then chooseleaf_stable, and finally upgrading?
>
> All this bearing in mind we'd like to keep the time it takes us to get to luminous as short as possible ;-) (guesstimating a 59% rebalance to take many days)
>
> Any advice/thoughts gratefully received.
>
> Regards,
> Adrian.
>
> --
> Adrian : aussie...@gmail.com
> If violence doesn't solve your problem, you're not using enough of it.
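The upmap idea in Dan's footnote [1] can be sketched concretely: record each PG's acting set before the tunable change, compute the mapping afterwards (e.g. via `osdmaptool` on the modified map), and emit `ceph osd pg-upmap-items` commands that pin moved PGs back to their original OSDs. The mappings below are toy data and the overall procedure is only a sketch of the idea, not a tested migration tool; as noted in the thread, it requires all clients to be luminous-capable.

```python
# Sketch of Dan's footnote [1]: generate "ceph osd pg-upmap-items" commands
# that pin PGs back to their pre-change placement. before/after are toy
# {pgid: [osd, ...]} mappings; in practice you would diff "ceph pg dump"
# (or osdmaptool output) taken before and after the tunable change.

def pin_commands(before, after):
    """Yield one upmap command per PG whose mapping would change."""
    for pgid, old in sorted(before.items()):
        new = after.get(pgid, old)
        # pg-upmap-items takes (from, to) pairs: remap each changed slot
        # from its new CRUSH target back to the original OSD.
        pairs = [(n, o) for o, n in zip(old, new) if o != n]
        if pairs:
            args = " ".join(f"{frm} {to}" for frm, to in pairs)
            yield f"ceph osd pg-upmap-items {pgid} {args}"

before = {"1.2a": [4, 9, 15], "1.2b": [7, 3, 11]}
after  = {"1.2a": [4, 12, 15], "1.2b": [7, 3, 11]}  # only 1.2a would move
for cmd in pin_commands(before, after):
    print(cmd)  # ceph osd pg-upmap-items 1.2a 12 9
```

Removing the generated upmaps gradually (in small batches) would then spread the 59% rebalance over whatever window suits the cluster, instead of taking it all at once.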