Re: [ceph-users] Constant Compaction on one mimic node
> I am getting a huge number of messages on one out of three nodes showing "Manual compaction starting" all the time. I see no such log entries on the other nodes in the cluster.
>
> Mar 16 06:40:11 storage1n1-chi docker[24502]: debug 2019-03-16 06:40:11.441 7f6967af4700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:1024] [default] Manual compaction starting
> Mar 16 06:40:11 storage1n1-chi docker[24502]: message repeated 4 times: [ debug 2019-03-16 06:40:11.441 7f6967af4700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:1024] [default] Manual compaction starting]
> Mar 16 06:42:21 storage1n1-chi docker[24502]: debug 2019-03-16 06:42:21.466 7f6970305700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:77] [JOB 1021] Syncing log #194307
>
> I am not sure what triggers those messages on one node and not on the others. Checking the config on all mons:
>
>   debug_leveldb  4/5 override
>   debug_memdb    4/5 override
>   debug_mgr      0/5 override
>   debug_mgrc     0/5 override
>   debug_rocksdb  4/5 override
>
> The documentation says nothing about the compaction logs, or at least I couldn't find anything specific to my issue.

You should look at the docker side, I think, because this is manual compaction, like `ceph daemon osd.0 compact` from the admin socket or `ceph tell osd.0 compact` from the admin CLI.

k
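P.S. If the log volume itself is the main annoyance: these lines are emitted at rocksdb debug level 4 (the "4 rocksdb:" in the message prefix), so lowering debug_rocksdb below 4 will hide them. A minimal sketch, assuming the messages come from the mon and OSD daemons on that node; adjust the targets to whichever daemons are actually logging this:

# Hide rocksdb level-4 chatter (compaction still happens, it just isn't logged)
ceph tell mon.* injectargs '--debug_rocksdb 1/5'
ceph tell osd.* injectargs '--debug_rocksdb 1/5'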
Re: [ceph-users] Rebuild after upgrade
On 18/03/2019 13:24, Brent Kennedy wrote:
> I finally received approval to upgrade our old firefly (0.8.7) cluster to Luminous. I started the upgrade, going to hammer (0.94.10), then jewel (10.2.11), but after jewel I ran the "ceph osd crush tunables optimal" command, and then "ceph -s" showed 60% of the objects were misplaced. Now the cluster is just churning while it does the recovery for that. Is this something that happens when upgrading from firefly up? I had done a hammer upgrade to jewel before, and no rebalance occurred after issuing that command.

Any time you change the CRUSH tunables, you can expect data movement. The exact impact can vary from nothing (if no changes were made, or the changes don't affect your actual pools/CRUSH rules) to a lot of data movement. This is documented here:

http://docs.ceph.com/docs/master/rados/operations/crush-map/

In particular, you turned on CRUSH_TUNABLES5, which causes a large amount of data movement:

http://docs.ceph.com/docs/master/rados/operations/crush-map/#jewel-crush-tunables5

Going from Firefly to Hammer has a much smaller impact (see the CRUSH_V4 section).

-- 
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
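P.S. For next time: you can roughly estimate how much data a tunables change will move before applying it, by testing the change offline with crushtool. A rough sketch, assuming your crushtool build supports --set-chooseleaf-stable (the tunable that CRUSH_TUNABLES5 turns on); the rule id and replica count below are placeholders, adjust them for your pools:

# Dump the current CRUSH map
ceph osd getcrushmap -o crushmap.bin

# Mappings with the current tunables
crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-mappings > before.txt

# Flip the jewel-era tunable offline and recompute the mappings
crushtool -i crushmap.bin --set-chooseleaf-stable 1 -o crushmap-new.bin
crushtool -i crushmap-new.bin --test --rule 0 --num-rep 3 --show-mappings > after.txt

# Count how many test inputs map differently; the ratio approximates the
# fraction of PGs that would be remapped
diff before.txt after.txt | grep -c '^<'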
[ceph-users] Rebuild after upgrade
I finally received approval to upgrade our old firefly (0.8.7) cluster to Luminous. I started the upgrade, going to hammer (0.94.10), then jewel (10.2.11), but after jewel I ran the "ceph osd crush tunables optimal" command, and then "ceph -s" showed 60% of the objects were misplaced. Now the cluster is just churning while it does the recovery for that. Is this something that happens when upgrading from firefly up? I had done a hammer upgrade to jewel before, and no rebalance occurred after issuing that command.

Regards,
-Brent

Existing Clusters:
Test: Luminous 12.2.10 with 3 osd servers, 1 mon/man, 1 gateway (all virtual)
US Production: Jewel 10.2.11 with 5 osd servers, 3 mons, 3 gateways behind haproxy LB
UK Production: Luminous 12.2.10 with 15 osd servers, 3 mons/man, 3 gateways behind haproxy LB
US Production all SSD: Luminous 12.2.10 with 6 osd servers, 3 mons/man, 3 gateways behind haproxy LB
Re: [ceph-users] [Bluestore] Some of my osd's uses BlueFS slow storage for db - why?
> Yes, I was in a similar situation initially, where I had deployed my OSDs with 25GB DB partitions and, after 3GB of DB was used, everything else was going into the slow DB on disk. From memory 29GB was just enough to make the DB fit on flash, but 30GB is a safe round figure to aim for. With a 30GB DB partition, for most RBD-type workloads all data should reside on flash, even for fairly large disks running erasure coding.
>
> Nick

Nick, thank you! After upgrading to 12.2.11 I expanded the block.db partition, and a week after compaction the slow DB is still not used [1].

{
    "gift_bytes": 0,
    "reclaim_bytes": 0,
    "db_total_bytes": 32212897792,
    "db_used_bytes": 6572474368,
    "wal_total_bytes": 1074589696,
    "wal_used_bytes": 528482304,
    "slow_total_bytes": 240043163648,
    "slow_used_bytes": 0,
    "num_files": 113,
    "log_bytes": 8683520,
    "log_compactions": 3,
    "logged_bytes": 203821056,
    "files_written_wal": 2,
    "files_written_sst": 1138,
    "bytes_written_wal": 121626085396,
    "bytes_written_sst": 47053353874
}

I also wrote up how to increase the partition size for my case; maybe it will be useful for someone [2].

[1] https://ibb.co/tXGqbbt
[2] https://bit.ly/2UFVO9Z
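For anyone else checking for spillover: the counters above come from the bluefs section of the OSD's perf dump. A minimal sketch for checking every OSD on a host, assuming jq is installed and the admin sockets are in the default /var/run/ceph location (adjust the path if your OSDs run in containers):

for sock in /var/run/ceph/ceph-osd.*.asok; do
    osd=$(basename "$sock" .asok)
    # slow_used_bytes > 0 means RocksDB data is spilling onto the slow device
    slow=$(ceph daemon "$sock" perf dump | jq '.bluefs.slow_used_bytes')
    echo "$osd: slow_used_bytes=$slow"
done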
Re: [ceph-users] How to lower log verbosity
I am not sure if it is any help, but this gets you some debug settings:

ceph daemon osd.0 config show | grep debug | grep "[0-9]/[0-9]"

And e.g. with a loop like this you can set them all to 0/0 (the associative array needs to be declared first):

declare -A logarr
logarr[debug_compressor]="1/5"
logarr[debug_bluestore]="1/5"
logarr[debug_bluefs]="1/5"
logarr[debug_bdev]="1/3"
logarr[debug_kstore]="1/5"
logarr[debug_rocksdb]="4/5"
logarr[debug_leveldb]="4/5"
logarr[debug_memdb]="4/5"
logarr[debug_kinetic]="1/5"
logarr[debug_fuse]="1/5"
logarr[debug_mgr]="1/5"
logarr[debug_mgrc]="1/5"
logarr[debug_dpdk]="1/5"
logarr[debug_eventtrace]="1/5"

for k in "${!logarr[@]}"; do
    ceph tell osd.* injectargs "--$k=0/0"
done


-----Original Message-----
From: Alex Litvak [mailto:alexander.v.lit...@gmail.com] 
Sent: 17 March 2019 16:16
To: ceph-users@lists.ceph.com
Cc: ceph-de...@vger.kernel.org
Subject: [ceph-users] How to lower log verbosity

Hello everyone,

As I am troubleshooting an issue I see logs literally littered with messages such as the ones below. I searched the documentation and couldn't find a specific debug knob to turn. I see some debugging is on by default, but I don't need to see the stuff below, especially the mgr and client messages repeating. Also I am not sure what the difference is between the memory vs file debug ratios. Thanks in advance for any hint.

Mar 17 08:11:44 storage1n1-chi docker[13518]: audit 2019-03-17 08:11:43.045636 mgr.storage1n2-chi mgr.394322 10.1.40.62:0/19661 3377 : audit [DBG] from='client.320702 10.1.40.62:0/3859627195' entity='client.admin' cmd=[{"width": 80, "prefix": "iostat", "poll": true, "target": ["mgr", ""], "print_header": false}]: dispatch
Mar 17 08:11:44 storage1n1-chi docker[13518]: audit 2019-03-17 08:11:43.048607 mgr.storage1n2-chi mgr.394322 10.1.40.62:0/19661 3378 : audit [DBG] from='client.328541 10.1.40.62:0/748275798' entity='client.admin' cmd=[{"width": 80, "prefix": "iostat", "poll": true, "target": ["mgr", ""], "print_header": false}]: dispatch
Mar 17 08:11:44 storage1n1-chi docker[13518]: audit 2019-03-17 08:11:43.921762 mgr.storage1n2-chi mgr.394322 10.1.40.62:0/19661 3379 : audit [DBG] from='client.341603 10.1.40.62:0/535963786' entity='client.admin' cmd=[{"width": 80, "prefix": "iostat", "poll": true, "target": ["mgr", ""], "print_header": false}]: dispatch
Mar 17 08:11:25 storage1n1-chi bf695c816174[17083]: 2019-03-17 08:11:25.845 7fe9ddb4f700 1 mgr send_beacon standby
Mar 17 08:11:25 storage1n1-chi docker[13941]: 2019-03-17 08:11:25.845 7fe9ddb4f700 1 mgr send_beacon standby
Mar 17 08:11:27 storage1n1-chi bf695c816174[17083]: 2019-03-17 08:11:27.865 7fe9ddb4f700 1 mgr send_beacon standby
Mar 17 08:11:27 storage1n1-chi docker[13941]: 2019-03-17 08:11:27.865 7fe9ddb4f700 1 mgr send_beacon standby
Mar 17 08:11:29 storage1n1-chi bf695c816174[17083]: 2019-03-17 08:11:29.885 7fe9ddb4f700 1 mgr send_beacon standby
Mar 17 08:11:29 storage1n1-chi docker[13941]: 2019-03-17 08:11:29.885 7fe9ddb4f700 1 mgr send_beacon standby
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27d16700 1 mgr.server reply handle_command (0) Success | 299 KiB/s | 122 KiB/s | 421 KiB/s |306 | 17 |324 |
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 0 log_channel(audit) log [DBG] : from='client.320702 10.1.40.62:0/3859627195' entity='client.admin' cmd=[{"width": 80, "prefix": "iostat", "poll": true, "target": ["mgr", ""], "print_header": false}]: dispatch
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer status'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer mode'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer on'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer off'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer eval'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer eval-verbose'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer optimize'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer show'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer rm'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer reset'
Mar 17 08:13:05 storage1n2-chi
[ceph-users] Cephfs error
2019-03-17 21:59:58.296394 7f97cbbe6700 0 -- 192.168.10.203:6800/1614422834 >> 192.168.10.43:0/1827964483 conn(0x55ba9614d000 :6800 s=STATE_OPEN pgs=8 cs=1 l=0).fault server, going to standby

What does this mean?
Re: [ceph-users] Constant Compaction on one mimic node
I did some additional cleanup and restarted the mon on all nodes. Manual compaction is now shown on all nodes. Is this the normal operating mode? As it seems to be sporadic, could it have an effect on performance, i.e. cause slow ops? Is there a way to limit it, and is there a document that explains these things? (Below the quoted message I have added what I plan to check next.)

Thank you again,

On 3/17/2019 4:11 AM, Alex Litvak wrote:
> Hello everyone,
>
> I am getting a huge number of messages on one out of three nodes showing "Manual compaction starting" all the time. I see no such log entries on the other nodes in the cluster.
>
> Mar 16 06:40:11 storage1n1-chi docker[24502]: debug 2019-03-16 06:40:11.441 7f6967af4700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:1024] [default] Manual compaction starting
> Mar 16 06:40:11 storage1n1-chi docker[24502]: message repeated 4 times: [ debug 2019-03-16 06:40:11.441 7f6967af4700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:1024] [default] Manual compaction starting]
> Mar 16 06:42:21 storage1n1-chi docker[24502]: debug 2019-03-16 06:42:21.466 7f6970305700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:77] [JOB 1021] Syncing log #194307
>
> I am not sure what triggers those messages on one node and not on the others. Checking the config on all mons:
>
>   debug_leveldb  4/5 override
>   debug_memdb    4/5 override
>   debug_mgr      0/5 override
>   debug_mgrc     0/5 override
>   debug_rocksdb  4/5 override
>
> The documentation says nothing about the compaction logs, or at least I couldn't find anything specific to my issue.
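What I plan to compare between the nodes next, in case the difference is in the mon stores themselves: the size of each mon's RocksDB store and whether compact-on-start is set. A sketch; the paths assume the default mon data location on the host and may differ inside the containers:

# Size of this node's mon RocksDB store
du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db

# Whether this mon compacts its store on startup
ceph daemon mon.$(hostname -s) config get mon_compact_on_start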
[ceph-users] How to lower log verbosity
Hello everyone,

As I am troubleshooting an issue I see logs literally littered with messages such as the ones below. I searched the documentation and couldn't find a specific debug knob to turn. I see some debugging is on by default, but I don't need to see the stuff below, especially the mgr and client messages repeating. Also I am not sure what the difference is between the memory vs file debug ratios. Thanks in advance for any hint.

Mar 17 08:11:44 storage1n1-chi docker[13518]: audit 2019-03-17 08:11:43.045636 mgr.storage1n2-chi mgr.394322 10.1.40.62:0/19661 3377 : audit [DBG] from='client.320702 10.1.40.62:0/3859627195' entity='client.admin' cmd=[{"width": 80, "prefix": "iostat", "poll": true, "target": ["mgr", ""], "print_header": false}]: dispatch
Mar 17 08:11:44 storage1n1-chi docker[13518]: audit 2019-03-17 08:11:43.048607 mgr.storage1n2-chi mgr.394322 10.1.40.62:0/19661 3378 : audit [DBG] from='client.328541 10.1.40.62:0/748275798' entity='client.admin' cmd=[{"width": 80, "prefix": "iostat", "poll": true, "target": ["mgr", ""], "print_header": false}]: dispatch
Mar 17 08:11:44 storage1n1-chi docker[13518]: audit 2019-03-17 08:11:43.921762 mgr.storage1n2-chi mgr.394322 10.1.40.62:0/19661 3379 : audit [DBG] from='client.341603 10.1.40.62:0/535963786' entity='client.admin' cmd=[{"width": 80, "prefix": "iostat", "poll": true, "target": ["mgr", ""], "print_header": false}]: dispatch
Mar 17 08:11:25 storage1n1-chi bf695c816174[17083]: 2019-03-17 08:11:25.845 7fe9ddb4f700 1 mgr send_beacon standby
Mar 17 08:11:25 storage1n1-chi docker[13941]: 2019-03-17 08:11:25.845 7fe9ddb4f700 1 mgr send_beacon standby
Mar 17 08:11:27 storage1n1-chi bf695c816174[17083]: 2019-03-17 08:11:27.865 7fe9ddb4f700 1 mgr send_beacon standby
Mar 17 08:11:27 storage1n1-chi docker[13941]: 2019-03-17 08:11:27.865 7fe9ddb4f700 1 mgr send_beacon standby
Mar 17 08:11:29 storage1n1-chi bf695c816174[17083]: 2019-03-17 08:11:29.885 7fe9ddb4f700 1 mgr send_beacon standby
Mar 17 08:11:29 storage1n1-chi docker[13941]: 2019-03-17 08:11:29.885 7fe9ddb4f700 1 mgr send_beacon standby
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27d16700 1 mgr.server reply handle_command (0) Success | 299 KiB/s | 122 KiB/s | 421 KiB/s |306 | 17 |324 |
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 0 log_channel(audit) log [DBG] : from='client.320702 10.1.40.62:0/3859627195' entity='client.admin' cmd=[{"width": 80, "prefix": "iostat", "poll": true, "target": ["mgr", ""], "print_header": false}]: dispatch
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer status'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer mode'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer on'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer off'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer eval'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer eval-verbose'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer optimize'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer show'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer rm'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer reset'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer dump'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'balancer execute'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'dashboard set-login-credentials'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'dashboard set-session-expire'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix: 'dashboard create-self-signed-cert'
Mar 17 08:13:05 storage1n2-chi ecbe67c5b2e7[36572]: 2019-03-17 08:13:05.302 7f9c27515700 1 mgr.server handle_command pyc_prefix:
[ceph-users] Constant Compaction on one mimic node
Hello everyone,

I am getting a huge number of messages on one out of three nodes showing "Manual compaction starting" all the time. I see no such log entries on the other nodes in the cluster.

Mar 16 06:40:11 storage1n1-chi docker[24502]: debug 2019-03-16 06:40:11.441 7f6967af4700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:1024] [default] Manual compaction starting
Mar 16 06:40:11 storage1n1-chi docker[24502]: message repeated 4 times: [ debug 2019-03-16 06:40:11.441 7f6967af4700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:1024] [default] Manual compaction starting]
Mar 16 06:42:21 storage1n1-chi docker[24502]: debug 2019-03-16 06:42:21.466 7f6970305700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:77] [JOB 1021] Syncing log #194307

I am not sure what triggers those messages on one node and not on the others. Checking the config on all mons:

  debug_leveldb  4/5 override
  debug_memdb    4/5 override
  debug_mgr      0/5 override
  debug_mgrc     0/5 override
  debug_rocksdb  4/5 override

The documentation says nothing about the compaction logs, or at least I couldn't find anything specific to my issue.