Re: [ceph-users] best way to resolve 'stale+active+clean' after disk failure
Thanks for the suggestions. There turned out to be an old testing pool with a replication size of 1 that was causing the issue. Removing the pool resolved it.

On 04/06/2017 07:34 PM, Brad Hubbard wrote:
What are size and min_size for pool '7'... and why?

On Fri, Apr 7, 2017 at 4:20 AM, David Welch <dwe...@thinkars.com> wrote:
Hi,
We had a disk on the cluster that was not responding properly and was causing 'slow requests'. The OSD on that disk was stopped and then marked down and out. Rebalancing succeeded, but (some?) PGs from that OSD are now stuck in the stale+active+clean state and are not recovering (see the original post below for the full 'ceph health detail' and 'ceph pg dump_stuck stale' output).
My question: is it better to mark this OSD as "lost" (i.e. 'ceph osd lost 14') or to remove the OSD as detailed here:
https://www.sebastien-han.fr/blog/2015/12/11/ceph-properly-remove-an-osd/
Thanks,
David
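For anyone hitting the same symptom, a rough sketch of confirming and removing such a stray pool is below; 'testpool' is only a placeholder name, and the delete safeguards can vary slightly between releases:

$ ceph osd dump | grep pool                 # look for pools with size 1
$ ceph osd pool get testpool size           # confirm replication size
$ ceph osd pool get testpool min_size
$ ceph osd pool delete testpool testpool --yes-i-really-really-mean-it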
[ceph-users] best way to resolve 'stale+active+clean' after disk failure
Hi,
We had a disk on the cluster that was not responding properly and was causing 'slow requests'. The OSD on that disk was stopped and then marked down and out. Rebalancing succeeded, but (some?) PGs from that OSD are now stuck in the stale+active+clean state, which is not resolving (see the query results below).
My question: is it better to mark this OSD as "lost" (i.e. 'ceph osd lost 14') or to remove the OSD as detailed here:
https://www.sebastien-han.fr/blog/2015/12/11/ceph-properly-remove-an-osd/
Thanks,
David

$ ceph health detail
HEALTH_ERR 17 pgs are stuck inactive for more than 300 seconds; 17 pgs stale; 17 pgs stuck stale
pg 7.f3 is stuck stale for 6138.330316, current state stale+active+clean, last acting [14]
pg 7.bd is stuck stale for 6138.330365, current state stale+active+clean, last acting [14]
pg 7.b6 is stuck stale for 6138.330374, current state stale+active+clean, last acting [14]
pg 7.c5 is stuck stale for 6138.330363, current state stale+active+clean, last acting [14]
pg 7.ac is stuck stale for 6138.330385, current state stale+active+clean, last acting [14]
pg 7.5b is stuck stale for 6138.330678, current state stale+active+clean, last acting [14]
pg 7.1b4 is stuck stale for 6138.330409, current state stale+active+clean, last acting [14]
pg 7.182 is stuck stale for 6138.330445, current state stale+active+clean, last acting [14]
pg 7.1f8 is stuck stale for 6138.330720, current state stale+active+clean, last acting [14]
pg 7.53 is stuck stale for 6138.330697, current state stale+active+clean, last acting [14]
pg 7.1d2 is stuck stale for 6138.330663, current state stale+active+clean, last acting [14]
pg 7.70 is stuck stale for 6138.330742, current state stale+active+clean, last acting [14]
pg 7.14f is stuck stale for 6138.330585, current state stale+active+clean, last acting [14]
pg 7.23 is stuck stale for 6138.330610, current state stale+active+clean, last acting [14]
pg 7.153 is stuck stale for 6138.330600, current state stale+active+clean, last acting [14]
pg 7.cc is stuck stale for 6138.330409, current state stale+active+clean, last acting [14]
pg 7.16b is stuck stale for 6138.330509, current state stale+active+clean, last acting [14]

$ ceph pg dump_stuck stale
ok
pg_stat  state               up    up_primary  acting  acting_primary
7.f3     stale+active+clean  [14]  14          [14]    14
7.bd     stale+active+clean  [14]  14          [14]    14
7.b6     stale+active+clean  [14]  14          [14]    14
7.c5     stale+active+clean  [14]  14          [14]    14
7.ac     stale+active+clean  [14]  14          [14]    14
7.5b     stale+active+clean  [14]  14          [14]    14
7.1b4    stale+active+clean  [14]  14          [14]    14
7.182    stale+active+clean  [14]  14          [14]    14
7.1f8    stale+active+clean  [14]  14          [14]    14
7.53     stale+active+clean  [14]  14          [14]    14
7.1d2    stale+active+clean  [14]  14          [14]    14
7.70     stale+active+clean  [14]  14          [14]    14
7.14f    stale+active+clean  [14]  14          [14]    14
7.23     stale+active+clean  [14]  14          [14]    14
7.153    stale+active+clean  [14]  14          [14]    14
7.cc     stale+active+clean  [14]  14          [14]    14
7.16b    stale+active+clean  [14]  14          [14]    14
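For context, the two options being weighed in the question above look roughly like the following (OSD id 14 as in this thread); this is a sketch rather than a recommendation, and flags differ slightly between releases:

# Option 1: declare the down OSD lost so peering can proceed without it
$ ceph osd lost 14 --yes-i-really-mean-it

# Option 2: remove the OSD from the cluster entirely
$ ceph osd out 14
$ ceph osd crush remove osd.14
$ ceph auth del osd.14
$ ceph osd rm 14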
Re: [ceph-users] systemd and ceph-mon autostart on Ubuntu 16.04
We also ran into this problem when upgrading Ubuntu from 14.04 to 16.04: the service symlink is not created automatically. The issue was resolved with the following steps:

$ sudo systemctl enable ceph-mon@your-hostname
Created symlink from /etc/systemd/system/ceph-mon.target.wants/ceph-mon@your-hostname.service to /lib/systemd/system/ceph-mon@.service.
$ sudo systemctl start ceph-mon@your-hostname

Now it should start and join the cluster.
-David

On 01/25/2017 02:35 PM, Wido den Hollander wrote:

On 25 January 2017 at 20:25, Patrick Donnelly <pdonn...@redhat.com> wrote:

On Wed, Jan 25, 2017 at 2:19 PM, Wido den Hollander <w...@42on.com> wrote:
Hi,
I thought this issue was resolved a while ago, but while testing Kraken with BlueStore I ran into the problem again. My monitors are not being started on boot:

Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-59-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
Last login: Wed Jan 25 15:08:57 2017 from 2001:db8::100

root@bravo:~# systemctl status ceph-mon.target
● ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
   Loaded: loaded (/lib/systemd/system/ceph-mon.target; disabled; vendor preset: enabled)
   Active: inactive (dead)
root@bravo:~#

If I enable ceph-mon.target, my monitors start just fine on boot:

root@bravo:~# systemctl enable ceph-mon.target
Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-mon.target to /lib/systemd/system/ceph-mon.target.
Created symlink from /etc/systemd/system/ceph.target.wants/ceph-mon.target to /lib/systemd/system/ceph-mon.target.
root@bravo:~# ceph -v
ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)
root@bravo:~#

Anybody else seeing this before I start digging into the .deb packaging?

Patrick Donnelly wrote:
Are you wanting ceph-mon.target to automatically be enabled on package install? That doesn't sound good to me, but I'm not familiar with Ubuntu's packaging rules. I would think the sysadmin must enable the services they install themselves.

Wido den Hollander wrote:
Under Ubuntu that usually happens, yes. This system, however, was installed with ceph-deploy (1.5.37). OSDs started on boot, but the MONs didn't. The OSDs were started by udev/ceph-disk, however. I checked my ceph-deploy log and found:

[2017-01-23 18:56:56,370][alpha][INFO ] Running command: systemctl enable ceph.target
[2017-01-23 18:56:56,394][alpha][WARNING] Created symlink from /etc/systemd/system/multi-user.target.wants/ceph.target to /lib/systemd/system/ceph.target.
[2017-01-23 18:56:56,487][alpha][INFO ] Running command: systemctl enable ceph-mon@alpha
[2017-01-23 18:56:56,504][alpha][WARNING] Created symlink from /etc/systemd/system/ceph-mon.target.wants/ceph-mon@alpha.service to /lib/systemd/system/ceph-mon@.service.
[2017-01-23 18:56:56,656][alpha][INFO ] Running command: systemctl start ceph-mon@alpha

It doesn't seem to enable ceph-mon.target, thus not enabling the MON to start on boot. This small cluster runs inside VirtualBox with the machines alpha, bravo and charlie.

Wido
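Putting the two workarounds together: until the packaging/ceph-deploy behaviour is sorted out, enabling the relevant targets and units by hand seems to be the way to get monitors starting on boot. A rough sketch, with the hostname filled in via $(hostname -s):

$ sudo systemctl enable ceph.target
$ sudo systemctl enable ceph-mon.target
$ sudo systemctl enable ceph-mon@$(hostname -s)
$ sudo systemctl start ceph-mon@$(hostname -s)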
Re: [ceph-users] cephfs ata1.00: status: { DRDY }
Looks like disk I/O is too slow. You can try tuning ceph.conf with settings like "osd client op priority":
http://docs.ceph.com/docs/jewel/rados/configuration/osd-config-ref/
(which is not loading for me at the moment...)

On 01/05/2017 04:43 PM, Oliver Dzombic wrote:
Hi,
any idea of the root cause of this? Inside a KVM VM running a qcow2 image on CephFS, dmesg shows:

[846193.473396] ata1.00: status: { DRDY }
[846196.231058] ata1: soft resetting link
[846196.386714] ata1.01: NODEV after polling detection
[846196.391048] ata1.00: configured for MWDMA2
[846196.391053] ata1.00: retrying FLUSH 0xea Emask 0x4
[846196.391671] ata1: EH complete
[1019646.935659] UDP: bad checksum. From 122.224.153.109:46252 to 193.24.210.48:161 ulen 49
[1107679.421951] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[1107679.423407] ata1.00: failed command: FLUSH CACHE EXT
[1107679.424871] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[1107679.427596] ata1.00: status: { DRDY }
[1107684.482035] ata1: link is slow to respond, please be patient (ready=0)
[1107689.480237] ata1: device not ready (errno=-16), forcing hardreset
[1107689.480267] ata1: soft resetting link
[1107689.637701] ata1.00: configured for MWDMA2
[1107689.637707] ata1.00: retrying FLUSH 0xea Emask 0x4
[1107704.638255] ata1.00: qc timeout (cmd 0xea)
[1107704.638282] ata1.00: FLUSH failed Emask 0x4
[1107709.687013] ata1: link is slow to respond, please be patient (ready=0)
[1107710.095069] ata1: soft resetting link
[1107710.246403] ata1.01: NODEV after polling detection
[1107710.247225] ata1.00: configured for MWDMA2
[1107710.247229] ata1.00: retrying FLUSH 0xea Emask 0x4
[1107710.248170] ata1: EH complete
[1199723.323256] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[1199723.324769] ata1.00: failed command: FLUSH CACHE EXT
[1199723.326734] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)

The host machine is running kernel 4.5.4. Host machine dmesg:

[1235641.055673] INFO: task qemu-kvm:18287 blocked for more than 120 seconds.
[1235641.056066] Not tainted 4.5.4ceph-vps-default #1
[1235641.056315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1235641.056583] qemu-kvm D 8812f939bb58 0 18287 1 0x0080
[1235641.056587] 8812f939bb58 881034c02b80 881b7044ab80 8812f939c000
[1235641.056590] 7fff 881c7ffd7b70 818c1d90
[1235641.056592] 8812f939bb70 818c1525 88103fa16d00 8812f939bc18
[1235641.056594] Call Trace:
[1235641.056603] [] ? bit_wait+0x50/0x50
[1235641.056605] [] schedule+0x35/0x80
[1235641.056609] [] schedule_timeout+0x231/0x2d0
[1235641.056613] [] ? ktime_get+0x3c/0xb0
[1235641.056622] [] ? bit_wait+0x50/0x50
[1235641.056624] [] io_schedule_timeout+0xa6/0x110
[1235641.056626] [] bit_wait_io+0x1b/0x60
[1235641.056627] [] __wait_on_bit+0x60/0x90
[1235641.056632] [] wait_on_page_bit+0xcb/0xf0
[1235641.056636] [] ? autoremove_wake_function+0x40/0x40
[1235641.056638] [] __filemap_fdatawait_range+0xff/0x180
[1235641.056641] [] ? __filemap_fdatawrite_range+0xd1/0x100
[1235641.056644] [] filemap_fdatawait_range+0x14/0x30
[1235641.056646] [] filemap_write_and_wait_range+0x3f/0x70
[1235641.056649] [] ceph_fsync+0x69/0x5c0
[1235641.056656] [] ? do_futex+0xfd/0x530
[1235641.056663] [] vfs_fsync_range+0x3d/0xb0
[1235641.056668] [] ? syscall_trace_enter_phase1+0x139/0x150
[1235641.056670] [] do_fsync+0x3d/0x70
[1235641.056673] [] SyS_fdatasync+0x13/0x20
[1235641.056676] [] entry_SYSCALL_64_fastpath+0x12/0x71

This sometimes happens on a healthy cluster running ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374). The OSD servers run kernel 4.5.5. Sometimes the VM refuses I/O and has to be restarted; sometimes it carries on. Any input is appreciated.
Thank you!
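For reference, the kind of ceph.conf tuning being suggested would look roughly like the sketch below. The option names are the jewel-era OSD config options from the referenced page, but the values shown are only illustrative and should be checked against the defaults for your release:

[osd]
# favour client I/O over recovery/backfill traffic
osd client op priority = 63
osd recovery op priority = 1
# throttle recovery so it does not starve guest I/O
osd max backfills = 1
osd recovery max active = 1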
Re: [ceph-users] documentation: osd crash tunables optimal and "some data movement"
I've seen this before and would recommend upgrading from Hammer.

On 12/08/2016 04:26 PM, Peter Gervai wrote:
Hello List,
This could be turned into a bug report if anyone feels like it; I just want to share my harsh experience of today.

We had a few OSDs getting full while others were below 40%. While these were properly weighted (the full 800GB ones at 0.800 and the fairly empty 2.7TB ones at 2.700), it did not seem to work well. So I ran 'ceph osd reweight-by-utilization', which resulted in some stuck unclean PGs (apart from ~10% PG migration). Net wisdom said that the CRUSH map and the probabilities were the cause (in some not really defined way, mentioning probabilities rejecting OSDs, which without context was pretty hard to interpret, but I accepted "crush problem, don't ask more"), and some mentioned that the CRUSH tunables should be set to optimal.

I tried to see what 'optimal' would change, but that's not trivial: there seems to be no documentation to _see_ the current values [update: I figured out that exporting the crushmap and decompiling it with crushtool lists the current tunables at the top] or to know which preset contains which values.

This is ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90), aka hammer. It was installed as hammer as far as I remember. Now, the documentation says that if I set tunables to optimal, quote: "this will result in some data movement (possibly as much as 10%)." (Side note: ceph wasn't complaining about tunables.) So, that's okay:

ceph osd crush tunables optimal

Setting it resulted in the not quite funny amount of 65% misplaced objects, which didn't make me happy, nor the cluster members, due to extreme I/O load. Fortunately, setting it back to "legacy" caused the whole shitstorm to stop. (I will start it again soon, in the late evening, when it won't cause too much harm.)

So, it's not always "some" and "possibly as much as 10%". Reading the various tunable profiles, it seems there are changes with high data migration, so I don't quite see why this "small data movement" is mentioned: it's possible, but by no means guaranteed.

Peter
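Following up on the note about inspecting tunables, the sequence alluded to in the [update] would look something like this (the file names are placeholders):

$ ceph osd getcrushmap -o crushmap.bin
$ crushtool -d crushmap.bin -o crushmap.txt
$ head -n 15 crushmap.txt        # the 'tunable ...' lines appear at the top
$ ceph osd crush show-tunables   # current tunables and profile as JSON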
[ceph-users] ceph-deploy fails to copy keyring
Hi,
I have problems with the ceph-deploy command failing for reasons which don't seem obvious. For instance, I was trying to add a monitor:

$ ceph-deploy mon add newmonhost
[Skipping some output]
[newmonhost][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-newmonhost/done
[newmonhost][DEBUG ] create a done file to avoid re-doing the mon deployment
[newmonhost][DEBUG ] create the init path if it does not exist
[newmonhost][INFO ] Running command: sudo initctl emit ceph-mon cluster=ceph id=newmonhost
[newmonhost][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.newmonhost.asok mon_status
[newmonhost][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[newmonhost][WARNIN] monitor: mon.newmonhost, might not be running yet
[newmonhost][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.newmonhost.asok mon_status
[newmonhost][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[newmonhost][WARNIN] monitor newmonhost does not exist in monmap

I have the keyring path specified in ceph.conf (pasted below). What is odd is that it works fine if I first copy the keyring file to /var/lib/ceph/mon/ceph-newmonhost/keyring. Can anyone help me understand why this step is not automated? I have had similar issues adding OSDs before.

Some info (on the admin node):

$ ceph-deploy --version
1.5.35

$ uname -a; lsb_release -a
Linux cephadmin 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty

$ ceph --version
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

$ cat ceph.conf
mon_initial_members = monhost1, monhost2, monhost3, newmonhost
mon_host = 172.16.xx.xx,172.16.xx.xxx,172.16.xx.xxx,172.16.xx.x
mon_pg_warn_max_per_osd = 0
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
public_network = 172.16.0.0/16
rbd_default_format = 2

[osd]
#osd_mkfs_type = btrfs
#osd_mkfs_options_btrfs = -f
#osd_mount_options_btrfs = rw,noatime

[mon]
keyring = /etc/ceph/ceph.mon.keyring

Thanks for any help!
Dave

--
David Welch
DevOps, ARS
http://thinkars.com
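For what it's worth, the manual workaround described above amounts to roughly the following sketch, run on the new monitor host before retrying ceph-deploy, and assuming the mon keyring has already been pushed to /etc/ceph/ceph.mon.keyring on that host as per the [mon] section above:

# on newmonhost: pre-place the mon keyring where the deployment expects it
$ sudo mkdir -p /var/lib/ceph/mon/ceph-newmonhost
$ sudo cp /etc/ceph/ceph.mon.keyring /var/lib/ceph/mon/ceph-newmonhost/keyring

# then, from the admin node, retry
$ ceph-deploy mon add newmonhost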