Re: [ceph-users] best way to resolve 'stale+active+clean' after disk failure

2017-04-07 Thread David Welch
Thanks for the suggestions. It turned out an old testing pool with a
replication size of 1 was causing the problem; removing that pool resolved
the issue.
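(For anyone hitting the same thing, a rough sketch of how to spot and remove
such a pool -- the pool name here is illustrative, and newer releases may
also require mon_allow_pool_delete to be set:)

$ ceph osd dump | grep "^pool"        # shows size/min_size for every pool
$ ceph osd pool get testpool size     # or query a single pool
$ ceph osd pool delete testpool testpool --yes-i-really-really-mean-it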



On 04/06/2017 07:34 PM, Brad Hubbard wrote:

What are size and min_size for pool '7'... and why?

On Fri, Apr 7, 2017 at 4:20 AM, David Welch <dwe...@thinkars.com> wrote:

Hi,
We had a disk in the cluster that was not responding properly and was causing
'slow requests'. The OSD on that disk was stopped, then marked down and out.
Rebalancing succeeded, but (some?) PGs from that OSD are now stuck in the
stale+active+clean state, which is not resolving on its own (see below for
query results).

My question: is it better to mark this OSD as "lost" (i.e. 'ceph osd lost
14') or to remove the OSD as detailed here:
https://www.sebastien-han.fr/blog/2015/12/11/ceph-properly-remove-an-osd/

Thanks,
David


$ ceph health detail
HEALTH_ERR 17 pgs are stuck inactive for more than 300 seconds; 17 pgs
stale; 17 pgs stuck stale
pg 7.f3 is stuck stale for 6138.330316, current state stale+active+clean,
last acting [14]
pg 7.bd is stuck stale for 6138.330365, current state stale+active+clean,
last acting [14]
pg 7.b6 is stuck stale for 6138.330374, current state stale+active+clean,
last acting [14]
pg 7.c5 is stuck stale for 6138.330363, current state stale+active+clean,
last acting [14]
pg 7.ac is stuck stale for 6138.330385, current state stale+active+clean,
last acting [14]
pg 7.5b is stuck stale for 6138.330678, current state stale+active+clean,
last acting [14]
pg 7.1b4 is stuck stale for 6138.330409, current state stale+active+clean,
last acting [14]
pg 7.182 is stuck stale for 6138.330445, current state stale+active+clean,
last acting [14]
pg 7.1f8 is stuck stale for 6138.330720, current state stale+active+clean,
last acting [14]
pg 7.53 is stuck stale for 6138.330697, current state stale+active+clean,
last acting [14]
pg 7.1d2 is stuck stale for 6138.330663, current state stale+active+clean,
last acting [14]
pg 7.70 is stuck stale for 6138.330742, current state stale+active+clean,
last acting [14]
pg 7.14f is stuck stale for 6138.330585, current state stale+active+clean,
last acting [14]
pg 7.23 is stuck stale for 6138.330610, current state stale+active+clean,
last acting [14]
pg 7.153 is stuck stale for 6138.330600, current state stale+active+clean,
last acting [14]
pg 7.cc is stuck stale for 6138.330409, current state stale+active+clean,
last acting [14]
pg 7.16b is stuck stale for 6138.330509, current state stale+active+clean,
last acting [14]
$ ceph pg dump_stuck stale
ok
pg_stat   state                 up     up_primary   acting   acting_primary
7.f3      stale+active+clean    [14]   14           [14]     14
7.bd      stale+active+clean    [14]   14           [14]     14
7.b6      stale+active+clean    [14]   14           [14]     14
7.c5      stale+active+clean    [14]   14           [14]     14
7.ac      stale+active+clean    [14]   14           [14]     14
7.5b      stale+active+clean    [14]   14           [14]     14
7.1b4     stale+active+clean    [14]   14           [14]     14
7.182     stale+active+clean    [14]   14           [14]     14
7.1f8     stale+active+clean    [14]   14           [14]     14
7.53      stale+active+clean    [14]   14           [14]     14
7.1d2     stale+active+clean    [14]   14           [14]     14
7.70      stale+active+clean    [14]   14           [14]     14
7.14f     stale+active+clean    [14]   14           [14]     14
7.23      stale+active+clean    [14]   14           [14]     14
7.153     stale+active+clean    [14]   14           [14]     14
7.cc      stale+active+clean    [14]   14           [14]     14
7.16b     stale+active+clean    [14]   14           [14]     14









--
~~
David Welch
DevOps
ARS
http://thinkars.com



[ceph-users] best way to resolve 'stale+active+clean' after disk failure

2017-04-06 Thread David Welch

Hi,
We had a disk in the cluster that was not responding properly and was
causing 'slow requests'. The OSD on that disk was stopped, then marked down
and out. Rebalancing succeeded, but (some?) PGs from that OSD are now stuck
in the stale+active+clean state, which is not resolving on its own (see
below for query results).


My question: is it better to mark this OSD as "lost" (i.e. 'ceph osd
lost 14') or to remove the OSD as detailed here:

https://www.sebastien-han.fr/blog/2015/12/11/ceph-properly-remove-an-osd/
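(For reference, a rough sketch of the two options -- both assume the data on
osd.14 is genuinely unrecoverable:)

# Option 1: declare the down OSD's data permanently lost
$ ceph osd lost 14 --yes-i-really-mean-it

# Option 2: remove the OSD from the cluster entirely
$ ceph osd crush remove osd.14
$ ceph auth del osd.14
$ ceph osd rm 14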

Thanks,
David

$ ceph health detail
HEALTH_ERR 17 pgs are stuck inactive for more than 300 seconds; 17 pgs 
stale; 17 pgs stuck stale
pg 7.f3 is stuck stale for 6138.330316, current state 
stale+active+clean, last acting [14]
pg 7.bd is stuck stale for 6138.330365, current state 
stale+active+clean, last acting [14]
pg 7.b6 is stuck stale for 6138.330374, current state 
stale+active+clean, last acting [14]
pg 7.c5 is stuck stale for 6138.330363, current state 
stale+active+clean, last acting [14]
pg 7.ac is stuck stale for 6138.330385, current state 
stale+active+clean, last acting [14]
pg 7.5b is stuck stale for 6138.330678, current state 
stale+active+clean, last acting [14]
pg 7.1b4 is stuck stale for 6138.330409, current state 
stale+active+clean, last acting [14]
pg 7.182 is stuck stale for 6138.330445, current state 
stale+active+clean, last acting [14]
pg 7.1f8 is stuck stale for 6138.330720, current state 
stale+active+clean, last acting [14]
pg 7.53 is stuck stale for 6138.330697, current state 
stale+active+clean, last acting [14]
pg 7.1d2 is stuck stale for 6138.330663, current state 
stale+active+clean, last acting [14]
pg 7.70 is stuck stale for 6138.330742, current state 
stale+active+clean, last acting [14]
pg 7.14f is stuck stale for 6138.330585, current state 
stale+active+clean, last acting [14]
pg 7.23 is stuck stale for 6138.330610, current state 
stale+active+clean, last acting [14]
pg 7.153 is stuck stale for 6138.330600, current state 
stale+active+clean, last acting [14]
pg 7.cc is stuck stale for 6138.330409, current state 
stale+active+clean, last acting [14]
pg 7.16b is stuck stale for 6138.330509, current state 
stale+active+clean, last acting [14]

$ ceph pg dump_stuck stale
ok
pg_stat   state                 up     up_primary   acting   acting_primary
7.f3      stale+active+clean    [14]   14           [14]     14
7.bd      stale+active+clean    [14]   14           [14]     14
7.b6      stale+active+clean    [14]   14           [14]     14
7.c5      stale+active+clean    [14]   14           [14]     14
7.ac      stale+active+clean    [14]   14           [14]     14
7.5b      stale+active+clean    [14]   14           [14]     14
7.1b4     stale+active+clean    [14]   14           [14]     14
7.182     stale+active+clean    [14]   14           [14]     14
7.1f8     stale+active+clean    [14]   14           [14]     14
7.53      stale+active+clean    [14]   14           [14]     14
7.1d2     stale+active+clean    [14]   14           [14]     14
7.70      stale+active+clean    [14]   14           [14]     14
7.14f     stale+active+clean    [14]   14           [14]     14
7.23      stale+active+clean    [14]   14           [14]     14
7.153     stale+active+clean    [14]   14           [14]     14
7.cc      stale+active+clean    [14]   14           [14]     14
7.16b     stale+active+clean    [14]   14           [14]     14




Re: [ceph-users] systemd and ceph-mon autostart on Ubuntu 16.04

2017-03-28 Thread David Welch
We also ran into this problem on upgrading Ubuntu from 14.04 to 16.04. 
The service file is not being automatically created. The issue was 
resolved with the following steps:


$ sudo systemctl enable ceph-mon@your-hostname
Created symlink from
/etc/systemd/system/ceph-mon.target.wants/ceph-mon@your-hostname.service
to /lib/systemd/system/ceph-mon@.service.



$ sudo systemctl start ceph-mon@your-hostname

Now it should start and join the cluster.
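(To double-check, something along these lines should confirm the daemon is
running and back in quorum:)

$ sudo systemctl status ceph-mon@your-hostname
$ sudo systemctl is-enabled ceph-mon.target   # see the thread below if this says "disabled"
$ ceph -s                                     # the mon should be listed in the quorum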

-David



On 01/25/2017 02:35 PM, Wido den Hollander wrote:

On 25 January 2017 at 20:25, Patrick Donnelly <pdonn...@redhat.com> wrote:


On Wed, Jan 25, 2017 at 2:19 PM, Wido den Hollander <w...@42on.com> wrote:

Hi,

I thought this issue was resolved a while ago, but while testing Kraken with 
BlueStore I ran into the problem again.

My monitors are not being started on boot:

Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-59-generic x86_64)

  * Documentation:  https://help.ubuntu.com
  * Management: https://landscape.canonical.com
  * Support:https://ubuntu.com/advantage
Last login: Wed Jan 25 15:08:57 2017 from 2001:db8::100
root@bravo:~# systemctl status ceph-mon.target
● ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service 
instances at once
Loaded: loaded (/lib/systemd/system/ceph-mon.target; disabled; vendor 
preset: enabled)
Active: inactive (dead)
root@bravo:~#

If I enable ceph-mon.target my Monitors start just fine on boot:

root@bravo:~# systemctl enable ceph-mon.target
Created symlink from 
/etc/systemd/system/multi-user.target.wants/ceph-mon.target to 
/lib/systemd/system/ceph-mon.target.
Created symlink from /etc/systemd/system/ceph.target.wants/ceph-mon.target to 
/lib/systemd/system/ceph-mon.target.
root@bravo:~# ceph -v
ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)
root@bravo:~#

Anybody else seeing this before I start digging into the .deb packaging?

Are you wanting ceph-mon.target to automatically be enabled on package
install? That doesn't sound good to me but I'm not familiar with
Ubuntu's packaging rules. I would think the sysadmin must enable the
services they install themselves.


Under Ubuntu that usually happens, yes. This system, however, was installed
with ceph-deploy (1.5.37). The OSDs started on boot, but the MONs didn't.

The OSDs were started by udev/ceph-disk however.

I checked my ceph-deploy log and I found:

[2017-01-23 18:56:56,370][alpha][INFO  ] Running command: systemctl enable 
ceph.target
[2017-01-23 18:56:56,394][alpha][WARNING] Created symlink from 
/etc/systemd/system/multi-user.target.wants/ceph.target to 
/lib/systemd/system/ceph.target.
[2017-01-23 18:56:56,487][alpha][INFO  ] Running command: systemctl enable 
ceph-mon@alpha
[2017-01-23 18:56:56,504][alpha][WARNING] Created symlink from 
/etc/systemd/system/ceph-mon.target.wants/ceph-mon@alpha.service to 
/lib/systemd/system/ceph-mon@.service.
[2017-01-23 18:56:56,656][alpha][INFO  ] Running command: systemctl start 
ceph-mon@alpha

It doesn't seem to enable ceph-mon.target, so the MON is not enabled to start
on boot.

This small cluster runs inside VirtualBox with the machines alpha, bravo and 
charlie.

Wido


--
Patrick Donnelly



--
~~
David Welch
DevOps
ARS
http://thinkars.com



Re: [ceph-users] cephfs ata1.00: status: { DRDY }

2017-01-05 Thread David Welch
Looks like disk I/O is too slow. You can try tuning ceph.conf with settings
like "osd client op priority":


http://docs.ceph.com/docs/jewel/rados/configuration/osd-config-ref/
(which is not loading for me at the moment...)
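(For example, a minimal [osd] sketch along those lines -- the values below
are illustrative starting points, not tested recommendations:)

[osd]
# keep client I/O ahead of recovery/backfill traffic
osd client op priority = 63
osd recovery op priority = 1
# limit how much recovery/backfill work runs in parallel per OSD
osd max backfills = 1
osd recovery max active = 1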

On 01/05/2017 04:43 PM, Oliver Dzombic wrote:

Hi,

Any idea of the root cause of this? Inside a KVM VM running qcow2 on
cephfs, dmesg shows:

[846193.473396] ata1.00: status: { DRDY }
[846196.231058] ata1: soft resetting link
[846196.386714] ata1.01: NODEV after polling detection
[846196.391048] ata1.00: configured for MWDMA2
[846196.391053] ata1.00: retrying FLUSH 0xea Emask 0x4
[846196.391671] ata1: EH complete
[1019646.935659] UDP: bad checksum. From 122.224.153.109:46252 to
193.24.210.48:161 ulen 49
[1107679.421951] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action
0x6 frozen
[1107679.423407] ata1.00: failed command: FLUSH CACHE EXT
[1107679.424871] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
  res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[1107679.427596] ata1.00: status: { DRDY }
[1107684.482035] ata1: link is slow to respond, please be patient (ready=0)
[1107689.480237] ata1: device not ready (errno=-16), forcing hardreset
[1107689.480267] ata1: soft resetting link
[1107689.637701] ata1.00: configured for MWDMA2
[1107689.637707] ata1.00: retrying FLUSH 0xea Emask 0x4
[1107704.638255] ata1.00: qc timeout (cmd 0xea)
[1107704.638282] ata1.00: FLUSH failed Emask 0x4
[1107709.687013] ata1: link is slow to respond, please be patient (ready=0)
[1107710.095069] ata1: soft resetting link
[1107710.246403] ata1.01: NODEV after polling detection
[1107710.247225] ata1.00: configured for MWDMA2
[1107710.247229] ata1.00: retrying FLUSH 0xea Emask 0x4
[1107710.248170] ata1: EH complete
[1199723.323256] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action
0x6 frozen
[1199723.324769] ata1.00: failed command: FLUSH CACHE EXT
[1199723.326734] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
  res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)


Hostmachine is running Kernel 4.5.4


Hostmachine dmesg:


[1235641.055673] INFO: task qemu-kvm:18287 blocked for more than 120
seconds.
[1235641.056066]   Not tainted 4.5.4ceph-vps-default #1
[1235641.056315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1235641.056583] qemu-kvmD 8812f939bb58 0 18287  1
0x0080
[1235641.056587]  8812f939bb58 881034c02b80 881b7044ab80
8812f939c000
[1235641.056590]   7fff 881c7ffd7b70
818c1d90
[1235641.056592]  8812f939bb70 818c1525 88103fa16d00
8812f939bc18
[1235641.056594] Call Trace:
[1235641.056603]  [] ? bit_wait+0x50/0x50
[1235641.056605]  [] schedule+0x35/0x80
[1235641.056609]  [] schedule_timeout+0x231/0x2d0
[1235641.056613]  [] ? ktime_get+0x3c/0xb0
[1235641.056622]  [] ? bit_wait+0x50/0x50
[1235641.056624]  [] io_schedule_timeout+0xa6/0x110
[1235641.056626]  [] bit_wait_io+0x1b/0x60
[1235641.056627]  [] __wait_on_bit+0x60/0x90
[1235641.056632]  [] wait_on_page_bit+0xcb/0xf0
[1235641.056636]  [] ? autoremove_wake_function+0x40/0x40
[1235641.056638]  [] __filemap_fdatawait_range+0xff/0x180
[1235641.056641]  [] ?
__filemap_fdatawrite_range+0xd1/0x100
[1235641.056644]  [] filemap_fdatawait_range+0x14/0x30
[1235641.056646]  []
filemap_write_and_wait_range+0x3f/0x70
[1235641.056649]  [] ceph_fsync+0x69/0x5c0
[1235641.056656]  [] ? do_futex+0xfd/0x530
[1235641.056663]  [] vfs_fsync_range+0x3d/0xb0
[1235641.056668]  [] ?
syscall_trace_enter_phase1+0x139/0x150
[1235641.056670]  [] do_fsync+0x3d/0x70
[1235641.056673]  [] SyS_fdatasync+0x13/0x20
[1235641.056676]  [] entry_SYSCALL_64_fastpath+0x12/0x71


This sometimes happens, on a healthy cluster, running

ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

OSD Servers running Kernel 4.5.5

Sometimes this causes the VM to refuse IO, and it has to be restarted;
sometimes it continues without issue.



Any input is appreciated. Thank you!




--
~~
David Welch
DevOps
ARS
http://thinkars.com



Re: [ceph-users] documentation: osd crash tunables optimal and "some data movement"

2016-12-08 Thread David Welch

I've seen this before and would recommend upgrading from Hammer.


On 12/08/2016 04:26 PM, Peter Gervai wrote:

Hello List,

This could be turned into a bug report if anyone feels like it; I'm just
sharing my harsh experience from today.

We had a few OSDs getting full while others were below 40%; even though they
were properly weighted (the full 800 GB ones at 0.800 and the fairly empty
2.7 TB ones at 2.700), it did not seem to work well.

So I did a
   ceph osd reweight-by-utilization
which resulted in some stuck unclean PGs (apart from ~10% of PGs migrating).

Net wisdom said that the CRUSH map and its probabilities were the cause (in
some not really defined way, mentioning probabilities rejecting OSDs, which
without context was pretty hard to interpret, but I accepted "CRUSH problem,
don't ask more"), and some mentioned that the CRUSH tunables should be set
to optimal. I tried to see what 'optimal' would change, but that's not
trivial: there seems to be no documentation on how to _see_ the current
values [update: I figured out that exporting the crushmap and decompiling it
with crushtool lists the current tunables at the top] or on which preset
contains which values.
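(For reference, the crushtool route mentioned above is roughly:)

$ ceph osd getcrushmap -o crushmap.bin
$ crushtool -d crushmap.bin -o crushmap.txt
$ head crushmap.txt    # the "tunable ..." lines at the top show the current values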

This is ceph version 0.94.9
(fe6d859066244b97b24f09d46552afc2071e6f90), aka hammer. It was
installed as hammer as far as I remember.

Now, the documentation says that if I set tunables to optimal, quote:
"this will result in some data movement (possibly as much as 10%)."
(Sidenote: ceph wasn't complaining about tunables.)

So, that's okay. "ceph osd crush tunables optimal"

Setting it resulted in the not-quite-funny amount of 65% misplaced objects,
which made neither me nor the cluster members happy, given the extreme IO
load. Fortunately, setting it back to "legacy" stopped the whole shitstorm.
(I will start it again soon, in the late evening, when it won't cause too
much harm.)

So, it's not always "some" and "possibly as much as 10%". Reading about the
various tunable profiles, it seems some of the changes cause heavy data
migration, so I don't quite see why this "small data movement" wording is
used: it's possible, but not compulsory.

Peter


--
~~
David Welch
DevOps
ARS
http://thinkars.com



[ceph-users] ceph-deploy fails to copy keyring

2016-09-23 Thread David Welch

Hi,

I have problems with the ceph-deploy command failing for reasons which 
don't seem obvious. For instance, I was trying to add a monitor:


$ ceph-deploy mon add newmonhost
[Skipping some output]
[newmonhost][DEBUG ] checking for done path: 
/var/lib/ceph/mon/ceph-newmonhost/done

[newmonhost][DEBUG ] create a done file to avoid re-doing the mon deployment
[newmonhost][DEBUG ] create the init path if it does not exist
[newmonhost][INFO  ] Running command: sudo initctl emit ceph-mon 
cluster=ceph id=newmonhost
[newmonhost][INFO  ] Running command: sudo ceph --cluster=ceph 
--admin-daemon /var/run/ceph/ceph-mon.newmonhost.asok mon_status
[newmonhost][ERROR ] admin_socket: exception getting command 
descriptions: [Errno 2] No such file or directory

[newmonhost][WARNIN] monitor: mon.newmonhost, might not be running yet
[newmonhost][INFO  ] Running command: sudo ceph --cluster=ceph 
--admin-daemon /var/run/ceph/ceph-mon.newmonhost.asok mon_status
[newmonhost][ERROR ] admin_socket: exception getting command 
descriptions: [Errno 2] No such file or directory

[newmonhost][WARNIN] monitor newmonhost does not exist in monmap

I have the keyring path specified in ceph.conf (pasted below).

What is weird is that it works fine if I first copy the keyring file to
/var/lib/ceph/mon/ceph-newmonhost/keyring. Can anyone help me understand
why this step is not automated? I have had similar issues adding OSDs
before.
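(Concretely, the workaround amounts to something like the following on the
new mon host before re-running ceph-deploy -- the mkdir/chown steps are my
assumptions for a fresh host, and the source path follows the [mon] keyring
setting in the ceph.conf below:)

$ sudo mkdir -p /var/lib/ceph/mon/ceph-newmonhost
$ sudo cp /etc/ceph/ceph.mon.keyring /var/lib/ceph/mon/ceph-newmonhost/keyring
$ sudo chown -R ceph:ceph /var/lib/ceph/mon/ceph-newmonhost  # Jewel runs the mon as the "ceph" user
$ ceph-deploy mon add newmonhost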


Some info (on the admin node):

$ ceph-deploy --version
1.5.35
$ uname -a; lsb_release -a
Linux cephadmin 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 
UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:        14.04
Codename:       trusty
$ ceph --version
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

$ cat ceph.conf:
mon_initial_members = monhost1, monhost2, monhost3, newmonhost
mon_host = 172.16.xx.xx,172.16.xx.xxx,172.16.xx.xxx,172.16.xx.x
mon_pg_warn_max_per_osd = 0
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
public_network = 172.16.0.0/16
rbd_default_format = 2

[osd]
#osd_mkfs_type = btrfs
#osd_mkfs_options_btrfs = -f
#osd_mount_options_btrfs = rw,noatime

[mon]
keyring = /etc/ceph/ceph.mon.keyring

Thanks for any help!
Dave

--
~~
David Welch
DevOps
ARS
http://thinkars.com
