Re: [ceph-users] How to speed up backfill

2018-01-10 Thread Irek Fasikhov
ceph tell osd.* injectargs '--osd_recovery_delay_start 30'
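
For context, osd_recovery_delay_start is a delay (in seconds) before recovery kicks in
after peering settles. A quick way to confirm the injected value took effect, using
osd.0 purely as an example:

  ceph daemon osd.0 config get osd_recovery_delay_start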

2018-01-11 10:31 GMT+03:00 shadow_lin :

> Hi ,
>  Mine is purely backfilling(remove a osd from the cluster) and it
> started at 600Mb/s and ended at about 3MB/s.
> How is your recovery made up?Is it backfill or log replay pg recovery
> or both?
>
> 2018-01-11
> --
> shadow_lin
> --
>
> *From:* Josef Zelenka 
> *Sent:* 2018-01-11 15:26
> *Subject:* Re: [ceph-users] How to speed up backfill
> *To:* "shadow_lin"
> *Cc:* "ceph-users"
>
>
> Hi, our recovery slowed down significantly towards the end; however, it was
> still about five times faster than the original speed. We suspected that
> this is somehow caused by threading (more objects transferred - more
> threads used), but this is only an assumption.
>
> On 11/01/18 05:02, shadow_lin wrote:
>
> Hi,
> I had tried these two methods, and for backfilling it seems only
> osd-max-backfills works.
> How was your recovery speed when it came to the last few PGs or objects?
>
> 2018-01-11
> --
> shadow_lin
> --
>
> *From:* Josef Zelenka 
> 
> *Sent:* 2018-01-11 04:53
> *Subject:* Re: [ceph-users] How to speed up backfill
> *To:* "shadow_lin" 
> *Cc:*
>
>
> Hi, I had the same issue a few days back. I tried playing around with
> these two:
>
> ceph tell 'osd.*' injectargs '--osd-max-backfills '
> ceph tell 'osd.*' injectargs '--osd-recovery-max-active '
> and it helped greatly (increased our recovery speed 20x), but be careful to
> not overload your systems.
>
>
> On 10/01/18 17:50, shadow_lin wrote:
>
> Hi all,
> I am playing with settings for backfill to try to find out how to control the
> speed of backfill.
>
> So far I have only found that "osd max backfills" can affect the backfill speed.
> But after all PGs that need to be backfilled have begun backfilling, I can't find any
> way to speed up backfills.
>
> Especially when it comes to the last PG to recover, the speed is only a
> few MB/s (when multiple PGs are being backfilled, the speed could be more
> than 600MB/s in my test).
>
> I am a little confused about the settings for backfill and recovery. Though
> backfilling is a kind of recovery, it seems the recovery settings only
> apply to replaying PG logs to recover PGs.
>
> Would changing "osd recovery max active" or other recovery settings have any
> effect on backfilling?
>
> I did try "osd recovery op priority" and "osd recovery max active", with
> no luck.
>
> Any advice would be greatly appreciated. Thanks
>
> 2018-01-11
> --
> lin.yunfan
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to speed up backfill

2018-01-10 Thread shadow_lin
Hi,
 Mine is purely backfilling (removing an OSD from the cluster), and it started
at 600MB/s and ended at about 3MB/s.
How is your recovery made up? Is it backfill, log-replay PG recovery, or
both?

2018-01-11 

shadow_lin 



From: Josef Zelenka 
Sent: 2018-01-11 15:26
Subject: Re: [ceph-users] How to speed up backfill
To: "shadow_lin"
Cc: "ceph-users"

Hi, our recovery slowed down significantly towards the end; however, it was
still about five times faster than the original speed. We suspected that this is
somehow caused by threading (more objects transferred - more threads used), but
this is only an assumption.



On 11/01/18 05:02, shadow_lin wrote:

Hi,
I had tried these two methods, and for backfilling it seems only
osd-max-backfills works.
How was your recovery speed when it came to the last few PGs or objects?

2018-01-11 

shadow_lin 



From: Josef Zelenka 
Sent: 2018-01-11 04:53
Subject: Re: [ceph-users] How to speed up backfill
To: "shadow_lin"
Cc:

Hi, I had the same issue a few days back. I tried playing around with these two:
ceph tell 'osd.*' injectargs '--osd-max-backfills '
ceph tell 'osd.*' injectargs '--osd-recovery-max-active '
and it helped greatly (increased our recovery speed 20x), but be careful to not
overload your systems.



On 10/01/18 17:50, shadow_lin wrote:

Hi all,
I am playing with settings for backfill to try to find out how to control the speed
of backfill.

So far I have only found that "osd max backfills" can affect the backfill speed. But
after all PGs that need to be backfilled have begun backfilling, I can't find any way to
speed up backfills.

Especially when it comes to the last PG to recover, the speed is only a few
MB/s (when multiple PGs are being backfilled, the speed could be more than
600MB/s in my test).

I am a little confused about the settings for backfill and recovery. Though
backfilling is a kind of recovery, it seems the recovery settings only apply
to replaying PG logs to recover PGs.

Would changing "osd recovery max active" or other recovery settings have any
effect on backfilling?

I did try "osd recovery op priority" and "osd recovery max active", with no
luck.

Any advice would be greatly appreciated. Thanks

2018-01-11



lin.yunfan

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to speed up backfill

2018-01-10 Thread Josef Zelenka
Hi, our recovery slowed down significantly towards the end; however, it
was still about five times faster than the original speed. We suspected
that this is somehow caused by threading (more objects transferred -
more threads used), but this is only an assumption.



On 11/01/18 05:02, shadow_lin wrote:

Hi,
I had tried these two methods, and for backfilling it seems only
osd-max-backfills works.

How was your recovery speed when it came to the last few PGs or objects?
2018-01-11

shadow_lin


*From:* Josef Zelenka 
*Sent:* 2018-01-11 04:53
*Subject:* Re: [ceph-users] How to speed up backfill
*To:* "shadow_lin"
*Cc:*

Hi, I had the same issue a few days back. I tried playing around
with these two:

ceph tell 'osd.*' injectargs '--osd-max-backfills '
ceph tell 'osd.*' injectargs '--osd-recovery-max-active '
and it helped greatly (increased our recovery speed 20x), but be careful
to not overload your systems.


On 10/01/18 17:50, shadow_lin wrote:

Hi all,
I am playing with settings for backfill to try to find out how to
control the speed of backfill.
So far I have only found that "osd max backfills" can affect the backfill
speed. But after all PGs that need to be backfilled have begun backfilling, I
can't find any way to speed up backfills.
Especially when it comes to the last PG to recover, the speed is
only a few MB/s (when multiple PGs are being backfilled, the speed
could be more than 600MB/s in my test).
I am a little confused about the settings for backfill and
recovery. Though backfilling is a kind of recovery, it seems the
recovery settings only apply to replaying PG logs to recover PGs.
Would changing "osd recovery max active" or other recovery settings
have any effect on backfilling?
I did try "osd recovery op priority" and "osd recovery max
active", with no luck.
Any advice would be greatly appreciated. Thanks
2018-01-11

lin.yunfan


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph MGR Influx plugin 12.2.2

2018-01-10 Thread Reed Dier
Hi all,

Does anyone have any idea if the influx plugin for ceph-mgr is stable in 12.2.2?

Would love to ditch collectd and report directly from ceph if that is the case.

The documentation says it is added in Mimic/13.x; however, from an earlier ML post
it looks like it would be coming to Luminous.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021302.html 


I also see it as a disabled module currently:
> $ ceph mgr module ls
> {
> "enabled_modules": [
> "dashboard",
> "restful",
> "status"
> ],
> "disabled_modules": [
> "balancer",
> "influx",
> "localpool",
> "prometheus",
> "selftest",
> "zabbix"
> ]
> }


Curious if anyone has been using it in place of CollectD/Telegraf for feeding 
InfluxDB with statistics.
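
For anyone experimenting, a rough sketch of enabling the module and pointing it at an
InfluxDB instance might look like the lines below. The config-key names and values are
assumptions based on the Luminous-era module, not verified documentation, so check
"ceph mgr module ls" and the module's documentation before relying on them:

  ceph mgr module enable influx
  ceph config-key set mgr/influx/hostname influxdb.example.com
  ceph config-key set mgr/influx/username ceph
  ceph config-key set mgr/influx/password secret
  ceph config-key set mgr/influx/database ceph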

Thanks,

Reed
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph 10.2.10 - SegFault in ms_pipe_read

2018-01-10 Thread Dyweni - Ceph-Users
I moved the drive from the crashing 10.2.10 OSD node into a different
10.2.10 OSD node and everything is working fine now.




On 2018-01-10 20:42, Dyweni - Ceph-Users wrote:

Hi,

My cluster has 12.2.2 Mons and Mgrs, and 10.2.10 OSDs.

I tried adding a new 12.2.2 OSD into the mix and it crashed (expected).

However, now one of my existing 10.2.10 OSDs is crashing.  I've not
had any issues with the 10.2.10 OSDs to date.

What is strange is that both the 10.2.10 and 12.2.2 OSD crashes occur
in the ms_pipe_read thread.

Also strange is that this crash appears to occur during recovery...
I have two other OSDs which, if on, cause this OSD to crash.  If those
OSDs are off, this OSD does not crash.  Other than for recovery, my
cluster is completely idle.

Any ideas for troubleshooting / resolving?


Thread 73 "ms_pipe_read" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x6fefea20 (LWP 3913)]
0xb6b14f08 in std::_Rb_tree_increment(std::_Rb_tree_node_base const*)
() from
/usr/lib/gcc/armv7a-hardfloat-linux-gnueabi/5.4.0/libstdc++.so.6
(gdb) bt
#0  0xb6b14f08 in std::_Rb_tree_increment(std::_Rb_tree_node_base
const*) () from
/usr/lib/gcc/armv7a-hardfloat-linux-gnueabi/5.4.0/libstdc++.so.6
#1  0x0082d5d8 in std::_Rb_tree<...>::_M_get_insert_hint_unique_pos(std::_Rb_tree_const_iterator<...>, unsigned long long const&) ()
#2  0x008e32fc in std::_Rb_tree_iterator<...> std::_Rb_tree<...>::_M_emplace_hint_unique<...>(std::_Rb_tree_const_iterator<...>, std::piecewise_construct_t const&, std::tuple<...>&&, std::tuple<>&&) ()
#3  0x009db0c0 in PushOp::decode(ceph::buffer::list::iterator&) ()
#4  0x009297a8 in MOSDPGPush::decode_payload() ()
#5  0x00d0d3dc in decode_message(CephContext*, int, ceph_msg_header&,
ceph_msg_footer&, ceph::buffer::list&, ceph::buffer::list&,
ceph::buffer::list&) ()
#6  0x00ea01b8 in Pipe::read_message(Message**, AuthSessionHandler*) ()
#7  0x00eaa44c in Pipe::reader() ()
#8  0x00eb2acc in Pipe::Reader::entry() ()
#9  0xb6e1a890 in start_thread () from /lib/libpthread.so.0
#10 0xb6978408 in ?? () from /lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt 
stack?)

(gdb) frame


Thanks,
Dyweni

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph 10.2.10 - SegFault in ms_pipe_read

2018-01-10 Thread Dyweni - Ceph-Users

Hi,

My cluster has 12.2.2 Mons and Mgrs, and 10.2.10 OSDs.

I tried adding a new 12.2.2 OSD into the mix and it crashed (expected).

However, now one of my existing 10.2.10 OSDs is crashing.  I've not had 
any issues with the 10.2.10 OSDs to date.


What is strange is that both the 10.2.10 and 12.2.2 OSD crashes occur
in the ms_pipe_read thread.


Also strange is that this crash appears to occur during recovery...  I
have two other OSDs which, if on, cause this OSD to crash.  If those
OSDs are off, this OSD does not crash.  Other than for recovery, my
cluster is completely idle.


Any ideas for troubleshooting / resolving?


Thread 73 "ms_pipe_read" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x6fefea20 (LWP 3913)]
0xb6b14f08 in std::_Rb_tree_increment(std::_Rb_tree_node_base const*) () 
from /usr/lib/gcc/armv7a-hardfloat-linux-gnueabi/5.4.0/libstdc++.so.6

(gdb) bt
#0  0xb6b14f08 in std::_Rb_tree_increment(std::_Rb_tree_node_base 
const*) () from 
/usr/lib/gcc/armv7a-hardfloat-linux-gnueabi/5.4.0/libstdc++.so.6
#1  0x0082d5d8 in std::_Rb_tree<...>::_M_get_insert_hint_unique_pos(std::_Rb_tree_const_iterator<...>, unsigned long long const&) ()
#2  0x008e32fc in std::_Rb_tree_iterator<...> std::_Rb_tree<...>::_M_emplace_hint_unique<...>(std::_Rb_tree_const_iterator<...>, std::piecewise_construct_t const&, std::tuple<...>&&, std::tuple<>&&) ()

#3  0x009db0c0 in PushOp::decode(ceph::buffer::list::iterator&) ()
#4  0x009297a8 in MOSDPGPush::decode_payload() ()
#5  0x00d0d3dc in decode_message(CephContext*, int, ceph_msg_header&, 
ceph_msg_footer&, ceph::buffer::list&, ceph::buffer::list&, 
ceph::buffer::list&) ()

#6  0x00ea01b8 in Pipe::read_message(Message**, AuthSessionHandler*) ()
#7  0x00eaa44c in Pipe::reader() ()
#8  0x00eb2acc in Pipe::Reader::entry() ()
#9  0xb6e1a890 in start_thread () from /lib/libpthread.so.0
#10 0xb6978408 in ?? () from /lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt 
stack?)

(gdb) frame
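
In case it helps anyone debugging something similar: a fuller trace usually needs the
distribution's Ceph debug symbols (the package name varies; ceph-osd-dbg on
Debian/Ubuntu, debuginfo packages or unstripped builds elsewhere) and a dump of every
thread:

  gdb -p $(pidof ceph-osd)     # or point gdb at the binary plus a core file
  (gdb) thread apply all bt

Raising the OSD's logging before reproducing the crash can also help, e.g.
ceph tell osd.N injectargs '--debug-ms 5 --debug-osd 20' (N being the crashing OSD).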


Thanks,
Dyweni

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to speed up backfill

2018-01-10 Thread Sergey Malinin
It is also worth looking at the osd_recovery_sleep option.
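
For example, something along these lines (the value is only an illustration; 0 disables
the sleep and larger values slow recovery/backfill down further):

  ceph tell 'osd.*' injectargs '--osd_recovery_sleep 0.1'
  ceph daemon osd.0 config get osd_recovery_sleep    # verify on one OSD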


From: ceph-users  on behalf of Josef Zelenka 

Sent: Thursday, January 11, 2018 12:07:45 AM
To: shadow_lin
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] How to speed up backfill



On 10/01/18 21:53, Josef Zelenka wrote:

Hi, I had the same issue a few days back. I tried playing around with these two:

ceph tell 'osd.*' injectargs '--osd-max-backfills '
ceph tell 'osd.*' injectargs '--osd-recovery-max-active '
and it helped greatly (increased our recovery speed 20x), but be careful to not
overload your systems.


On 10/01/18 17:50, shadow_lin wrote:
Hi all,
I am playing with settings for backfill to try to find out how to control the speed
of backfill.

So far I have only found that "osd max backfills" can affect the backfill speed. But
after all PGs that need to be backfilled have begun backfilling, I can't find any way to
speed up backfills.

Especially when it comes to the last PG to recover, the speed is only a few
MB/s (when multiple PGs are being backfilled, the speed could be more than
600MB/s in my test).

I am a little confused about the settings for backfill and recovery. Though
backfilling is a kind of recovery, it seems the recovery settings only apply
to replaying PG logs to recover PGs.

Would changing "osd recovery max active" or other recovery settings have any
effect on backfilling?

I did try "osd recovery op priority" and "osd recovery max active", with no
luck.

Any advice would be greatly appreciated. Thanks

2018-01-11

lin.yunfan



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-volume does not support upstart

2018-01-10 Thread 赵赵贺东
Hello,
I am sorry for the delay.
Thank you for your suggestion.

It is better to update the system or keep using ceph-disk, in fact.
Thank you, Alfredo Deza & Cary.
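
For anyone else stuck on an upstart-based release, the ceph-disk flow is roughly the
following (device names are placeholders, and --bluestore assumes a Luminous-era
ceph-disk):

  ceph-disk prepare --bluestore /dev/sdb
  ceph-disk activate /dev/sdb1

ceph-disk hooks into udev/upstart itself, so the OSD should come back after a reboot
without the systemd units that ceph-volume expects.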


> On Jan 8, 2018, at 11:41 PM, Alfredo Deza wrote:
> 
> ceph-volume relies on systemd, it will not work with upstart. Going
> the fstab way might work, but most of the lvm implementation will want
> to do systemd-related calls like enabling units and placing files.
> 
> For upstart you might want to keep using ceph-disk, unless upgrading
> to a newer OS is an option in which case ceph-volume would work (as
> long as systemd is available)
> 
> On Sat, Dec 30, 2017 at 9:11 PM, 赵赵贺东  wrote:
>> Hello Cary,
>> 
>> Thank you for your detailed description, it’s really helpful for me!
>> I will have a try when I get back to my office!
>> 
>> Thank you for your attention to this matter.
>> 
>> 
>> On Dec 30, 2017, at 3:51 AM, Cary wrote:
>> 
>> Hello,
>> 
>> I mount my Bluestore OSDs in /etc/fstab:
>> 
>> vi /etc/fstab
>> 
>> tmpfs   /var/lib/ceph/osd/ceph-12  tmpfs   rw,relatime 0 0
>> =
>> Then mount everyting in fstab with:
>> mount -a
>> ==
>> I activate my OSDs this way on startup: You can find the fsid with
>> 
>> cat /var/lib/ceph/osd/ceph-12/fsid
>> 
>> Then add file named ceph.start so ceph-volume will be run at startup.
>> 
>> vi /etc/local.d/ceph.start
>> ceph-volume lvm activate 12 827f4a2c-8c1b-427b-bd6c-66d31a0468ac
>> ==
>> Make it executable:
>> chmod 700 /etc/local.d/ceph.start
>> ==
>> cd /etc/local.d/
>> ./ceph.start
>> ==
>> I am a Gentoo user and use OpenRC, so this may not apply to you.
>> ==
>> cd /etc/init.d/
>> ln -s ceph ceph-osd.12
>> /etc/init.d/ceph-osd.12 start
>> rc-update add ceph-osd.12 default
>> 
>> Cary
>> 
>> On Fri, Dec 29, 2017 at 8:47 AM, 赵赵贺东  wrote:
>> 
>> Hello Cary!
>> It’s a really big surprise for me to receive your reply!
>> Sincere thanks to you!
>> I know it’s a fake executable file, but it works!
>> 
>> >
>> $ cat /usr/sbin/systemctl
>> #!/bin/bash
>> exit 0
>> <
>> 
>> I can start my osd by following command
>> /usr/bin/ceph-osd --cluster=ceph -i 12 -f --setuser ceph --setgroup ceph
>> 
>> But there are still problems.
>> 1. Though ceph-osd can start successfully, the prepare log and activate log look
>> like errors occurred.
>> 
>> Prepare log:
>> ===>
>> # ceph-volume lvm prepare --bluestore --data vggroup/lv
>> Running command: sudo mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-12
>> Running command: chown -R ceph:ceph /dev/dm-0
>> Running command: sudo ln -s /dev/vggroup/lv /var/lib/ceph/osd/ceph-12/block
>> Running command: sudo ceph --cluster ceph --name client.bootstrap-osd
>> --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o
>> /var/lib/ceph/osd/ceph-12/activate.monmap
>> stderr: got monmap epoch 1
>> Running command: ceph-authtool /var/lib/ceph/osd/ceph-12/keyring
>> --create-keyring --name osd.12 --add-key
>> AQAQ+UVa4z2ANRAAmmuAExQauFinuJuL6A56ww==
>> stdout: creating /var/lib/ceph/osd/ceph-12/keyring
>> stdout: added entity osd.12 auth auth(auid = 18446744073709551615
>> key=AQAQ+UVa4z2ANRAAmmuAExQauFinuJuL6A56ww== with 0 caps)
>> Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-12/keyring
>> Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-12/
>> Running command: sudo ceph-osd --cluster ceph --osd-objectstore bluestore
>> --mkfs -i 12 --monmap /var/lib/ceph/osd/ceph-12/activate.monmap --key
>>  --osd-data
>> /var/lib/ceph/osd/ceph-12/ --osd-uuid 827f4a2c-8c1b-427b-bd6c-66d31a0468ac
>> --setuser ceph --setgroup ceph
>> stderr: warning: unable to create /var/run/ceph: (13) Permission denied
>> stderr: 2017-12-29 08:13:08.609127 b66f3000 -1 asok(0x850c62a0)
>> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to
>> bind the UNIX domain socket to '/var/run/ceph/ceph-osd.12.asok': (2) No such
>> file or directory
>> stderr:
>> stderr: 2017-12-29 08:13:08.643410 b66f3000 -1
>> bluestore(/var/lib/ceph/osd/ceph-12//block) _read_bdev_label unable to
>> decode label at offset 66: buffer::malformed_input: void
>> bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past
>> end of struct encoding
>> stderr: 2017-12-29 08:13:08.644055 b66f3000 -1
>> bluestore(/var/lib/ceph/osd/ceph-12//block) _read_bdev_label unable to
>> decode label at offset 66: buffer::malformed_input: void
>> bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past
>> end of struct encoding
>> stderr: 2017-12-29 08:13:08.644722 b66f3000 -1
>> 

Re: [ceph-users] OSDs going down/up at random

2018-01-10 Thread Brad Hubbard
On Wed, Jan 10, 2018 at 8:32 PM, Mike O'Connor  wrote:
> On 10/01/2018 4:48 PM, Mike O'Connor wrote:
>> On 10/01/2018 4:24 PM, Sam Huracan wrote:
>>> Hi Mike,
>>>
>>> Could you show system log at moment osd down and up?
> So now I know it's a crash, what's my next step? As soon as I put the
> system under write load, OSDs start crashing.

Could be this issue (or at least related).

http://tracker.ceph.com/issues/22102

You can start by adding information about your configuration, how/when
you see the crash, and the stack trace to that tracker.

I'd also look for the more detailed log for osd12 which should give
more information.
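
If the default log does not show the assert, one way to capture more detail before the
next crash is to raise the OSD's debug levels (the values below are just common
choices and will make the log very large):

  ceph tell osd.12 injectargs '--debug-osd 20 --debug-ms 1'
  tail -f /var/log/ceph/ceph-osd.12.log

The assert message and backtrace that land in that log are what the tracker ticket
will need.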

>
> Mike
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-01-10 Thread Sean Redmond
Hi David,

Thanks for your email. They are connected inside a Dell R730XD (2.5 inch 24
disk model) in non-RAID mode via a PERC RAID card.

The version of Ceph is Jewel, with kernel 4.13.x and Ubuntu 16.04.

Thanks for your feedback on the HGST disks.

Thanks

On Wed, Jan 10, 2018 at 10:55 PM, David Herselman  wrote:

> Hi Sean,
>
>
>
> No, Intel’s feedback has been… Pathetic… I have yet to receive anything
> more than a request to ‘sign’ a non-disclosure agreement, to obtain beta
> firmware. No official answer as to whether or not one can logically unlock
> the drives, no answer to my question whether or not Intel publish serial
> numbers anywhere pertaining to recalled batches and no information
> pertaining to whether or not firmware updates would address any known
> issues.
>
>
>
> This with us being an accredited Intel Gold partner…
>
>
>
>
>
> We’ve returned the lot and ended up with 9/12 of the drives failing in the
> same manner. The replaced drives, which had different serial number ranges,
> also failed. Very frustrating is that the drives fail in a way that results
> in unbootable servers, unless one adds ‘rootdelay=240’ to the kernel.
>
>
>
>
>
> I would be interested to know what platform your drives were in and
> whether or not they were connected to a RAID module/card.
>
>
>
> PS: After much searching we’ve decided to order the NVMe conversion kit
> and have ordered HGST UltraStar SN200 2.5 inch SFF drives with a 3 DWPD
> rating.
>
>
>
>
>
> Regards
>
> David Herselman
>
>
>
> *From:* Sean Redmond [mailto:sean.redmo...@gmail.com]
> *Sent:* Thursday, 11 January 2018 12:45 AM
> *To:* David Herselman 
> *Cc:* Christian Balzer ; ceph-users@lists.ceph.com
>
> *Subject:* Re: [ceph-users] Many concurrent drive failures - How do I
> activate pgs?
>
>
>
> Hi,
>
>
>
> I have a case where 3 out of 12 of these Intel S4600 2TB models failed
> within a matter of days after being burn-in tested then placed into
> production.
>
>
>
> I am interested to know, did you ever get any further feedback from the
> vendor on your issue?
>
>
>
> Thanks
>
>
>
> On Thu, Dec 21, 2017 at 1:38 PM, David Herselman  wrote:
>
> Hi,
>
> I assume this can only be a physical manufacturing flaw or a firmware bug?
> Do Intel publish advisories on recalled equipment? Should others be
> concerned about using Intel DC S4600 SSD drives? Could this be an
> electrical issue on the Hot Swap Backplane or BMC firmware issue? Either
> way, all pure Intel...
>
> The hole is only 1.3 GB (4 MB x 339 objects) but perfectly striped through
> images, file systems are subsequently severely damaged.
>
> Is it possible to get Ceph to read in partial data shards? It would
> provide between 25-75% more yield...
>
>
> Is there anything wrong with how we've proceeded thus far? Would be nice
> to reference examples of using ceph-objectstore-tool but documentation is
> virtually non-existent.
>
> We used another SSD drive to simulate bringing all the SSDs back online.
> We carved up the drive to provide equal partitions to essentially simulate
> the original SSDs:
>   # Partition a drive to provide 12 x 150GB partitions, eg:
> sdd   8:48   0   1.8T  0 disk
> |-sdd18:49   0   140G  0 part
> |-sdd28:50   0   140G  0 part
> |-sdd38:51   0   140G  0 part
> |-sdd48:52   0   140G  0 part
> |-sdd58:53   0   140G  0 part
> |-sdd68:54   0   140G  0 part
> |-sdd78:55   0   140G  0 part
> |-sdd88:56   0   140G  0 part
> |-sdd98:57   0   140G  0 part
> |-sdd10   8:58   0   140G  0 part
> |-sdd11   8:59   0   140G  0 part
> +-sdd12   8:60   0   140G  0 part
>
>
>   Pre-requisites:
> ceph osd set noout;
> apt-get install uuid-runtime;
>
>
>   for ID in `seq 24 35`; do
> UUID=`uuidgen`;
> OSD_SECRET=`ceph-authtool --gen-print-key`;
> DEVICE='/dev/sdd'$[$ID-23]; # 24-23 = /dev/sdd1, 35-23 = /dev/sdd12
> echo "{\"cephx_secret\": \"$OSD_SECRET\"}" | ceph osd new $UUID $ID -i
> - -n client.bootstrap-osd -k /var/lib/ceph/bootstrap-osd/ceph.keyring;
> mkdir /var/lib/ceph/osd/ceph-$ID;
> mkfs.xfs $DEVICE;
> mount $DEVICE /var/lib/ceph/osd/ceph-$ID;
> ceph-authtool --create-keyring /var/lib/ceph/osd/ceph-$ID/keyring
> --name osd.$ID --add-key $OSD_SECRET;
> ceph-osd -i $ID --mkfs --osd-uuid $UUID;
> chown -R ceph:ceph /var/lib/ceph/osd/ceph-$ID;
> systemctl enable ceph-osd@$ID;
> systemctl start ceph-osd@$ID;
>   done
>
>
> Once up we imported previous exports of empty head files in to 'real' OSDs:
>   kvm5b:
> systemctl stop ceph-osd@8;
> ceph-objectstore-tool --op import --pgid 7.4s0 --data-path
> /var/lib/ceph/osd/ceph-8 --journal-path /var/lib/ceph/osd/ceph-8/journal
> --file /var/lib/vz/template/ssd_recovery/osd8_7.4s0.export;
> chown ceph:ceph -R /var/lib/ceph/osd/ceph-8;
> systemctl start ceph-osd@8;
>   kvm5f:
> systemctl stop ceph-osd@23;
> 

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-01-10 Thread David Herselman
Hi Sean,

No, Intel’s feedback has been… Pathetic… I have yet to receive anything more 
than a request to ‘sign’ a non-disclosure agreement, to obtain beta firmware. 
No official answer as to whether or not one can logically unlock the drives, no 
answer to my question whether or not Intel publish serial numbers anywhere 
pertaining to recalled batches and no information pertaining to whether or not 
firmware updates would address any known issues.

This with us being an accredited Intel Gold partner…


We’ve returned the lot and ended up with 9/12 of the drives failing in the same 
manner. The replaced drives, which had different serial number ranges, also 
failed. Very frustrating is that the drives fail in a way that results in
unbootable servers, unless one adds ‘rootdelay=240’ to the kernel.


I would be interested to know what platform your drives were in and whether or 
not they were connected to a RAID module/card.

PS: After much searching we’ve decided to order the NVMe conversion kit and 
have ordered HGST UltraStar SN200 2.5 inch SFF drives with a 3 DWPD rating.


Regards
David Herselman

From: Sean Redmond [mailto:sean.redmo...@gmail.com]
Sent: Thursday, 11 January 2018 12:45 AM
To: David Herselman 
Cc: Christian Balzer ; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Many concurrent drive failures - How do I activate 
pgs?

Hi,

I have a case where 3 out of 12 of these Intel S4600 2TB models failed within a
matter of days after being burn-in tested then placed into production.

I am interested to know, did you ever get any further feedback from the vendor
on your issue?

Thanks

On Thu, Dec 21, 2017 at 1:38 PM, David Herselman 
> wrote:
Hi,

I assume this can only be a physical manufacturing flaw or a firmware bug? Do 
Intel publish advisories on recalled equipment? Should others be concerned 
about using Intel DC S4600 SSD drives? Could this be an electrical issue on the 
Hot Swap Backplane or BMC firmware issue? Either way, all pure Intel...

The hole is only 1.3 GB (4 MB x 339 objects) but perfectly striped through 
images, file systems are subsequently severely damaged.

Is it possible to get Ceph to read in partial data shards? It would provide 
between 25-75% more yield...


Is there anything wrong with how we've proceeded thus far? Would be nice to 
reference examples of using ceph-objectstore-tool but documentation is 
virtually non-existent.

We used another SSD drive to simulate bringing all the SSDs back online. We 
carved up the drive to provide equal partitions to essentially simulate the 
original SSDs:
  # Partition a drive to provide 12 x 150GB partitions, eg:
sdd   8:48   0   1.8T  0 disk
|-sdd18:49   0   140G  0 part
|-sdd28:50   0   140G  0 part
|-sdd38:51   0   140G  0 part
|-sdd48:52   0   140G  0 part
|-sdd58:53   0   140G  0 part
|-sdd68:54   0   140G  0 part
|-sdd78:55   0   140G  0 part
|-sdd88:56   0   140G  0 part
|-sdd98:57   0   140G  0 part
|-sdd10   8:58   0   140G  0 part
|-sdd11   8:59   0   140G  0 part
+-sdd12   8:60   0   140G  0 part


  Pre-requisites:
ceph osd set noout;
apt-get install uuid-runtime;


  for ID in `seq 24 35`; do
UUID=`uuidgen`;
OSD_SECRET=`ceph-authtool --gen-print-key`;
DEVICE='/dev/sdd'$[$ID-23]; # 24-23 = /dev/sdd1, 35-23 = /dev/sdd12
echo "{\"cephx_secret\": \"$OSD_SECRET\"}" | ceph osd new $UUID $ID -i - -n 
client.bootstrap-osd -k /var/lib/ceph/bootstrap-osd/ceph.keyring;
mkdir /var/lib/ceph/osd/ceph-$ID;
mkfs.xfs $DEVICE;
mount $DEVICE /var/lib/ceph/osd/ceph-$ID;
ceph-authtool --create-keyring /var/lib/ceph/osd/ceph-$ID/keyring --name 
osd.$ID --add-key $OSD_SECRET;
ceph-osd -i $ID --mkfs --osd-uuid $UUID;
chown -R ceph:ceph /var/lib/ceph/osd/ceph-$ID;
systemctl enable ceph-osd@$ID;
systemctl start ceph-osd@$ID;
  done


Once up we imported previous exports of empty head files in to 'real' OSDs:
  kvm5b:
systemctl stop ceph-osd@8;
ceph-objectstore-tool --op import --pgid 7.4s0 --data-path 
/var/lib/ceph/osd/ceph-8 --journal-path /var/lib/ceph/osd/ceph-8/journal --file 
/var/lib/vz/template/ssd_recovery/osd8_7.4s0.export;
chown ceph:ceph -R /var/lib/ceph/osd/ceph-8;
systemctl start ceph-osd@8;
  kvm5f:
systemctl stop ceph-osd@23;
ceph-objectstore-tool --op import --pgid 7.fs0 --data-path 
/var/lib/ceph/osd/ceph-23 --journal-path /var/lib/ceph/osd/ceph-23/journal 
--file /var/lib/vz/template/ssd_recovery/osd23_7.fs0.export;
chown ceph:ceph -R /var/lib/ceph/osd/ceph-23;
systemctl start ceph-osd@23;
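
(For reference, exports like the ones imported above can be produced with the same
tool's export op while the source OSD is stopped; the pgid and paths here are only
illustrative:
  ceph-objectstore-tool --op export --pgid 7.4s0 --data-path /var/lib/ceph/osd/ceph-8 --journal-path /var/lib/ceph/osd/ceph-8/journal --file /var/lib/vz/template/ssd_recovery/osd8_7.4s0.export )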


Bulk import previously exported objects:
cd /var/lib/vz/template/ssd_recovery;
for FILE in `ls -1A osd*_*.export | grep -Pv '^osd(8|23)_'`; do
  OSD=`echo $FILE | perl -pe 's/^osd(\d+).*/\1/'`;
  PGID=`echo $FILE | perl -pe 's/^osd\d+_(.*?).export/\1/g'`;
  echo -e 

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-01-10 Thread Sean Redmond
Hi,

I have a case where 3 out of 12 of these Intel S4600 2TB models failed
within a matter of days after being burn-in tested then placed into
production.

I am interested to know, did you ever get any further feedback from the
vendor on your issue?

Thanks

On Thu, Dec 21, 2017 at 1:38 PM, David Herselman  wrote:

> Hi,
>
> I assume this can only be a physical manufacturing flaw or a firmware bug?
> Do Intel publish advisories on recalled equipment? Should others be
> concerned about using Intel DC S4600 SSD drives? Could this be an
> electrical issue on the Hot Swap Backplane or BMC firmware issue? Either
> way, all pure Intel...
>
> The hole is only 1.3 GB (4 MB x 339 objects) but perfectly striped through
> images, file systems are subsequently severely damaged.
>
> Is it possible to get Ceph to read in partial data shards? It would
> provide between 25-75% more yield...
>
>
> Is there anything wrong with how we've proceeded thus far? Would be nice
> to reference examples of using ceph-objectstore-tool but documentation is
> virtually non-existent.
>
> We used another SSD drive to simulate bringing all the SSDs back online.
> We carved up the drive to provide equal partitions to essentially simulate
> the original SSDs:
>   # Partition a drive to provide 12 x 150GB partitions, eg:
> sdd   8:48   0   1.8T  0 disk
> |-sdd18:49   0   140G  0 part
> |-sdd28:50   0   140G  0 part
> |-sdd38:51   0   140G  0 part
> |-sdd48:52   0   140G  0 part
> |-sdd58:53   0   140G  0 part
> |-sdd68:54   0   140G  0 part
> |-sdd78:55   0   140G  0 part
> |-sdd88:56   0   140G  0 part
> |-sdd98:57   0   140G  0 part
> |-sdd10   8:58   0   140G  0 part
> |-sdd11   8:59   0   140G  0 part
> +-sdd12   8:60   0   140G  0 part
>
>
>   Pre-requisites:
> ceph osd set noout;
> apt-get install uuid-runtime;
>
>
>   for ID in `seq 24 35`; do
> UUID=`uuidgen`;
> OSD_SECRET=`ceph-authtool --gen-print-key`;
> DEVICE='/dev/sdd'$[$ID-23]; # 24-23 = /dev/sdd1, 35-23 = /dev/sdd12
> echo "{\"cephx_secret\": \"$OSD_SECRET\"}" | ceph osd new $UUID $ID -i
> - -n client.bootstrap-osd -k /var/lib/ceph/bootstrap-osd/ceph.keyring;
> mkdir /var/lib/ceph/osd/ceph-$ID;
> mkfs.xfs $DEVICE;
> mount $DEVICE /var/lib/ceph/osd/ceph-$ID;
> ceph-authtool --create-keyring /var/lib/ceph/osd/ceph-$ID/keyring
> --name osd.$ID --add-key $OSD_SECRET;
> ceph-osd -i $ID --mkfs --osd-uuid $UUID;
> chown -R ceph:ceph /var/lib/ceph/osd/ceph-$ID;
> systemctl enable ceph-osd@$ID;
> systemctl start ceph-osd@$ID;
>   done
>
>
> Once up we imported previous exports of empty head files in to 'real' OSDs:
>   kvm5b:
> systemctl stop ceph-osd@8;
> ceph-objectstore-tool --op import --pgid 7.4s0 --data-path
> /var/lib/ceph/osd/ceph-8 --journal-path /var/lib/ceph/osd/ceph-8/journal
> --file /var/lib/vz/template/ssd_recovery/osd8_7.4s0.export;
> chown ceph:ceph -R /var/lib/ceph/osd/ceph-8;
> systemctl start ceph-osd@8;
>   kvm5f:
> systemctl stop ceph-osd@23;
> ceph-objectstore-tool --op import --pgid 7.fs0 --data-path
> /var/lib/ceph/osd/ceph-23 --journal-path /var/lib/ceph/osd/ceph-23/journal
> --file /var/lib/vz/template/ssd_recovery/osd23_7.fs0.export;
> chown ceph:ceph -R /var/lib/ceph/osd/ceph-23;
> systemctl start ceph-osd@23;
>
>
> Bulk import previously exported objects:
> cd /var/lib/vz/template/ssd_recovery;
> for FILE in `ls -1A osd*_*.export | grep -Pv '^osd(8|23)_'`; do
>   OSD=`echo $FILE | perl -pe 's/^osd(\d+).*/\1/'`;
>   PGID=`echo $FILE | perl -pe 's/^osd\d+_(.*?).export/\1/g'`;
>   echo -e "systemctl stop ceph-osd@$OSD\t ceph-objectstore-tool --op
> import --pgid $PGID --data-path /var/lib/ceph/osd/ceph-$OSD --journal-path
> /var/lib/ceph/osd/ceph-$OSD/journal --file /var/lib/vz/template/ssd_
> recovery/osd"$OSD"_$PGID.export";
> done | sort
>
> Sample output (this will wrap):
> systemctl stop ceph-osd@27   ceph-objectstore-tool --op import --pgid
> 7.4s3 --data-path /var/lib/ceph/osd/ceph-27 --journal-path
> /var/lib/ceph/osd/ceph-27/journal --file /var/lib/vz/template/ssd_
> recovery/osd27_7.4s3.export
> systemctl stop ceph-osd@27   ceph-objectstore-tool --op import --pgid
> 7.fs5 --data-path /var/lib/ceph/osd/ceph-27 --journal-path
> /var/lib/ceph/osd/ceph-27/journal --file /var/lib/vz/template/ssd_
> recovery/osd27_7.fs5.export
> systemctl stop ceph-osd@30   ceph-objectstore-tool --op import --pgid
> 7.fs4 --data-path /var/lib/ceph/osd/ceph-30 --journal-path
> /var/lib/ceph/osd/ceph-30/journal --file /var/lib/vz/template/ssd_
> recovery/osd30_7.fs4.export
> systemctl stop ceph-osd@31   ceph-objectstore-tool --op import --pgid
> 7.4s2 --data-path /var/lib/ceph/osd/ceph-31 --journal-path
> /var/lib/ceph/osd/ceph-31/journal --file /var/lib/vz/template/ssd_
> recovery/osd31_7.4s2.export
> systemctl stop ceph-osd@32   

Re: [ceph-users] Incomplete pgs and no data movement ( cluster appears readonly )

2018-01-10 Thread Brent Kennedy
Ugh, that’s what I was hoping to avoid.  OSD 13 is still in the server, I 
wonder if I could somehow bring it back in as OSD 13 to see if it has the 
missing data.  

I was looking into using the ceph-objectstore tool, but the only instructions I
can find online are sparse and mostly in this list's archive.  I am still trying
to get clarification on the data itself, as I was hoping to delete it, but even
the deletion process doesn’t seem to exist.  All I can find is a force-create
feature that seems to force-recreate the PG, but again the documentation for that
is weak as well.

-Brent

  

-Original Message-
From: Gregory Farnum [mailto:gfar...@redhat.com] 
Sent: Wednesday, January 10, 2018 3:15 PM
To: Brent Kennedy 
Cc: Janne Johansson ; Ceph Users 

Subject: Re: [ceph-users] Incomplete pgs and no data movement ( cluster appears 
readonly )

On Wed, Jan 10, 2018 at 11:14 AM, Brent Kennedy  wrote:
> I adjusted “osd max pg per osd hard ratio ” to 50.0 and left “mon max 
> pg per osd” at 5000 just to see if things would allow data movement.  
> This worked, the new pool I created finished its creation and spread 
> out.  I was able to then copy the data from the existing pool into the 
> new pool and delete the old one.
>
>
>
> Used this process for copying the default pools:
>
> ceph osd pool create .users.email.new 16
>
> rados cppool .users.email .users.email.new
>
> ceph osd pool delete .users.email .users.email 
> --yes-i-really-really-mean-it
>
> ceph osd pool rename .users.email.new .users.email
>
> ceph osd pool application enable .users.email rgw
>
>
>
>
>
> So at this point, I have recreated all the .rgw and .user pools except 
> .rgw.buckets with a pg_num of 16, which significantly reduced the pgs, 
> unfortunately, the incompletes are still there:
>
>
>
>   cluster:
>
>health: HEALTH_WARN
>
> Reduced data availability: 4 pgs inactive, 4 pgs 
> incomplete
>
> Degraded data redundancy: 4 pgs unclean

There seems to have been some confusion here. From your prior thread:

On Thu, Jan 4, 2018 at 9:56 PM, Brent Kennedy  wrote:
> We have upgraded from Hammer to Jewel and then Luminous 12.2.2 as of today.
> During the hammer upgrade to Jewel we lost two host servers

So, if you have size two, and you lose two servers before the data has finished 
recovering...you've lost data. And that is indeed what "incomplete" means: the 
PG thinks writes may have happened, but the OSDs which held the data at that 
time aren't available. You'll need to dive into doing PG recovery with the 
ceph-objectstore tool and things, or find one of the groups that does 
consulting around recovery.
-Greg

>
>
>
>   services:
>
> mon: 3 daemons, quorum mon1,mon2,mon3
>
> mgr: mon3(active), standbys: mon1, mon2
>
> osd: 43 osds: 43 up, 43 in
>
>
>
>   data:
>
> pools:   10 pools, 4240 pgs
>
> objects: 8148k objects, 10486 GB
>
> usage:   21536 GB used, 135 TB / 156 TB avail
>
> pgs: 0.094% pgs not active
>
>  4236 active+clean
>
>  4incomplete
>
>
>
> The health page is showing blue instead of red on the donut chart, at
> one point it jumped to green but it's back to blue currently.  There
> are no more ops blocked/delayed either.
>
>
>
> Thanks for assistance, it seems the cluster will play nice now.  Any 
> thoughts on the stuck pgs?  I ran a query on 11.720 and it shows:
>
> "blocked_by": [
>
> 13,
>
> 27,
>
> 28
>
>
>
> OSD 13 was acting strange so I wiped it and removed it from the cluster.
> This was during the rebuild so I wasn’t aware of it blocking.  Now I 
> am trying to figure out how a removed OSD is blocking.  I went through 
> the process to remove it:
>
> ceph osd crush remove
>
> ceph auth del
>
> ceph osd rm
>
>
>
> I guess since the cluster was a hot mess at that point, it's possible
> it was borked and therefore the PG is borked.  I am trying to avoid
> deleting the data as there is data in the OSDs that are online.
>
>
>
> -Brent
>
>
>
>
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
> Of Brent Kennedy
> Sent: Wednesday, January 10, 2018 12:20 PM
> To: 'Janne Johansson' 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Incomplete pgs and no data movement ( 
> cluster appears readonly )
>
>
>
> I changed “mon max pg per osd” to 5000 because when I changed it to 
> zero, which was supposed to disable it, it caused an issue where I 
> couldn’t create any pools.  It would say 0 was larger than the 
> minimum.  I imagine that’s a bug, if I wanted it disabled, then it 
> shouldn’t use the calculation.  I then set “osd max pg per osd hard 
> ratio ” to 5 after changing “mon max pg per osd” to 5000, figuring 
> 5*5000 would cover it.  Perhaps not.  I will adjust it to 30 and restart the 
> OSDs.
>
>
>
> 

Re: [ceph-users] How to speed up backfill

2018-01-10 Thread Josef Zelenka



On 10/01/18 21:53, Josef Zelenka wrote:


Hi, I had the same issue a few days back. I tried playing around with
these two:


ceph tell 'osd.*' injectargs '--osd-max-backfills '
ceph tell 'osd.*' injectargs '--osd-recovery-max-active '
and it helped greatly (increased our recovery speed 20x), but be careful to
not overload your systems.
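
The list archive has stripped the numeric values from the two commands above; purely as
an illustration (the numbers are assumptions, not a recommendation), a modest bump plus
a check of the live value might look like:

  ceph tell 'osd.*' injectargs '--osd-max-backfills 4'
  ceph tell 'osd.*' injectargs '--osd-recovery-max-active 8'
  ceph daemon osd.0 config get osd_max_backfills    # verify on one OSD

The defaults are 1 and 3 respectively, so even small increases change backfill
behaviour noticeably.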

On 10/01/18 17:50, shadow_lin wrote:

Hi all,
I am playing with settings for backfill to try to find out how to control
the speed of backfill.
So far I have only found that "osd max backfills" can affect the backfill
speed. But after all PGs that need to be backfilled have begun backfilling, I
can't find any way to speed up backfills.
Especially when it comes to the last PG to recover, the speed is only
a few MB/s (when multiple PGs are being backfilled, the speed could be
more than 600MB/s in my test).
I am a little confused about the settings for backfill and
recovery. Though backfilling is a kind of recovery, it seems the
recovery settings only apply to replaying PG logs to recover PGs.
Would changing "osd recovery max active" or other recovery settings have
any effect on backfilling?
I did try "osd recovery op priority" and "osd recovery max active",
with no luck.

Any advice would be greatly appreciated. Thanks
2018-01-11

lin.yunfan


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cluster crash - FAILED assert(interval.last > last)

2018-01-10 Thread Josef Zelenka
Hi, today we had a disastrous crash - we are running a 3-node, 24-OSD
cluster (8 per node) with SSDs for the block DB and HDDs for BlueStore data.
This cluster is used as a radosgw backend, for storing a big number of
thumbnails for a file hosting site - around 110m files in total. We were
adding an interface to the nodes, which required a restart, but after
restarting one of the nodes, a lot of the OSDs were kicked out of the
cluster and rgw stopped working. We have a lot of PGs down and unfound
at the moment. OSDs can't be started (aside from some, that's a mystery); with this
error - FAILED assert(interval.last > last) - they just periodically
restart. So far, the cluster is broken and we can't seem to bring it
back up. We tried fscking the OSDs via the ceph-objectstore-tool, but it
was no good. The root of all this seems to be the FAILED
assert(interval.last > last) error, however I can't find any info
regarding this or how to fix it. Has someone here encountered it as well?
We're running Luminous on Ubuntu 16.04.


Thanks

Josef Zelenka

Cloudevelops

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Incomplete pgs and no data movement ( cluster appears readonly )

2018-01-10 Thread Gregory Farnum
On Wed, Jan 10, 2018 at 11:14 AM, Brent Kennedy  wrote:
> I adjusted “osd max pg per osd hard ratio ” to 50.0 and left “mon max pg per
> osd” at 5000 just to see if things would allow data movement.  This worked,
> the new pool I created finished its creation and spread out.  I was able to
> then copy the data from the existing pool into the new pool and delete the
> old one.
>
>
>
> Used this process for copying the default pools:
>
> ceph osd pool create .users.email.new 16
>
> rados cppool .users.email .users.email.new
>
> ceph osd pool delete .users.email .users.email --yes-i-really-really-mean-it
>
> ceph osd pool rename .users.email.new .users.email
>
> ceph osd pool application enable .users.email rgw
>
>
>
>
>
> So at this point, I have recreated all the .rgw and .user pools except
> .rgw.buckets with a pg_num of 16, which significantly reduced the pgs,
> unfortunately, the incompletes are still there:
>
>
>
>   cluster:
>
>health: HEALTH_WARN
>
> Reduced data availability: 4 pgs inactive, 4 pgs incomplete
>
> Degraded data redundancy: 4 pgs unclean

There seems to have been some confusion here. From your prior thread:

On Thu, Jan 4, 2018 at 9:56 PM, Brent Kennedy  wrote:
> We have upgraded from Hammer to Jewel and then Luminous 12.2.2 as of today.
> During the hammer upgrade to Jewel we lost two host servers

So, if you have size two, and you lose two servers before the data has
finished recovering...you've lost data. And that is indeed what
"incomplete" means: the PG thinks writes may have happened, but the
OSDs which held the data at that time aren't available. You'll need to
dive into doing PG recovery with the ceph-objectstore tool and things,
or find one of the groups that does consulting around recovery.
-Greg
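
For whoever ends up doing that recovery, a hedged sketch of the usual first steps with
ceph-objectstore-tool, assuming the incomplete PG is 11.720 and its most complete copy
sits on osd.27 (both assumptions; mark-complete can discard writes, so export a backup
first):

  systemctl stop ceph-osd@27
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-27 --op export --pgid 11.720 --file /root/pg11.720.export
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-27 --op mark-complete --pgid 11.720
  systemctl start ceph-osd@27
  ceph pg 11.720 query    # confirm it leaves the incomplete state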

>
>
>
>   services:
>
> mon: 3 daemons, quorum mon1,mon2,mon3
>
> mgr: mon3(active), standbys: mon1, mon2
>
> osd: 43 osds: 43 up, 43 in
>
>
>
>   data:
>
> pools:   10 pools, 4240 pgs
>
> objects: 8148k objects, 10486 GB
>
> usage:   21536 GB used, 135 TB / 156 TB avail
>
> pgs: 0.094% pgs not active
>
>  4236 active+clean
>
>  4incomplete
>
>
>
> The health page is showing blue instead of red on the donut chart, at one
> point it jumped to green but it's back to blue currently.  There are no more
> ops blocked/delayed either.
>
>
>
> Thanks for assistance, it seems the cluster will play nice now.  Any
> thoughts on the stuck pgs?  I ran a query on 11.720 and it shows:
>
> "blocked_by": [
>
> 13,
>
> 27,
>
> 28
>
>
>
> OSD 13 was acting strange so I wiped it and removed it from the cluster.
> This was during the rebuild so I wasn’t aware of it blocking.  Now I am
> trying to figure out how a removed OSD is blocking.  I went through the
> process to remove it:
>
> ceph osd crush remove
>
> ceph auth del
>
> ceph osd rm
>
>
>
> I guess since the cluster was a hot mess at that point, it's possible it was
> borked and therefore the PG is borked.  I am trying to avoid deleting the
> data as there is data in the OSDs that are online.
>
>
>
> -Brent
>
>
>
>
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Brent Kennedy
> Sent: Wednesday, January 10, 2018 12:20 PM
> To: 'Janne Johansson' 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Incomplete pgs and no data movement ( cluster
> appears readonly )
>
>
>
> I changed “mon max pg per osd” to 5000 because when I changed it to zero,
> which was supposed to disable it, it caused an issue where I couldn’t create
> any pools.  It would say 0 was larger than the minimum.  I imagine that’s a
> bug, if I wanted it disabled, then it shouldn’t use the calculation.  I then
> set “osd max pg per osd hard ratio ” to 5 after changing “mon max pg per
> osd” to 5000, figuring 5*5000 would cover it.  Perhaps not.  I will adjust
> it to 30 and restart the OSDs.
>
>
>
> -Brent
>
>
>
>
>
>
>
> From: Janne Johansson [mailto:icepic...@gmail.com]
> Sent: Wednesday, January 10, 2018 3:00 AM
> To: Brent Kennedy 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Incomplete pgs and no data movement ( cluster
> appears readonly )
>
>
>
>
>
>
>
> 2018-01-10 8:51 GMT+01:00 Brent Kennedy :
>
> As per a previous thread, my PGs are set too high.  I tried adjusting
> “mon max pg per osd” up higher and higher, which did clear the
> error (restarted monitors and managers each time), but it seems that data
> simply won't move around the cluster.  If I stop the primary OSD of an
> incomplete PG, the cluster just shows the affected PGs as
> active+undersized+degraded:
>
>
>
> I also adjusted “osd max pg per osd hard ratio” to 5, but that didn’t seem
> to trigger any data movement.  I did restart the OSDs each time I changed it.
> The data just won't finish moving.  “ceph -w” shows this:
>

Re: [ceph-users] luminous: HEALTH_ERR full ratio(s) out of order

2018-01-10 Thread Sage Weil
On Wed, 10 Jan 2018, Stefan Priebe - Profihost AG wrote:
> Am 10.01.2018 um 16:38 schrieb Sage Weil:
> > On Wed, 10 Jan 2018, John Spray wrote:
> >> On Wed, Jan 10, 2018 at 2:11 PM, Stefan Priebe - Profihost AG
> >>  wrote:
> >>> Hello,
> >>>
> >>> since upgrading to luminous i get the following error:
> >>>
> >>> HEALTH_ERR full ratio(s) out of order
> >>> OSD_OUT_OF_ORDER_FULL full ratio(s) out of order
> >>> backfillfull_ratio (0.9) < nearfull_ratio (0.95), increased
> >>>
> >>> but ceph.conf has:
> >>>
> >>> mon_osd_full_ratio = .97
> >>> mon_osd_nearfull_ratio = .95
> >>> mon_osd_backfillfull_ratio = .96
> >>> osd_backfill_full_ratio = .96
> >>> osd_failsafe_full_ratio = .98
> >>>
> >>> Any ideas?  i already restarted:
> >>> * all osds
> >>> * all mons
> >>> * all mgrs
> >>
> >> Perhaps your options are in the wrong section of your ceph.conf?  They
> >> should be in [mon] or [global] -- sometimes these end up mistakenly in
> >> [osd].
> > 
> > The other thing is that only the osd_failsafe_full_ratio is a runtime 
> > option now (for the osd); the other ones only affect the mon during the 
> > mkfs stage.  The real thresholds are now stored in the OSDMap itself (see 
> > 'ceph osd dump | grep full') and can be modified with
> > 
> >  ceph osd set-backfillfull-ratio 
> >  ceph osd set-full-ratio 
> >  ceph osd set-nearfull-ratio 
> 
> ah thanks! That fixed it.
> 
> So i can remove all those settings from ceph.conf?

Yep!

sage
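
As a concrete illustration, matching the ratios from the original ceph.conf would look
something like this (values taken from the config quoted above):

  ceph osd set-nearfull-ratio 0.95
  ceph osd set-backfillfull-ratio 0.96
  ceph osd set-full-ratio 0.97
  ceph osd dump | grep full    # the OSDMap should now carry the new ratios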
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] luminous: HEALTH_ERR full ratio(s) out of order

2018-01-10 Thread Stefan Priebe - Profihost AG
Am 10.01.2018 um 16:38 schrieb Sage Weil:
> On Wed, 10 Jan 2018, John Spray wrote:
>> On Wed, Jan 10, 2018 at 2:11 PM, Stefan Priebe - Profihost AG
>>  wrote:
>>> Hello,
>>>
>>> since upgrading to luminous i get the following error:
>>>
>>> HEALTH_ERR full ratio(s) out of order
>>> OSD_OUT_OF_ORDER_FULL full ratio(s) out of order
>>> backfillfull_ratio (0.9) < nearfull_ratio (0.95), increased
>>>
>>> but ceph.conf has:
>>>
>>> mon_osd_full_ratio = .97
>>> mon_osd_nearfull_ratio = .95
>>> mon_osd_backfillfull_ratio = .96
>>> osd_backfill_full_ratio = .96
>>> osd_failsafe_full_ratio = .98
>>>
>>> Any ideas?  i already restarted:
>>> * all osds
>>> * all mons
>>> * all mgrs
>>
>> Perhaps your options are in the wrong section of your ceph.conf?  They
>> should be in [mon] or [global] -- sometimes these end up mistakenly in
>> [osd].
> 
> The other thing is that only the osd_failsafe_full_ratio is a runtime 
> option now (for the osd); the other ones only affect the mon during the 
> mkfs stage.  The real thresholds are now stored in the OSDMap itself (see 
> 'ceph osd dump | grep full') and can be modified with
> 
>  ceph osd set-backfillfull-ratio 
>  ceph osd set-full-ratio 
>  ceph osd set-nearfull-ratio 

ah thanks! That fixed it.

So i can remove all those settings from ceph.conf?

> 
> sage
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] issue adding OSDs

2018-01-10 Thread Luis Periquito
Hi,

I'm running a cluster with 12.2.1 and adding more OSDs to it.
Everything is running version 12.2.1 and require_osd is set to
luminous.

one of the pools is replicated with size 2, min_size 1, and it is
seemingly blocking IO while recovering. I have no slow requests, and
looking at the output of "ceph osd perf" it seems brilliant (all
numbers are lower than 10).

clients are RBD (OpenStack VMs in KVM) using (mostly) 10.2.7. I've
marked those OSDs out and the RBD just came back to life. I did
have some objects degraded:

2018-01-10 18:23:52.081957 mon.mon0 mon.0 x.x.x.x:6789/0 410414 :
cluster [WRN] Health check update: 9926354/49526500 objects misplaced
(20.043%) (OBJECT_MISPLACED)
2018-01-10 18:23:52.081969 mon.mon0 mon.0 x.x.x.x:6789/0 410415 :
cluster [WRN] Health check update: Degraded data redundancy:
5027/49526500 objects degraded (0.010%), 1761 pgs unclean, 27 pgs
degraded (PG_DEGRADED)

any thoughts as to what might be happening? I've run such operations
many times...

thanks for any help, as I'm grasping at straws trying to figure out what's happening...
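
A few commands that usually help narrow down where the IO is actually stuck (nothing
cluster-specific assumed here):

  ceph health detail                         # which PGs are degraded/undersized and why
  ceph pg dump pgs_brief | grep -v active+clean
  ceph daemon osd.<id> dump_ops_in_flight    # on a suspect OSD, shows what each op waits on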
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Incomplete pgs and no data movement ( cluster appears readonly )

2018-01-10 Thread Brent Kennedy
I adjusted “osd max pg per osd hard ratio ” to 50.0 and left “mon max pg per 
osd” at 5000 just to see if things would allow data movement.  This worked, the 
new pool I created finished its creation and spread out.  I was able to then 
copy the data from the existing pool into the new pool and delete the old one.  

 

Used this process for copying the default pools:

ceph osd pool create .users.email.new 16

rados cppool .users.email .users.email.new

ceph osd pool delete .users.email .users.email --yes-i-really-really-mean-it

ceph osd pool rename .users.email.new .users.email

ceph osd pool application enable .users.email rgw

 

 

So at this point, I have recreated all the .rgw and .user pools except 
.rgw.buckets with a pg_num of 16, which significantly reduced the PGs; 
unfortunately, the incompletes are still there:

 

  cluster:

   health: HEALTH_WARN

Reduced data availability: 4 pgs inactive, 4 pgs incomplete

Degraded data redundancy: 4 pgs unclean

 

  services:

mon: 3 daemons, quorum mon1,mon2,mon3

mgr: mon3(active), standbys: mon1, mon2

osd: 43 osds: 43 up, 43 in

 

  data:

pools:   10 pools, 4240 pgs

objects: 8148k objects, 10486 GB

usage:   21536 GB used, 135 TB / 156 TB avail

pgs: 0.094% pgs not active

 4236 active+clean

 4incomplete

 

The health page is showing blue instead of red on the donut chart; at one 
point it jumped to green but it's back to blue currently.  There are no more ops 
blocked/delayed either.

 

Thanks for assistance, it seems the cluster will play nice now.  Any thoughts 
on the stuck pgs?  I ran a query on 11.720 and it shows:

"blocked_by": [

13,

27,

28

 

OSD 13 was acting strange so I wiped it and removed it from the cluster.  This 
was during the rebuild so I wasn’t aware of it blocking.  Now I am trying to 
figure out how a removed OSD is blocking.  I went through the process to remove 
it:

ceph osd crush remove

ceph auth del

ceph osd rm

 

I guess since the cluster was a hot mess at that point, it's possible it was 
borked and therefore the PG is borked.  I am trying to avoid deleting the data 
as there is data in the OSDs that are online.

 

-Brent

 

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Brent 
Kennedy
Sent: Wednesday, January 10, 2018 12:20 PM
To: 'Janne Johansson' 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Incomplete pgs and no data movement ( cluster appears 
readonly )

 

I changed “mon max pg per osd” to 5000 because when I changed it to zero, which 
was supposed to disable it, it caused an issue where I couldn’t create any 
pools.  It would say 0 was larger than the minimum.  I imagine that’s a bug; if 
I wanted it disabled, then it shouldn’t use the calculation.  I then set “osd 
max pg per osd hard ratio ” to 5 after changing “mon max pg per osd” to 5000, 
figuring 5*5000 would cover it.  Perhaps not.  I will adjust it to 30 and 
restart the OSDs.

 

-Brent

 

 

 

From: Janne Johansson [mailto:icepic...@gmail.com] 
Sent: Wednesday, January 10, 2018 3:00 AM
To: Brent Kennedy  >
Cc: ceph-users@lists.ceph.com  
Subject: Re: [ceph-users] Incomplete pgs and no data movement ( cluster appears 
readonly )

 

 

 

2018-01-10 8:51 GMT+01:00 Brent Kennedy  >:

As per a previous thread, my PGs are set too high.  I tried adjusting “mon 
max pg per osd” up higher and higher, which did clear the error (restarted 
monitors and managers each time), but it seems that data simply won't move 
around the cluster.  If I stop the primary OSD of an incomplete PG, the cluster 
just shows the affected PGs as active+undersized+degraded:

 

I also adjusted “osd max pg per osd hard ratio” to 5, but that didn’t seem to 
trigger any data movement.  I did restart the OSDs each time I changed it.  The 
data just won't finish moving.  “ceph -w” shows this:

2018-01-10 07:49:27.715163 osd.20 [WRN] slow request 960.675164 seconds old, 
received at 2018-01-10 07:33:27.039907: osd_op(client.3542508.0:4097 14.0 
14.50e8d0b0 (undecoded) ondisk+write+known_if_redirected e125984) currently 
queued_for_pg

 

 

Did you bump the ratio so that the PGs per OSD max * hard ratio actually became 
more than the amount of PGs you had?

Last time you mailed the ratio was 25xx and the max was 200 which meant the 
ratio would have needed to be far more than 5.0.

 

 

-- 

May the most significant bit of your life be positive.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to "reset" rgw?

2018-01-10 Thread Casey Bodley


On 01/10/2018 04:34 AM, Martin Emrich wrote:

Hi!

As I cannot find any solution for my broken rgw pools, the only way 
out is to give up and "reset".


How do I throw away all rgw data from a ceph cluster? Just delete all 
rgw pools? Or are some parts stored elsewhere (monitor, ...)?


Thanks,

Martin

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Deleting all of rgw's pools should be sufficient.
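
Roughly, something like this should do it (pool names vary per deployment; the ones 
below are the luminous defaults, and pool deletion has to be allowed on the mons 
first):

ceph tell mon.* injectargs '--mon-allow-pool-delete=true'
ceph osd pool ls | grep rgw
ceph osd pool delete default.rgw.buckets.data default.rgw.buckets.data --yes-i-really-really-mean-it
# ...and repeat the delete for every pool the ls returns (.rgw.root,
# default.rgw.control, default.rgw.meta, default.rgw.log, default.rgw.buckets.index)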
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Incomplete pgs and no data movement ( cluster appears readonly )

2018-01-10 Thread Brent Kennedy
I changed “mon max pg per osd” to 5000 because when I changed it to zero, which 
was supposed to disable it, it caused an issue where I couldn’t create any 
pools.  It would say 0 was larger than the minimum.  I imagine that’s a bug; if 
I wanted it disabled, then it shouldn’t use the calculation.  I then set “osd 
max pg per osd hard ratio” to 5 after changing “mon max pg per osd” to 5000, 
figuring 5*5000 would cover it.  Perhaps not.  I will adjust it to 30 and 
restart the OSDs.

 

-Brent

 

 

 

From: Janne Johansson [mailto:icepic...@gmail.com] 
Sent: Wednesday, January 10, 2018 3:00 AM
To: Brent Kennedy 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Incomplete pgs and no data movement ( cluster appears 
readonly )

 

 

 

2018-01-10 8:51 GMT+01:00 Brent Kennedy:

As per a previous thread, my pgs are set too high.  I tried adjusting the “mon 
max pg per osd” up higher and higher, which did clear the error(restarted 
monitors and managers each time), but it seems that data simply wont move 
around the cluster.  If I stop the primary OSD of an incomplete pg, the cluster 
just shows those affected pages as active+undersized+degraded:

 

I also adjusted “osd max pg per osd hard ratio ” to 5, but that didn’t seem to 
trigger any data moved.  I did restart the OSDs each time I changed it.  The 
data just wont finish moving.  “ceph –w” shows this:

2018-01-10 07:49:27.715163 osd.20 [WRN] slow request 960.675164 seconds old, 
received at 2018-01-10 07:33:27.039907: osd_op(client.3542508.0:4097 14.0 
14.50e8d0b0 (undecoded) ondisk+write+known_if_redirected e125984) currently 
queued_for_pg

 

 

Did you bump the ratio so that the PGs per OSD max * hard ratio actually became 
more than the amount of PGs you had?

Last time you mailed the ratio was 25xx and the max was 200 which meant the 
ratio would have needed to be far more than 5.0.

 

 

-- 

May the most significant bit of your life be positive.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to speed up backfill

2018-01-10 Thread shadow_lin
Hi all,
I am playing with setting for backfill to try to find how to control the speed 
of backfill.

So far I have only found that "osd max backfills" affects the backfill speed. But 
once all the pgs that need backfilling have begun backfilling, I can't find any 
way to speed up the backfills.

Especially when it comes to the last pg to recover, the speed is only a few 
MB/s (when multiple pgs are being backfilled, the speed can be more than 
600MB/s in my test).

I am a little confused about the backfill and recovery settings. Backfilling is 
a kind of recovery, but it seems the recovery settings only apply to replaying 
pg logs to recover a pg.

Would changing "osd recovery max active" or other recovery settings have any 
effect on backfilling?

I did try "osd recovery op priority" and "osd recovery max active", with no 
luck.

Any advice would be greatly appreciated. Thanks

2018-01-11



lin.yunfan___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Open Compute (OCP) servers for Ceph

2018-01-10 Thread Wes Dillingham
Not OCP, but regarding 12x 3.5" drives in 1U with a decent CPU, QCT makes the
following:
https://www.qct.io/product/index/Server/rackmount-server/1U-Rackmount-Server/QuantaGrid-S51G-1UL
and they have a few other models with some additional SSDs in addition
to the 3.5" drives.

Both of those compared here:
https://www.qct.io/product/compare?model=1323,215

QCT does manufacture quite a few OCP models:
this may fit the mold:
https://www.qct.io/product/index/Rack/Rackgo-X-RSD/Rackgo-X-RSD-Storage/Rackgo-X-RSD-Knoxville#specifications

On Fri, Dec 22, 2017 at 9:22 AM, Wido den Hollander  wrote:

>
>
> On 12/22/2017 02:40 PM, Dan van der Ster wrote:
>
>> Hi Wido,
>>
>> We have used a few racks of Wiwynn OCP servers in a Ceph cluster for a
>> couple of years.
>> The machines are dual Xeon [1] and use some of those 2U 30-disk "Knox"
>> enclosures.
>>
>>
> Yes, I see. I was looking for a solution without a JBOD and about 12
> drives 3.5" or ~20 2.5" in 1U with a decent CPU to run OSDs on.
>
> Other than that, I have nothing particularly interesting to say about
>> these. Our data centre procurement team have also moved on with
>> standard racked equipment, so I suppose they also found these
>> uninteresting.
>>
>>
> It really depends. When properly deployed OCP can seriously lower power
> costs for numerous reasons and thus lower the TCO of a Ceph cluster.
>
> But I dislike the machines with a lot of disks for Ceph, I prefer smaller
> machines.
>
> Hopefully somebody knows a vendor who makes such OCP machines.
>
> Wido
>
>
> Cheers, Dan
>>
>> [1] http://www.wiwynn.com/english/product/type/details/32?ptype=28
>>
>>
>> On Fri, Dec 22, 2017 at 12:04 PM, Wido den Hollander 
>> wrote:
>>
>>> Hi,
>>>
>>> I'm looking at OCP [0] servers for Ceph and I'm not able to find yet what
>>> I'm looking for.
>>>
>>> First of all, the geek in me loves OCP and the design :-) Now I'm trying
>>> to
>>> match it with Ceph.
>>>
>>> Looking at wiwynn [1] they offer a few OCP servers:
>>>
>>> - 3 nodes in 2U with a single 3.5" disk [2]
>>> - 2U node with 30 disks and a Atom C2000 [3]
>>> - 2U JDOD with 12G SAS [4]
>>>
>>> For Ceph I would want:
>>>
>>> - 1U node / 12x 3.5" / Fast CPU
>>> - 1U node / 24x 2.5" / Fast CPU
>>>
>>> They don't seem to exist yet when looking for OCP server.
>>>
>>> Although 30 drives is fine, it would become a very large Ceph cluster
>>> when
>>> building with something like that.
>>>
>>> Has anybody build Ceph clusters yet using OCP hardaware? If so, which
>>> vendor
>>> and what are your experiences?
>>>
>>> Thanks!
>>>
>>> Wido
>>>
>>> [0]: http://www.opencompute.org/
>>> [1]: http://www.wiwynn.com/
>>> [2]: http://www.wiwynn.com/english/product/type/details/65?ptype=28
>>> [3]: http://www.wiwynn.com/english/product/type/details/33?ptype=28
>>> [4]: http://www.wiwynn.com/english/product/type/details/43?ptype=28
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Respectfully,

Wes Dillingham
wes_dilling...@harvard.edu
Research Computing | Senior CyberInfrastructure Storage Engineer
Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 204
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bad crc causing osd hang and block all request.

2018-01-10 Thread shadow_lin
Thanks for your advice.
I rebuilt the OSD and haven't had this happen again, so it could have been 
corruption on the HDDs.

2018-01-11 


lin.yunfan



From: Konstantin Shalygin 
Sent: 2018-01-09 12:11
Subject: Re: [ceph-users] Bad crc causing osd hang and block all request.
To: "ceph-users"
Cc:

> What could cause this problem?Is this caused by a faulty HDD? 
> what data's crc didn't match ? 

This may be caused by a faulty drive. Check your dmesg. ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Changing device-class using crushtool

2018-01-10 Thread Wido den Hollander

Hi,

Is there a way to easily modify the device-class of devices on a offline 
CRUSHMap?


I know I can decompile the CRUSHMap and do it, but that's a lot of work 
in a large environment.


In larger environments I'm a fan of downloading the CRUSHMap, modifying 
it to my needs, testing it and injecting it at once into the cluster.


crushtool can do a lot, you can also run tests using device classes, but 
there doesn't seem to be a way to modify the device-class using 
crushtool, is that correct?
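
For reference, the decompile/edit workflow I mean looks roughly like this (the edited 
device line, rule and replica count in the test step are just examples):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit the device lines, e.g. "device 0 osd.0 class hdd" -> "device 0 osd.0 class ssd"
crushtool -c crushmap.txt -o crushmap.new
crushtool -i crushmap.new --test --rule 0 --num-rep 3 --show-mappings
ceph osd setcrushmap -i crushmap.new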


Wido
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] luminous: HEALTH_ERR full ratio(s) out of order

2018-01-10 Thread Sage Weil
On Wed, 10 Jan 2018, John Spray wrote:
> On Wed, Jan 10, 2018 at 2:11 PM, Stefan Priebe - Profihost AG
>  wrote:
> > Hello,
> >
> > since upgrading to luminous i get the following error:
> >
> > HEALTH_ERR full ratio(s) out of order
> > OSD_OUT_OF_ORDER_FULL full ratio(s) out of order
> > backfillfull_ratio (0.9) < nearfull_ratio (0.95), increased
> >
> > but ceph.conf has:
> >
> > mon_osd_full_ratio = .97
> > mon_osd_nearfull_ratio = .95
> > mon_osd_backfillfull_ratio = .96
> > osd_backfill_full_ratio = .96
> > osd_failsafe_full_ratio = .98
> >
> > Any ideas?  i already restarted:
> > * all osds
> > * all mons
> > * all mgrs
> 
> Perhaps your options are in the wrong section of your ceph.conf?  They
> should be in [mon] or [global] -- sometimes these end up mistakenly in
> [osd].

The other thing is that only teh osd_failsafe_full_ratio is a runtime 
option now (for the osd); the other ones only affect the mon during the 
mkfs stage.  The real thresholds are now stored in the OSDMap itself (see 
'ceph osd dump | grep full') and can be modified with

 ceph osd set-backfillfull-ratio 
 ceph osd set-full-ratio 
 ceph osd set-nearfull-ratio 
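
e.g., to end up with the ratios intended in the ceph.conf above, something like:

 ceph osd set-nearfull-ratio 0.95
 ceph osd set-backfillfull-ratio 0.96
 ceph osd set-full-ratio 0.97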

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] luminous: HEALTH_ERR full ratio(s) out of order

2018-01-10 Thread John Spray
On Wed, Jan 10, 2018 at 2:11 PM, Stefan Priebe - Profihost AG
 wrote:
> Hello,
>
> since upgrading to luminous i get the following error:
>
> HEALTH_ERR full ratio(s) out of order
> OSD_OUT_OF_ORDER_FULL full ratio(s) out of order
> backfillfull_ratio (0.9) < nearfull_ratio (0.95), increased
>
> but ceph.conf has:
>
> mon_osd_full_ratio = .97
> mon_osd_nearfull_ratio = .95
> mon_osd_backfillfull_ratio = .96
> osd_backfill_full_ratio = .96
> osd_failsafe_full_ratio = .98
>
> Any ideas?  i already restarted:
> * all osds
> * all mons
> * all mgrs

Perhaps your options are in the wrong section of your ceph.conf?  They
should be in [mon] or [global] -- sometimes these end up mistakenly in
[osd].

John

> Greets,
> Stefan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] luminous: HEALTH_ERR full ratio(s) out of order

2018-01-10 Thread David Turner
Why oh why would you run with such lean settings? You very well might not
be able to recover your cluster if something happened while you were at 94%
full without even a nearfull warning on anything. Nearfull should at least
be brought down as it's just a warning in ceph's output to tell you to get
more storage in before it's too late. If you wait until your disks are 95%
full before the alert pops up telling you to order new hardware... you'll
never get it in time. And if you're monitoring to add more hardware at a
lower percentage already... why not lower the nearfull anyway just for the
extra reminder that you're filling up? Nearfull literally does nothing
other than a health_warn state.

But what if you have hardware failures while your cluster is full? What is
likely to happen with these settings is that your OSDs all get
backfill_full and can't shift data to add the new storage. Maybe you're
just testing these settings or this is a test cluster, but settings
anywhere near these ratios are terrible for production.

On Wed, Jan 10, 2018 at 10:15 AM Webert de Souza Lima 
wrote:

> Good to know. I don't think this should trigger HEALTH_ERR though, but
> HEALTH_WARN makes sense.
> It makes sense to keep the backfillfull_ratio greater than nearfull_ratio
> as one might need backfilling to avoid OSD getting full on reweight
> operations.
>
>
> Regards,
>
> Webert Lima
> DevOps Engineer at MAV Tecnologia
> *Belo Horizonte - Brasil*
> *IRC NICK - WebertRLZ*
>
> On Wed, Jan 10, 2018 at 12:11 PM, Stefan Priebe - Profihost AG <
> s.pri...@profihost.ag> wrote:
>
>> Hello,
>>
>> since upgrading to luminous i get the following error:
>>
>> HEALTH_ERR full ratio(s) out of order
>> OSD_OUT_OF_ORDER_FULL full ratio(s) out of order
>> backfillfull_ratio (0.9) < nearfull_ratio (0.95), increased
>>
>> but ceph.conf has:
>>
>> mon_osd_full_ratio = .97
>> mon_osd_nearfull_ratio = .95
>> mon_osd_backfillfull_ratio = .96
>> osd_backfill_full_ratio = .96
>> osd_failsafe_full_ratio = .98
>>
>> Any ideas?  i already restarted:
>> * all osds
>> * all mons
>> * all mgrs
>>
>> Greets,
>> Stefan
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] luminous: HEALTH_ERR full ratio(s) out of order

2018-01-10 Thread Webert de Souza Lima
Good to know. I don't think this should trigger HEALTH_ERR though, but
HEALTH_WARN makes sense.
It makes sense to keep the backfillfull_ratio greater than nearfull_ratio
as one might need backfilling to avoid OSD getting full on reweight
operations.


Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*

On Wed, Jan 10, 2018 at 12:11 PM, Stefan Priebe - Profihost AG <
s.pri...@profihost.ag> wrote:

> Hello,
>
> since upgrading to luminous i get the following error:
>
> HEALTH_ERR full ratio(s) out of order
> OSD_OUT_OF_ORDER_FULL full ratio(s) out of order
> backfillfull_ratio (0.9) < nearfull_ratio (0.95), increased
>
> but ceph.conf has:
>
> mon_osd_full_ratio = .97
> mon_osd_nearfull_ratio = .95
> mon_osd_backfillfull_ratio = .96
> osd_backfill_full_ratio = .96
> osd_failsafe_full_ratio = .98
>
> Any ideas?  i already restarted:
> * all osds
> * all mons
> * all mgrs
>
> Greets,
> Stefan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 'lost' cephfs filesystem?

2018-01-10 Thread Webert de Souza Lima
On Wed, Jan 10, 2018 at 12:44 PM, Mark Schouten  wrote:

> > Thanks, that's a good suggestion. Just one question, will this affect
> RBD-
> > access from the same (client)host?


I'm sorry that this didn't help. No, it does not affect RBD clients, as the MDS
is related only to CephFS.

Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*

>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 'lost' cephfs filesystem?

2018-01-10 Thread Mark Schouten
On woensdag 10 januari 2018 14:15:19 CET Mark Schouten wrote:
> On woensdag 10 januari 2018 08:42:04 CET Webert de Souza Lima wrote:
> > try to kick out (evict) that cephfs client from the mds node, see
> > http://docs.ceph.com/docs/master/cephfs/eviction/
> 
> Thanks, that's a good suggestion. Just one question, will this affect RBD-
> access from the same (client)host?

I tried this, it doesn't help. Thanks though!

-- 
Kerio Operator in de Cloud? https://www.kerioindecloud.nl/
Mark Schouten  | Tuxis Internet Engineering
KvK: 61527076  | http://www.tuxis.nl/
T: 0318 200208 | i...@tuxis.nl

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 'lost' cephfs filesystem?

2018-01-10 Thread Yan, Zheng
On Wed, Jan 10, 2018 at 10:59 AM, Mark Schouten  wrote:
> Hi,
>
> While upgrading a server with a CephFS mount tonight, it stalled on installing
> a new kernel, because it was waiting for `sync`.
>
> I'm pretty sure it has something to do with the CephFS filesystem which caused
> some issues last week. I think the kernel still has a reference to the
> probably lazy unmounted CephFS filesystem.
> Unmounting the filesystem 'works', which means it is no longer available, but
> the unmount-command seems to be waiting for sync() as well. Mounting the
> filesystem again doesn't work either.
>
> I know the simple solution is to just reboot the server, but the server holds
> quite a lot of VM's and Containers, so I'd prefer to fix this without a 
> reboot.
>
> Anybody with some clever ideas? :)
>

Try mounting CephFS with exactly the same options as the previous mount. The new
mount should share the MDS connection with the previous one; then run 'umount -f'
on the new mount.

> --
> Kerio Operator in de Cloud? https://www.kerioindecloud.nl/
> Mark Schouten  | Tuxis Internet Engineering
> KvK: 61527076  | http://www.tuxis.nl/
> T: 0318 200208 | i...@tuxis.nl
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] filestore to bluestore: osdmap epoch problem and is the documentation correct?

2018-01-10 Thread Jens-U. Mozdzen

Hi Alfredo,

thank you for your comments:

Zitat von Alfredo Deza :

On Wed, Jan 10, 2018 at 8:57 AM, Jens-U. Mozdzen  wrote:

Dear *,

has anybody been successful migrating Filestore OSDs to Bluestore OSDs,
keeping the OSD number? There have been a number of messages on the list,
reporting problems, and my experience is the same. (Removing the existing
OSD and creating a new one does work for me.)

I'm working on an Ceph 12.2.2 cluster and tried following
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd
- this basically says

1. destroy old OSD
2. zap the disk
3. prepare the new OSD
4. activate the new OSD

I never got step 4 to complete. The closest I got was by doing the following
steps (assuming OSD ID "999" on /dev/sdzz):

1. Stop the old OSD via systemd (osd-node # systemctl stop
ceph-osd@999.service)

2. umount the old OSD (osd-node # umount /var/lib/ceph/osd/ceph-999)

3a. if the old OSD was Bluestore with LVM, manually clean up the old OSD's
volume group

3b. zap the block device (osd-node # ceph-volume lvm zap /dev/sdzz)

4. destroy the old OSD (osd-node # ceph osd destroy 999
--yes-i-really-mean-it)

5. create a new OSD entry (osd-node # ceph osd new $(cat
/var/lib/ceph/osd/ceph-999/fsid) 999)


Steps 5 and 6 are problematic if you are going to try ceph-volume
later on, which takes care of doing this for you.



6. add the OSD secret to Ceph authentication (osd-node # ceph auth add
osd.999 mgr 'allow profile osd' osd 'allow *' mon 'allow profile osd' -i
/var/lib/ceph/osd/ceph-999/keyring)


At first I tried to follow the documented steps (without my steps 5  
and 6), which did not work for me. The documented approach failed with  
"init authentication failed: (1) Operation not permitted", because  
ceph-volume did not actually add the auth entry for me.


But even after manually adding the authentication, the "ceph-volume"  
approach failed, as the OSD was still marked "destroyed" in the osdmap  
epoch as used by ceph-osd (see the commented messages from  
ceph-osd.999.log below).




7. prepare the new OSD (osd-node # ceph-volume lvm prepare --bluestore
--osd-id 999 --data /dev/sdzz)


You are going to hit a bug in ceph-volume that is preventing you from
specifying the osd id directly if the ID has been destroyed.

See http://tracker.ceph.com/issues/22642


If I read that bug description correctly, you're confirming why I  
needed step #6 above (manually adding the OSD auth entry). But even if  
ceph-volume had added it, the ceph-osd.log entries suggest that  
starting the OSD would still have failed, because it was accessing the  
wrong osdmap epoch.


To me it seems like I'm hitting a bug outside of ceph-volume - unless  
it's ceph-volume that somehow determines which osdmap epoch is used by  
ceph-osd.



In order for this to work, you would need to make sure that the ID has
really been destroyed and avoid passing --osd-id in ceph-volume. The
caveat
being that you will get whatever ID is available next in the cluster.


Yes, that's the work-around I then used - purge the old OSD and create  
a new one.
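
Roughly, that work-around was (same example OSD id and device as above):

osd-node # ceph osd purge 999 --yes-i-really-mean-it
osd-node # ceph-volume lvm zap /dev/sdzz
osd-node # ceph-volume lvm create --bluestore --data /dev/sdzz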


Thanks & regards,
Jens


[...]
--- cut here ---
# first of multiple attempts, before "ceph auth add ..."
# no actual epoch referenced, as login failed due to missing auth
2018-01-10 00:00:02.173983 7f5cf1c89d00  0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for clients
2018-01-10 00:00:02.173990 7f5cf1c89d00  0 osd.999 0 crush map has features
288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:00:02.173994 7f5cf1c89d00  0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for osds
2018-01-10 00:00:02.174046 7f5cf1c89d00  0 osd.999 0 load_pgs
2018-01-10 00:00:02.174051 7f5cf1c89d00  0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:00:02.174055 7f5cf1c89d00  0 osd.999 0 using weightedpriority
op queue with priority op cut off at 64.
2018-01-10 00:00:02.174891 7f5cf1c89d00 -1 osd.999 0 log_to_monitors
{default=true}
2018-01-10 00:00:02.177479 7f5cf1c89d00 -1 osd.999 0 init authentication
failed: (1) Operation not permitted

# after "ceph auth ..."
# note the different epochs below? BTW, 110587 is the current epoch at that
time and osd.999 is marked destroyed there
# 109892: much too old to offer any details
# 110587: modified 2018-01-09 23:43:13.202381

2018-01-10 00:08:00.945507 7fc55905bd00  0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for clients
2018-01-10 00:08:00.945514 7fc55905bd00  0 osd.999 0 crush map has features
288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:08:00.945521 7fc55905bd00  0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for osds
2018-01-10 00:08:00.945588 7fc55905bd00  0 osd.999 0 load_pgs
2018-01-10 00:08:00.945594 7fc55905bd00  0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:08:00.945599 7fc55905bd00  0 osd.999 0 using 

Re: [ceph-users] filestore to bluestore: osdmap epoch problem and is the documentation correct?

2018-01-10 Thread Alfredo Deza
On Wed, Jan 10, 2018 at 8:57 AM, Jens-U. Mozdzen  wrote:
> Dear *,
>
> has anybody been successful migrating Filestore OSDs to Bluestore OSDs,
> keeping the OSD number? There have been a number of messages on the list,
> reporting problems, and my experience is the same. (Removing the existing
> OSD and creating a new one does work for me.)
>
> I'm working on an Ceph 12.2.2 cluster and tried following
> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd
> - this basically says
>
> 1. destroy old OSD
> 2. zap the disk
> 3. prepare the new OSD
> 4. activate the new OSD
>
> I never got step 4 to complete. The closest I got was by doing the following
> steps (assuming OSD ID "999" on /dev/sdzz):
>
> 1. Stop the old OSD via systemd (osd-node # systemctl stop
> ceph-osd@999.service)
>
> 2. umount the old OSD (osd-node # umount /var/lib/ceph/osd/ceph-999)
>
> 3a. if the old OSD was Bluestore with LVM, manually clean up the old OSD's
> volume group
>
> 3b. zap the block device (osd-node # ceph-volume lvm zap /dev/sdzz)
>
> 4. destroy the old OSD (osd-node # ceph osd destroy 999
> --yes-i-really-mean-it)
>
> 5. create a new OSD entry (osd-node # ceph osd new $(cat
> /var/lib/ceph/osd/ceph-999/fsid) 999)

Steps 5 and 6 are problematic if you are going to try ceph-volume
later on, which takes care of doing this for you.

>
> 6. add the OSD secret to Ceph authentication (osd-node # ceph auth add
> osd.999 mgr 'allow profile osd' osd 'allow *' mon 'allow profile osd' -i
> /var/lib/ceph/osd/ceph-999/keyring)
>
> 7. prepare the new OSD (osd-node # ceph-volume lvm prepare --bluestore
> --osd-id 999 --data /dev/sdzz)
> mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-999/keyring)

You are going to hit a bug in ceph-volume that is preventing you from
specifying the osd id directly if the ID has been destroyed.

See http://tracker.ceph.com/issues/22642

In order for this to work, you would need to make sure that the ID has
really been destroyed and avoid passing --osd-id in ceph-volume. The
caveat
being that you will get whatever ID is available next in the cluster.

>
> but ceph-osd keeps complaining "osdmap says I am destroyed, exiting" on
> "osd-node # systemctl start ceph-osd@999.service".
>
> At first I felt I was hitting http://tracker.ceph.com/issues/21023
> (BlueStore-OSDs marked as destroyed in OSD-map after v12.1.1 to v12.1.4
> upgrade). But I was already using the "ceph osd new" command, which didn't
> help.
>
> Some hours of sleep later I matched the issued commands to the osdmap
> changes and the ceph-osd log messages, which revealed something strange:
>
> - from issuing "ceph osd destroy", osdmap lists the OSD as
> "autoout,destroyed,exists" (no surprise here)
> - once I issued "ceph osd new", osdmap lists the OSD as "autoout,exists,new"
> - starting ceph-osd after "ceph osd new" reports "osdmap says I am
> destroyed, exiting"
>
> I can see in the ceph-osd log that it is relating to an *old* osdmap epoch,
> roughly 45 minutes old by then?
>
> This got me curious and I dug through the OSD log file, checking the epoch
> numbers during start-up:
>
> I took some detours, so there's more than two failed starts in the OSD log
> file ;) :
>
> --- cut here ---
> # first of multiple attempts, before "ceph auth add ..."
> # no actual epoch referenced, as login failed due to missing auth
> 2018-01-10 00:00:02.173983 7f5cf1c89d00  0 osd.999 0 crush map has features
> 288232575208783872, adjusting msgr requires for clients
> 2018-01-10 00:00:02.173990 7f5cf1c89d00  0 osd.999 0 crush map has features
> 288232575208783872 was 8705, adjusting msgr requires for mons
> 2018-01-10 00:00:02.173994 7f5cf1c89d00  0 osd.999 0 crush map has features
> 288232575208783872, adjusting msgr requires for osds
> 2018-01-10 00:00:02.174046 7f5cf1c89d00  0 osd.999 0 load_pgs
> 2018-01-10 00:00:02.174051 7f5cf1c89d00  0 osd.999 0 load_pgs opened 0 pgs
> 2018-01-10 00:00:02.174055 7f5cf1c89d00  0 osd.999 0 using weightedpriority
> op queue with priority op cut off at 64.
> 2018-01-10 00:00:02.174891 7f5cf1c89d00 -1 osd.999 0 log_to_monitors
> {default=true}
> 2018-01-10 00:00:02.177479 7f5cf1c89d00 -1 osd.999 0 init authentication
> failed: (1) Operation not permitted
>
> # after "ceph auth ..."
> # note the different epochs below? BTW, 110587 is the current epoch at that
> time and osd.999 is marked destroyed there
> # 109892: much too old to offer any details
> # 110587: modified 2018-01-09 23:43:13.202381
>
> 2018-01-10 00:08:00.945507 7fc55905bd00  0 osd.999 0 crush map has features
> 288232575208783872, adjusting msgr requires for clients
> 2018-01-10 00:08:00.945514 7fc55905bd00  0 osd.999 0 crush map has features
> 288232575208783872 was 8705, adjusting msgr requires for mons
> 2018-01-10 00:08:00.945521 7fc55905bd00  0 osd.999 0 crush map has features
> 288232575208783872, adjusting msgr requires for osds
> 2018-01-10 00:08:00.945588 7fc55905bd00  0 osd.999 0 load_pgs
> 2018-01-10 

[ceph-users] luminous: HEALTH_ERR full ratio(s) out of order

2018-01-10 Thread Stefan Priebe - Profihost AG
Hello,

since upgrading to luminous i get the following error:

HEALTH_ERR full ratio(s) out of order
OSD_OUT_OF_ORDER_FULL full ratio(s) out of order
backfillfull_ratio (0.9) < nearfull_ratio (0.95), increased

but ceph.conf has:

mon_osd_full_ratio = .97
mon_osd_nearfull_ratio = .95
mon_osd_backfillfull_ratio = .96
osd_backfill_full_ratio = .96
osd_failsafe_full_ratio = .98

Any ideas?  i already restarted:
* all osds
* all mons
* all mgrs

Greets,
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] filestore to bluestore: osdmap epoch problem and is the documentation correct?

2018-01-10 Thread Jens-U. Mozdzen

Dear *,

has anybody been successful migrating Filestore OSDs to Bluestore  
OSDs, keeping the OSD number? There have been a number of messages on  
the list, reporting problems, and my experience is the same. (Removing  
the existing OSD and creating a new one does work for me.)


I'm working on an Ceph 12.2.2 cluster and tried following  
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd - this basically  
says


1. destroy old OSD
2. zap the disk
3. prepare the new OSD
4. activate the new OSD

I never got step 4 to complete. The closest I got was by doing the  
following steps (assuming OSD ID "999" on /dev/sdzz):


1. Stop the old OSD via systemd (osd-node # systemctl stop  
ceph-osd@999.service)


2. umount the old OSD (osd-node # umount /var/lib/ceph/osd/ceph-999)

3a. if the old OSD was Bluestore with LVM, manually clean up the old  
OSD's volume group


3b. zap the block device (osd-node # ceph-volume lvm zap /dev/sdzz)

4. destroy the old OSD (osd-node # ceph osd destroy 999  
--yes-i-really-mean-it)


5. create a new OSD entry (osd-node # ceph osd new $(cat  
/var/lib/ceph/osd/ceph-999/fsid) 999)


6. add the OSD secret to Ceph authentication (osd-node # ceph auth add  
osd.999 mgr 'allow profile osd' osd 'allow *' mon 'allow profile osd'  
-i /var/lib/ceph/osd/ceph-999/keyring)


7. prepare the new OSD (osd-node # ceph-volume lvm prepare --bluestore  
--osd-id 999 --data /dev/sdzz)


but ceph-osd keeps complaining "osdmap says I am destroyed, exiting"  
on "osd-node # systemctl start ceph-osd@999.service".


At first I felt I was hitting http://tracker.ceph.com/issues/21023  
(BlueStore-OSDs marked as destroyed in OSD-map after v12.1.1 to  
v12.1.4 upgrade). But I was already using the "ceph osd new" command,  
which didn't help.


Some hours of sleep later I matched the issued commands to the osdmap  
changes and the ceph-osd log messages, which revealed something strange:


- from issuing "ceph osd destroy", osdmap lists the OSD as  
"autoout,destroyed,exists" (no surprise here)

- once I issued "ceph osd new", osdmap lists the OSD as "autoout,exists,new"
- starting ceph-osd after "ceph osd new" reports "osdmap says I am  
destroyed, exiting"


I can see in the ceph-osd log that it is relating to an *old* osdmap  
epoch, roughly 45 minutes old by then?


This got me curious and I dug through the OSD log file, checking the  
epoch numbers during start-up:


I took some detours, so there's more than two failed starts in the OSD  
log file ;) :


--- cut here ---
# first of multiple attempts, before "ceph auth add ..."
# no actual epoch referenced, as login failed due to missing auth
2018-01-10 00:00:02.173983 7f5cf1c89d00  0 osd.999 0 crush map has  
features 288232575208783872, adjusting msgr requires for clients
2018-01-10 00:00:02.173990 7f5cf1c89d00  0 osd.999 0 crush map has  
features 288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:00:02.173994 7f5cf1c89d00  0 osd.999 0 crush map has  
features 288232575208783872, adjusting msgr requires for osds

2018-01-10 00:00:02.174046 7f5cf1c89d00  0 osd.999 0 load_pgs
2018-01-10 00:00:02.174051 7f5cf1c89d00  0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:00:02.174055 7f5cf1c89d00  0 osd.999 0 using  
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:00:02.174891 7f5cf1c89d00 -1 osd.999 0 log_to_monitors  
{default=true}
2018-01-10 00:00:02.177479 7f5cf1c89d00 -1 osd.999 0 init  
authentication failed: (1) Operation not permitted


# after "ceph auth ..."
# note the different epochs below? BTW, 110587 is the current epoch at  
that time and osd.999 is marked destroyed there

# 109892: much too old to offer any details
# 110587: modified 2018-01-09 23:43:13.202381

2018-01-10 00:08:00.945507 7fc55905bd00  0 osd.999 0 crush map has  
features 288232575208783872, adjusting msgr requires for clients
2018-01-10 00:08:00.945514 7fc55905bd00  0 osd.999 0 crush map has  
features 288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:08:00.945521 7fc55905bd00  0 osd.999 0 crush map has  
features 288232575208783872, adjusting msgr requires for osds

2018-01-10 00:08:00.945588 7fc55905bd00  0 osd.999 0 load_pgs
2018-01-10 00:08:00.945594 7fc55905bd00  0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:08:00.945599 7fc55905bd00  0 osd.999 0 using  
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:08:00.946544 7fc55905bd00 -1 osd.999 0 log_to_monitors  
{default=true}
2018-01-10 00:08:00.951720 7fc55905bd00  0 osd.999 0 done with init,  
starting boot process
2018-01-10 00:08:00.952225 7fc54160a700 -1 osd.999 0 waiting for  
initial osdmap
2018-01-10 00:08:00.970644 7fc546614700  0 osd.999 109892 crush map  
has features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:08:00.970653 7fc546614700  0 osd.999 109892 crush map  
has features 

Re: [ceph-users] rbd: map failed

2018-01-10 Thread Lenz Grimmer
On 01/09/2018 07:46 PM, Karun Josy wrote:

> We have a user "testuser" with below permissions :
> 
> $ ceph auth get client.testuser
> exported keyring for client.testuser
> [client.testuser]
>         key = ==
>         caps mon = "profile rbd"
>         caps osd = "profile rbd pool=ecpool, profile rbd pool=cv,
> profile rbd-read-only pool=templates"
> 
> 
> But when we try to map an image in pool 'templates' we get the below
> error : 
> --
> # rbd map templates/centos.7-4.x86-64.2017 --id testuser
> rbd: sysfs write failed
> In some cases useful info is found in syslog - try "dmesg | tail".
> rbd: map failed: (1) Operation not permitted
> 
> 
> Is it because that user has only read permission in templates pool ?

Did you check "dmesg" as outlined in the error message? Anything in
there that might give a hint?

Lenz

-- 
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 'lost' cephfs filesystem?

2018-01-10 Thread Mark Schouten
On woensdag 10 januari 2018 08:42:04 CET Webert de Souza Lima wrote:
> try to kick out (evict) that cephfs client from the mds node, see
> http://docs.ceph.com/docs/master/cephfs/eviction/


Thanks, that's a good suggestion. Just one question, will this affect RBD-
access from the same (client)host?

-- 
Kerio Operator in de Cloud? https://www.kerioindecloud.nl/
Mark Schouten  | Tuxis Internet Engineering
KvK: 61527076  | http://www.tuxis.nl/
T: 0318 200208 | i...@tuxis.nl

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-volume lvm deactivate/destroy/zap

2018-01-10 Thread Alfredo Deza
On Wed, Jan 10, 2018 at 2:10 AM, Fabian Grünbichler
 wrote:
> On Tue, Jan 09, 2018 at 02:14:51PM -0500, Alfredo Deza wrote:
>> On Tue, Jan 9, 2018 at 1:35 PM, Reed Dier  wrote:
>> > I would just like to mirror what Dan van der Ster’s sentiments are.
>> >
>> > As someone attempting to move an OSD to bluestore, with limited/no LVM
>> > experience, it is a completely different beast and complexity level 
>> > compared
>> > to the ceph-disk/filestore days.
>> >
>> > ceph-deploy was a very simple tool that did exactly what I was looking to
>> > do, but now we have deprecated ceph-disk halfway into a release, 
>> > ceph-deploy
>> > doesn’t appear to fully support ceph-volume, which is now the official way
>> > to manage OSDs moving forward.
>>
>> ceph-deploy now fully supports ceph-volume, we should get a release soon
>>
>> >
>> > My ceph-volume create statement ‘succeeded’ but the OSD doesn’t start, so
>> > now I am trying to zap the disk to try to recreate the OSD, and the zap is
>> > failing as Dan’s did.
>>
>> I would encourage you to open a ticket in the tracker so that we can
>> improve on what failed for you
>>
>> http://tracker.ceph.com/projects/ceph-volume/issues/new
>>
>> ceph-volume keeps thorough logs in /var/log/ceph/ceph-volume.log and
>> /var/log/ceph/ceph-volume-systemd.log
>>
>> If you create a ticket, please make sure to add all the output and
>> steps that you can
>> >
>> > And yes, I was able to get it zapped using the lvremove, vgremove, pvremove
>> > commands, but that is not obvious to someone who hasn’t used LVM 
>> > extensively
>> > for storage management before.
>> >
>> > I also want to mirror Dan’s sentiments about the unnecessary complexity
>> > imposed on what I expect is the default use case of an entire disk being
>> > used. I can’t see anything more than the ‘entire disk’ method being the
>> > largest use case for users of ceph, especially the smaller clusters trying
>> > to maximize hardware/spend.
>>
>> We don't take lightly the introduction of LVM here. The new tool is
>> addressing several insurmountable issues with how ceph-disk operated.
>>
>> Although using an entire disk might be easier in the use case you are
>> in, it is certainly not the only thing we have to support, so then
>> again, we can't
>> reliably decide what strategy would be best to destroy that volume, or
>> group, or if the PV should be destroyed as well.
>
> wouldn't it be possible to detect on creation that it is a full physical
> disk that gets initialized completely by ceph-volume, store that in the
> metadata somewhere and clean up accordingly when destroying the OSD?

When the OSD is created, we capture a lot of metadata about devices:
what goes where (even if the device changes names), and
which devices are part of an OSD. For example, we can accurately tell if
a device is a journal and which OSD it is associated with.

The removal of an LV and its corresponding VG is very destructive, with
no way to revert, and even though we allow a simplistic approach of
creating the VG and LV for you, it doesn't necessarily mean that an
operator will want to have the VG fully destroyed when zapping an LV.

There are two use cases here:

1) An operator is redeploying and wants to completely remove the VG
(including the PV and LV), that may or may not have been created by
ceph-volume
2) An operator already has VGs and LVs in place and wants to reuse
them for an OSD - no need to destroy the underlying VG

We must support #2, but I see that there is a lot of users that would
like a more transparent removal of LVM-related devices like what
ceph-volume does when creating.

How about a flag that allows that behavior (although not enabled by
default) so that `zap` can destroy the LVM devices as well? So instead
of:

ceph-volume lvm zap vg/lv

We would offer:

ceph-volume lvm zap --destroy vg/lv

Which would get rid of the lv, vg, and pv as well
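
Until then, the manual cleanup is roughly (the VG/LV and device names below are 
only examples):

lvremove -f ceph-vg/osd-lv
vgremove ceph-vg
pvremove /dev/sdX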


>
>>
>> The 'zap' sub-command will allow that lv to be reused for an OSD and
>> that should work. Again, if it isn't sufficient, we really do need
>> more information and a
>> ticket in the tracker is the best way.
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 'lost' cephfs filesystem?

2018-01-10 Thread Webert de Souza Lima
try to kick out (evict) that cephfs client from the mds node, see
http://docs.ceph.com/docs/master/cephfs/eviction/
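
Roughly (the MDS rank and client id below are placeholders):

ceph tell mds.0 client ls
ceph tell mds.0 client evict id=12345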


Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*

On Wed, Jan 10, 2018 at 12:59 AM, Mark Schouten  wrote:

> Hi,
>
> While upgrading a server with a CephFS mount tonight, it stalled on
> installing
> a new kernel, because it was waiting for `sync`.
>
> I'm pretty sure it has something to do with the CephFS filesystem which
> caused
> some issues last week. I think the kernel still has a reference to the
> probably lazy unmounted CephFS filesystem.
> Unmounting the filesystem 'works', which means it is no longer available,
> but
> the unmount-command seems to be waiting for sync() as well. Mounting the
> filesystem again doesn't work either.
>
> I know the simple solution is to just reboot the server, but the server
> holds
> quite a lot of VM's and Containers, so I'd prefer to fix this without a
> reboot.
>
> Anybody with some clever ideas? :)
>
> --
> Kerio Operator in de Cloud? https://www.kerioindecloud.nl/
> Mark Schouten  | Tuxis Internet Engineering
> KvK: 61527076  | http://www.tuxis.nl/
> T: 0318 200208 | i...@tuxis.nl
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSDs going down/up at random

2018-01-10 Thread Mike O'Connor
On 10/01/2018 4:48 PM, Mike O'Connor wrote:
> On 10/01/2018 4:24 PM, Sam Huracan wrote:
>> Hi Mike,
>>
>> Could you show system log at moment osd down and up?
So now I know it's a crash; what's my next step? As soon as I put the
system under write load, OSDs start crashing.

Mike
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to "reset" rgw?

2018-01-10 Thread Martin Emrich

Hi!

As I cannot find any solution for my broken rgw pools, the only way out 
is to give up and "reset".


How do I throw away all rgw data from a ceph cluster? Just delete all 
rgw pools? Or are some parts stored elsewhere (monitor, ...)?


Thanks,

Martin

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS cache size limits

2018-01-10 Thread stefan
Quoting John Spray (jsp...@redhat.com):
> On Mon, Jan 8, 2018 at 8:02 PM, Marc Roos  wrote:
> >
> > I guess the mds cache holds files, attributes etc but how many files
> > will the default "mds_cache_memory_limit": "1073741824" hold?
> 
> We always used to get asked how much memory a given mds_cache_size (in
> inodes) would require, I guess it was only a matter of time until the
> reverse question was asked :-)

@Marc Roos: 

{"items":731544591,"bytes":144955142984}

Bytes used is actually: 235604586496 (RES), 248868077568 (VIRT). 

So:

1073741824/235604586496
.00455738931049302623
.*731544591
3333933.49917239288177762193
./(1000^2)
3.33393349917239288177

~3 million "items" per 1 GiB

More details below.

ceph daemon mds.mds2 perf dump | jq

  },
  "mds_mem": {
"ino": 45358896,
"ino+": 268519704,
"ino-": 223160808,
"dir": 31856591,
"dir+": 199761898,
"dir-": 167905307,
"dn": 45358896,
"dn+": 281905135,
"dn-": 236546239,
"cap": 376,
"cap+": 300470635,
"cap-": 297470559,
"rss": 230082340,
"heap": 313884,
"buf": 0
},

ceph daemon mds.mds2 dump_mempools | jq

  },
  "mds_co": {
"items": 731544591,
"bytes": 144955142984
  },
...
...
  },
  "total": {
"items": 1482069569,
"bytes": 145738337213
  }
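
If you then want to raise the limit based on that estimate, a rough sketch (the 
value is just an example, ~4 GiB):

ceph tell mds.mds2 injectargs '--mds_cache_memory_limit 4294967296'

or set mds_cache_memory_limit under [mds] in ceph.conf and restart the MDS.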

Gr. Stefan

-- 
| BIT BV  http://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Incomplete pgs and no data movement ( cluster appears readonly )

2018-01-10 Thread Janne Johansson
2018-01-10 8:51 GMT+01:00 Brent Kennedy :

> As per a previous thread, my pgs are set too high.  I tried adjusting the
> “mon max pg per osd” up higher and higher, which did clear the
> error(restarted monitors and managers each time), but it seems that data
> simply wont move around the cluster.  If I stop the primary OSD of an
> incomplete pg, the cluster just shows those affected pages as
> active+undersized+degraded:
>
>
> I also adjusted “osd max pg per osd hard ratio ” to 5, but that didn’t
> seem to trigger any data moved.  I did restart the OSDs each time I changed
> it.  The data just wont finish moving.  “ceph –w” shows this:
>
> 2018-01-10 07:49:27.715163 osd.20 [WRN] slow request 960.675164 seconds
> old, received at 2018-01-10 07:33:27.039907: osd_op(client.3542508.0:4097
> 14.0 14.50e8d0b0 (undecoded) ondisk+write+known_if_redirected e125984)
> currently queued_for_pg
>
>
>
Did you bump the ratio so that the PGs per OSD max * hard ratio actually
became more than the amount of PGs you had?
Last time you mailed the ratio was 25xx and the max was 200 which meant the
ratio would have needed to be far more than 5.0.
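(With ~2500 PGs per OSD and a max of 200, the hard ratio would have needed to
exceed 2500 / 200 = 12.5 before that limit stopped blocking.)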


-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com