Re: [ceph-users] Signature check failures.

2018-02-19 Thread Cary
Gregory,


I greatly appreciate your assistance. I recompiled Ceph with the -ssl
and nss USE flags set, which is the opposite of what I was using
before. I am now able to export from our pools without signature check
failures. Thank you for pointing me in the right direction.
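
For anyone else on Gentoo hitting this, the change was roughly the
following (flag names as above; the package atom and rebuild command are
from my setup, so adjust to your own):

echo "sys-cluster/ceph -ssl nss" >> /etc/portage/package.use/ceph
emerge --ask --oneshot sys-cluster/ceph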

Cary
-Dynamic



On Fri, Feb 16, 2018 at 11:29 PM, Gregory Farnum <gfar...@redhat.com> wrote:
> On Thu, Feb 15, 2018 at 10:28 AM Cary <dynamic.c...@gmail.com> wrote:
>>
>> Hello,
>>
>> I have enabled debugging on my MONs and OSDs to help troubleshoot
>> these signature check failures. I was watching osd.4's log and saw
>> these errors when the signature check failure happened.
>>
>> 2018-02-15 18:06:29.235791 7f8bca7de700  1 --
>> 192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021
>> conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).read_bulk peer
>> close file descriptor 81
>> 2018-02-15 18:06:29.235832 7f8bca7de700  1 --
>> 192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021
>> conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).read_until read
>> failed
>> 2018-02-15 18:06:29.235841 7f8bca7de700  1 --
>> 192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021
>> conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).process read
>> tag failed
>> 2018-02-15 18:06:29.235848 7f8bca7de700  1 --
>> 192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021
>> conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).fault on lossy
>> channel, failing
>> 2018-02-15 18:06:29.235966 7f8bc0853700  2 osd.8 27498 ms_handle_reset
>> con 0x55f802746000 session 0x55f8063b3180
>>
>>
>>  Could someone please look at this? We have 3 different Ceph clusters
>> set up and they all have this issue. This cluster is running Gentoo and
>> Ceph version 12.2.2-r1. The other two clusters are 12.2.2. Exporting
>> images causes signature check failures, and with larger files it
>> segfaults as well.
>>
>> When exporting the image from osd.4, this message shows up as well.
>> Exporting image: 1% complete...2018-02-15 18:14:05.283708 7f6834277700
>>  0 -- 192.168.173.44:0/122241099 >> 192.168.173.44:6801/72152
>> conn(0x7f681400ff10 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH
>> pgs=0 cs=0 l=1).handle_connect_reply connect got BADAUTHORIZER
>>
>> The errors below show up on all OSD/MGR/MON nodes when exporting an image.
>> Exporting image: 8% complete...2018-02-15 18:15:51.419437 7f2b64ac0700
>>  0 SIGN: MSG 28 Message signature does not match contents.
>> 2018-02-15 18:15:51.419459 7f2b64ac0700  0 SIGN: MSG 28Signature on
>> message:
>> 2018-02-15 18:15:51.419460 7f2b64ac0700  0 SIGN: MSG 28sig:
>> 8338581684421737157
>> 2018-02-15 18:15:51.419469 7f2b64ac0700  0 SIGN: MSG 28Locally
>> calculated signature:
>> 2018-02-15 18:15:51.419470 7f2b64ac0700  0 SIGN: MSG 28
>> sig_check:5913182128308244
>> 2018-02-15 18:15:51.419471 7f2b64ac0700  0 Signature failed.
>> 2018-02-15 18:15:51.419472 7f2b64ac0700  0 --
>> 192.168.173.44:0/3919097436 >> 192.168.173.44:6801/72152
>> conn(0x7f2b4800ff10 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH
>> pgs=39 cs=1 l=1).process Signature check failed
>>
>> Our VMs crash when writing to disk. Libvirt's logs just say the VM
>> crashed. This is a blocker. Has anyone else seen this? This seems to
>> be an issue with Ceph Luminous, as we were not having these problems
>> with Jewel.
>
>
> When I search through my email, the only two reports of failed signatures
> are people who in fact had misconfiguration issues resulting in one end
> using signatures and the other side not.
>
> Given that, and since you're on Gentoo and presumably compiled the packages
> yourself, the most likely explanation I can think of is something that went
> wrong between your packages and the compilation. :/
>
> I guess you could try switching from libnss to libcryptopp (or vice versa)
> by recompiling with the relevant makeflags if you want to do something that
> only involves the Ceph code. Otherwise, do a rebuild?
>
> Sadly I don't think there's much else we can suggest given that nobody has
> seen this with binary packages blessed by the upstream or a distribution.
> -Greg
>
>>
>>
>> Cary
>> -Dynamic
>>
>> On Thu, Feb 1, 2018 at 7:04 PM, Cary <dynamic.c...@gmail.com> wrote:
>> > Hello,
>> >
>> > I did not do anything special that I know of. I was just exporting an
>> > image from Openstack. We have recently upgraded from Jewel 10.2.3 to
>> > Luminous 12.2.2.
>> >
>> > C

Re: [ceph-users] Signature check failures.

2018-02-15 Thread Cary
Hello,

I have enabled debugging on my MONs and OSDs to help troubleshoot
these signature check failures. I was watching osd.4's log and saw
these errors when the signature check failure happened.

2018-02-15 18:06:29.235791 7f8bca7de700  1 --
192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021
conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).read_bulk peer
close file descriptor 81
2018-02-15 18:06:29.235832 7f8bca7de700  1 --
192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021
conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).read_until read
failed
2018-02-15 18:06:29.235841 7f8bca7de700  1 --
192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021
conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).process read
tag failed
2018-02-15 18:06:29.235848 7f8bca7de700  1 --
192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021
conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).fault on lossy
channel, failing
2018-02-15 18:06:29.235966 7f8bc0853700  2 osd.8 27498 ms_handle_reset
con 0x55f802746000 session 0x55f8063b3180


 Could someone please look at this? We have 3 different Ceph clusters
set up and they all have this issue. This cluster is running Gentoo and
Ceph version 12.2.2-r1. The other two clusters are 12.2.2. Exporting
images causes signature check failures, and with larger files it
segfaults as well.

When exporting the image from osd.4, this message shows up as well.
Exporting image: 1% complete...2018-02-15 18:14:05.283708 7f6834277700
 0 -- 192.168.173.44:0/122241099 >> 192.168.173.44:6801/72152
conn(0x7f681400ff10 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH
pgs=0 cs=0 l=1).handle_connect_reply connect got BADAUTHORIZER

The errors below show up on all OSD/MGR/MON nodes when exporting an image.
Exporting image: 8% complete...2018-02-15 18:15:51.419437 7f2b64ac0700
 0 SIGN: MSG 28 Message signature does not match contents.
2018-02-15 18:15:51.419459 7f2b64ac0700  0 SIGN: MSG 28Signature on message:
2018-02-15 18:15:51.419460 7f2b64ac0700  0 SIGN: MSG 28sig:
8338581684421737157
2018-02-15 18:15:51.419469 7f2b64ac0700  0 SIGN: MSG 28Locally
calculated signature:
2018-02-15 18:15:51.419470 7f2b64ac0700  0 SIGN: MSG 28
sig_check:5913182128308244
2018-02-15 18:15:51.419471 7f2b64ac0700  0 Signature failed.
2018-02-15 18:15:51.419472 7f2b64ac0700  0 --
192.168.173.44:0/3919097436 >> 192.168.173.44:6801/72152
conn(0x7f2b4800ff10 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH
pgs=39 cs=1 l=1).process Signature check failed

Our VMs crash when writing to disk. Libvirt's logs just say the VM
crashed. This is a blocker. Has anyone else seen this? This seems to
be an issue with Ceph Luminous, as we were not having these problems
with Jewel.

Cary
-Dynamic

On Thu, Feb 1, 2018 at 7:04 PM, Cary <dynamic.c...@gmail.com> wrote:
> Hello,
>
> I did not do anything special that I know of. I was just exporting an
> image from Openstack. We have recently upgraded from Jewel 10.2.3 to
> Luminous 12.2.2.
>
> Caps for admin:
> client.admin
> key: CENSORED
> auid: 0
> caps: [mgr] allow *
> caps: [mon] allow *
> caps: [osd] allow *
>
> Caps for Cinder:
> client.cinder
> key: CENSORED
> caps: [mgr] allow r
> caps: [mon] profile rbd, allow command "osd blacklist"
> caps: [osd] profile rbd pool=vms, profile rbd pool=volumes,
> profile rbd pool=images
>
> Caps for MGR:
> mgr.0
> key: CENSORED
> caps: [mon] allow *
>
> I believe this is causing the virtual machines we have running to
> crash. Any advice would be appreciated. Please let me know if I need
> to provide any other details. Thank you,
>
> Cary
> -Dynamic
>
> On Mon, Jan 29, 2018 at 7:53 PM, Gregory Farnum <gfar...@redhat.com> wrote:
>> On Fri, Jan 26, 2018 at 12:14 PM Cary <dynamic.c...@gmail.com> wrote:
>>>
>>> Hello,
>>>
>>>  We are running Luminous 12.2.2 on 6 OSD hosts, each with 12 1TB OSDs
>>> and 64GB of RAM. Each host has an SSD for Bluestore's block.wal and
>>> block.db. There are also 5 monitor nodes with 32GB of RAM. All servers
>>> run Gentoo with kernel 4.12.12-gentoo.
>>>
>>> When I export an image using:
>>> rbd export pool-name/volume-name  /location/image-name.raw
>>>
>>> Messages similar to those below are displayed. The signature check
>>> fails randomly, and sometimes there is a message about a bad
>>> authorizer, but not every time.
>>> The image is still exported successfully.
>>>
>>> 2018-01-24 17:35:15.616080 7fc8d4024700  0 cephx:
>>> verify_authorizer_reply bad nonce got 4552544084014661633 expected
>>> 4552499520046621785 sent 4552499520046621784
>>> 2018-01-24 17:35:15.616098 7fc8d4024700  0

Re: [ceph-users] Signature check failures.

2018-02-01 Thread Cary
Hello,

I did not do anything special that I know of. I was just exporting an
image from Openstack. We have recently upgraded from Jewel 10.2.3 to
Luminous 12.2.2.

Caps for admin:
client.admin
key: CENSORED
auid: 0
caps: [mgr] allow *
caps: [mon] allow *
caps: [osd] allow *

Caps for Cinder:
client.cinder
key: CENSORED
caps: [mgr] allow r
caps: [mon] profile rbd, allow command "osd blacklist"
caps: [osd] profile rbd pool=vms, profile rbd pool=volumes,
profile rbd pool=images

Caps for MGR:
mgr.0
key: CENSORED
caps: [mon] allow *
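
For reference, caps like the Cinder ones above can be set with "ceph
auth caps" (a sketch reusing the same values as above, not necessarily
how ours were originally created):

ceph auth caps client.cinder \
  mgr 'allow r' \
  mon 'profile rbd, allow command "osd blacklist"' \
  osd 'profile rbd pool=vms, profile rbd pool=volumes, profile rbd pool=images'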

I believe this is causing the virtual machines we have running to
crash. Any advice would be appreciated. Please let me know if I need
to provide any other details. Thank you,

Cary
-Dynamic

On Mon, Jan 29, 2018 at 7:53 PM, Gregory Farnum <gfar...@redhat.com> wrote:
> On Fri, Jan 26, 2018 at 12:14 PM Cary <dynamic.c...@gmail.com> wrote:
>>
>> Hello,
>>
>>  We are running Luminous 12.2.2 on 6 OSD hosts, each with 12 1TB OSDs
>> and 64GB of RAM. Each host has an SSD for Bluestore's block.wal and
>> block.db. There are also 5 monitor nodes with 32GB of RAM. All servers
>> run Gentoo with kernel 4.12.12-gentoo.
>>
>> When I export an image using:
>> rbd export pool-name/volume-name  /location/image-name.raw
>>
>> Messages similar to those below are displayed. The signature check
>> fails randomly, and sometimes there is a message about a bad
>> authorizer, but not every time.
>> The image is still exported successfully.
>>
>> 2018-01-24 17:35:15.616080 7fc8d4024700  0 cephx:
>> verify_authorizer_reply bad nonce got 4552544084014661633 expected
>> 4552499520046621785 sent 4552499520046621784
>> 2018-01-24 17:35:15.616098 7fc8d4024700  0 --
>> 172.21.32.16:0/1412094654 >> 172.21.32.6:6802/6219 conn(0x7fc8b0078a50
>> :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0
>> l=1)._process_connection failed verifying authorize reply
>> 2018-01-24 17:35:15.699004 7fc8d4024700  0 SIGN: MSG 2 Message
>> signature does not match contents.
>> 2018-01-24 17:35:15.699020 7fc8d4024700  0 SIGN: MSG 2Signature on
>> message:
>> 2018-01-24 17:35:15.699021 7fc8d4024700  0 SIGN: MSG 2sig:
>> 8189090775647585001
>> 2018-01-24 17:35:15.699047 7fc8d4024700  0 SIGN: MSG 2Locally
>> calculated signature:
>> 2018-01-24 17:35:15.699048 7fc8d4024700  0 SIGN: MSG 2
>> sig_check:140500325643792
>> 2018-01-24 17:35:15.699049 7fc8d4024700  0 Signature failed.
>> 2018-01-24 17:35:15.699050 7fc8d4024700  0 --
>> 172.21.32.16:0/1412094654 >> 172.21.32.2:6807/153106
>> conn(0x7fc8bc020870 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH
>> pgs=26018 cs=1 l=1).process Signature check failed
>>
>> Does anyone know what could cause this, and what I can do to fix it?
>
>
> That's in the cephx authentication code and it's indicating that the secure
> signature sent with the message isn't what the local node thinks it should
> be. That's pretty odd (a bit flip or something that could actually change it
> ought to trigger the messaging checksums directly) and I'm not quite sure
> how it could happen.
>
> But, as you've noticed, it retries and apparently succeeds. How did you
> notice this?
> -Greg
>
>>
>>
>> Thank you,
>>
>> Cary
>> -Dynamic
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Signature check failures.

2018-01-26 Thread Cary
Hello,

 We are running Luminous 12.2.2 on 6 OSD hosts, each with 12 1TB OSDs
and 64GB of RAM. Each host has an SSD for Bluestore's block.wal and
block.db. There are also 5 monitor nodes with 32GB of RAM. All servers
run Gentoo with kernel 4.12.12-gentoo.

When I export an image using:
rbd export pool-name/volume-name  /location/image-name.raw

Messages similar to those below are displayed. The signature check
fails randomly, and sometimes there is a message about a bad authorizer,
but not every time.
The image is still exported successfully.

2018-01-24 17:35:15.616080 7fc8d4024700  0 cephx:
verify_authorizer_reply bad nonce got 4552544084014661633 expected
4552499520046621785 sent 4552499520046621784
2018-01-24 17:35:15.616098 7fc8d4024700  0 --
172.21.32.16:0/1412094654 >> 172.21.32.6:6802/6219 conn(0x7fc8b0078a50
:-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0
l=1)._process_connection failed verifying authorize reply
2018-01-24 17:35:15.699004 7fc8d4024700  0 SIGN: MSG 2 Message
signature does not match contents.
2018-01-24 17:35:15.699020 7fc8d4024700  0 SIGN: MSG 2Signature on message:
2018-01-24 17:35:15.699021 7fc8d4024700  0 SIGN: MSG 2sig:
8189090775647585001
2018-01-24 17:35:15.699047 7fc8d4024700  0 SIGN: MSG 2Locally
calculated signature:
2018-01-24 17:35:15.699048 7fc8d4024700  0 SIGN: MSG 2
sig_check:140500325643792
2018-01-24 17:35:15.699049 7fc8d4024700  0 Signature failed.
2018-01-24 17:35:15.699050 7fc8d4024700  0 --
172.21.32.16:0/1412094654 >> 172.21.32.2:6807/153106
conn(0x7fc8bc020870 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH
pgs=26018 cs=1 l=1).process Signature check failed

Does anyone know what could cause this, and what I can do to fix it?

Thank you,

Cary
-Dynamic
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Signature check failures.

2018-01-25 Thread Cary
Hello,

 We are running Luminous 12.2.2 on 6 OSD hosts, each with 12 1TB OSDs
and 64GB of RAM. Each host has an SSD for Bluestore's block.wal and
block.db. There are also 5 monitor nodes with 32GB of RAM. All servers
run Gentoo with kernel 4.12.12-gentoo.

When I export an image using:
rbd export pool-name/volume-name  /location/image-name.raw

Messages similar to those below are displayed. The signature check
fails randomly, and sometimes there is a message about a bad authorizer,
but not every time.
The image is still exported successfully.

2018-01-24 17:35:15.616080 7fc8d4024700  0 cephx:
verify_authorizer_reply bad nonce got 4552544084014661633 expected
4552499520046621785 sent 4552499520046621784
2018-01-24 17:35:15.616098 7fc8d4024700  0 --
172.21.32.16:0/1412094654 >> 172.21.32.6:6802/6219 conn(0x7fc8b0078a50
:-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0
l=1)._process_connection failed verifying authorize reply
2018-01-24 17:35:15.699004 7fc8d4024700  0 SIGN: MSG 2 Message
signature does not match contents.
2018-01-24 17:35:15.699020 7fc8d4024700  0 SIGN: MSG 2Signature on message:
2018-01-24 17:35:15.699021 7fc8d4024700  0 SIGN: MSG 2sig:
8189090775647585001
2018-01-24 17:35:15.699047 7fc8d4024700  0 SIGN: MSG 2Locally
calculated signature:
2018-01-24 17:35:15.699048 7fc8d4024700  0 SIGN: MSG 2
sig_check:140500325643792
2018-01-24 17:35:15.699049 7fc8d4024700  0 Signature failed.
2018-01-24 17:35:15.699050 7fc8d4024700  0 --
172.21.32.16:0/1412094654 >> 172.21.32.2:6807/153106
conn(0x7fc8bc020870 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH
pgs=26018 cs=1 l=1).process Signature check failed

Does anyone know what could cause this, and what I can do to fix it?

Thank you,

Cary
-Dynamic
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Signature check failures.

2018-01-24 Thread Cary
Hello,

 We are running Luminous 12.2.2 on 6 OSD hosts, each with 12 1TB OSDs
and 64GB of RAM. Each host has an SSD for Bluestore's block.wal and
block.db. There are also 5 monitor nodes with 32GB of RAM. All servers
run Gentoo with kernel 4.12.12-gentoo.

When I export an image using:
rbd export pool-name/volume-name  /location/image-name.raw

The following messages show up. The signature check fails randomly.
The image is still exported successfully.

2018-01-24 17:35:15.616080 7fc8d4024700  0 cephx:
verify_authorizer_reply bad nonce got 4552544084014661633 expected
4552499520046621785 sent 4552499520046621784
2018-01-24 17:35:15.616098 7fc8d4024700  0 --
172.21.32.16:0/1412094654 >> 172.21.32.6:6802/6219 conn(0x7fc8b0078a50
:-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0
l=1)._process_connection failed verifying authorize reply
2018-01-24 17:35:15.699004 7fc8d4024700  0 SIGN: MSG 2 Message
signature does not match contents.
2018-01-24 17:35:15.699020 7fc8d4024700  0 SIGN: MSG 2Signature on message:
2018-01-24 17:35:15.699021 7fc8d4024700  0 SIGN: MSG 2sig:
8189090775647585001
2018-01-24 17:35:15.699047 7fc8d4024700  0 SIGN: MSG 2Locally
calculated signature:
2018-01-24 17:35:15.699048 7fc8d4024700  0 SIGN: MSG 2
sig_check:140500325643792
2018-01-24 17:35:15.699049 7fc8d4024700  0 Signature failed.
2018-01-24 17:35:15.699050 7fc8d4024700  0 --
172.21.32.16:0/1412094654 >> 172.21.32.2:6807/153106
conn(0x7fc8bc020870 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH
pgs=26018 cs=1 l=1).process Signature check failed

Does anyone know what could cause this, and what I can do to fix it?

Thank you,

Cary
-Dynamic
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-volume does not support upstart

2017-12-29 Thread Cary
Hello,

I mount my Bluestore OSDs in /etc/fstab:

vi /etc/fstab

tmpfs   /var/lib/ceph/osd/ceph-12  tmpfs   rw,relatime 0 0
=
Then mount everything in fstab with:
mount -a
==
I activate my OSDs this way at startup. You can find the fsid with:

cat /var/lib/ceph/osd/ceph-12/fsid

Then add a file named ceph.start so ceph-volume will be run at startup:

vi /etc/local.d/ceph.start
ceph-volume lvm activate 12 827f4a2c-8c1b-427b-bd6c-66d31a0468ac
==
Make it executable:
chmod 700 /etc/local.d/ceph.start
==
cd /etc/local.d/
./ceph.start
==
I am a Gentoo user and use OpenRC, so this may not apply to you.
==
cd /etc/init.d/
ln -s ceph ceph-osd.12
/etc/init.d/ceph-osd.12 start
rc-update add ceph-osd.12 default
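
For completeness, the whole /etc/local.d/ceph.start can look roughly
like this (the OSD id and fsid are the example values from above;
substitute your own):

#!/bin/sh
# Activate the Bluestore OSD; the tmpfs from fstab is already mounted.
ceph-volume lvm activate 12 827f4a2c-8c1b-427b-bd6c-66d31a0468ac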

Cary

On Fri, Dec 29, 2017 at 8:47 AM, 赵赵贺东 <zhaohed...@gmail.com> wrote:
> Hello Cary!
> It’s really big surprise for me to receive your reply!
> Sincere thanks to you!
> I know it’s a fake executable file, but it works!
>
> >
> $ cat /usr/sbin/systemctl
> #!/bin/bash
> exit 0
> <
>
> I can start my osd by following command
> /usr/bin/ceph-osd --cluster=ceph -i 12 -f --setuser ceph --setgroup ceph
>
> But, there are still problems.
> 1. Though ceph-osd can start successfully, the prepare and activate logs look
> like errors occurred.
>
> Prepare log:
> ===>
> # ceph-volume lvm prepare --bluestore --data vggroup/lv
> Running command: sudo mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-12
> Running command: chown -R ceph:ceph /dev/dm-0
> Running command: sudo ln -s /dev/vggroup/lv /var/lib/ceph/osd/ceph-12/block
> Running command: sudo ceph --cluster ceph --name client.bootstrap-osd
> --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o
> /var/lib/ceph/osd/ceph-12/activate.monmap
>  stderr: got monmap epoch 1
> Running command: ceph-authtool /var/lib/ceph/osd/ceph-12/keyring
> --create-keyring --name osd.12 --add-key
> AQAQ+UVa4z2ANRAAmmuAExQauFinuJuL6A56ww==
>  stdout: creating /var/lib/ceph/osd/ceph-12/keyring
>  stdout: added entity osd.12 auth auth(auid = 18446744073709551615
> key=AQAQ+UVa4z2ANRAAmmuAExQauFinuJuL6A56ww== with 0 caps)
> Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-12/keyring
> Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-12/
> Running command: sudo ceph-osd --cluster ceph --osd-objectstore bluestore
> --mkfs -i 12 --monmap /var/lib/ceph/osd/ceph-12/activate.monmap --key
>  --osd-data
> /var/lib/ceph/osd/ceph-12/ --osd-uuid 827f4a2c-8c1b-427b-bd6c-66d31a0468ac
> --setuser ceph --setgroup ceph
>  stderr: warning: unable to create /var/run/ceph: (13) Permission denied
>  stderr: 2017-12-29 08:13:08.609127 b66f3000 -1 asok(0x850c62a0)
> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to
> bind the UNIX domain socket to '/var/run/ceph/ceph-osd.12.asok': (2) No such
> file or directory
>  stderr:
>  stderr: 2017-12-29 08:13:08.643410 b66f3000 -1
> bluestore(/var/lib/ceph/osd/ceph-12//block) _read_bdev_label unable to
> decode label at offset 66: buffer::malformed_input: void
> bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past
> end of struct encoding
>  stderr: 2017-12-29 08:13:08.644055 b66f3000 -1
> bluestore(/var/lib/ceph/osd/ceph-12//block) _read_bdev_label unable to
> decode label at offset 66: buffer::malformed_input: void
> bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past
> end of struct encoding
>  stderr: 2017-12-29 08:13:08.644722 b66f3000 -1
> bluestore(/var/lib/ceph/osd/ceph-12//block) _read_bdev_label unable to
> decode label at offset 66: buffer::malformed_input: void
> bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past
> end of struct encoding
>  stderr: 2017-12-29 08:13:08.646722 b66f3000 -1
> bluestore(/var/lib/ceph/osd/ceph-12/) _read_fsid unparsable uuid
>  stderr: 2017-12-29 08:14:00.697028 b66f3000 -1 key
> AQAQ+UVa4z2ANRAAmmuAExQauFinuJuL6A56ww==
>  stderr: 2017-12-29 08:14:01.261659 b66f3000 -1 created object store
> /var/lib/ceph/osd/ceph-12/ for osd.12 fsid
> 4e5adad0-784c-41b4-ab72-5f4fae499b3a
> <===
>
> Activate log:
> ===>
> # ceph-volume lvm activate --bluestore 12
> 827f4a2c-8c1b-427b-bd6c-66d31a0468ac
> Running command: sudo ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev
> /d

Re: [ceph-users] ceph-volume does not support upstart

2017-12-28 Thread Cary

You could add a file named  /usr/sbin/systemctl and add:
exit 0
to it.
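
Something like this is all it takes (just a stub that always reports
success, so ceph-volume's systemctl calls become no-ops):

cat > /usr/sbin/systemctl <<'EOF'
#!/bin/bash
exit 0
EOF
chmod +x /usr/sbin/systemctl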
 
Cary

On Dec 28, 2017, at 18:45, 赵赵贺东 <zhaohed...@gmail.com> wrote:


Hello ceph-users!

I am a Ceph user from China.
Our company deploys Ceph on ARM Ubuntu 14.04.
The Ceph version is Luminous 12.2.2.
When I try to activate an OSD with ceph-volume, I get the following error
(the OSD prepare stage seems to work normally).
It seems that ceph-volume only works under systemd, but Ubuntu 14.04 does not
support systemd.
How can I deploy an OSD on Ubuntu 14.04 with ceph-volume?
Will ceph-volume support upstart in the future?

===>
# ceph-volume lvm activate --bluestore 12 03fa2757-412d-4892-af8a-f2260294a2dc
Running command: sudo ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev 
/dev/vggroup/lvdata --path /var/lib/ceph/osd/ceph-12
Running command: sudo ln -snf /dev/vggroup/lvdata 
/var/lib/ceph/osd/ceph-12/block
Running command: chown -R ceph:ceph /dev/dm-2
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-12
Running command: sudo systemctl enable 
ceph-volume@lvm-12-03fa2757-412d-4892-af8a-f2260294a2dc
 stderr: sudo: systemctl: command not found
-->  RuntimeError: command returned non-zero exit status: 1
<


Your reply will be appreciated!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Re: Can't delete file in cephfs with "No space left on device"

2017-12-25 Thread Cary
Are you using hardlinks in cephfs?


On Tue, Dec 26, 2017 at 3:42 AM, 周 威 <cho...@msn.cn> wrote:
> The output of ceph osd df
>
>
>
> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
>
> 0 1.62650  1.0  1665G  1279G   386G 76.82 1.05 343
>
> 1 1.62650  1.0  1665G  1148G   516G 68.97 0.94 336
>
> 2 1.62650  1.0  1665G  1253G   411G 75.27 1.03 325
>
> 3 1.62650  1.0  1665G  1192G   472G 71.60 0.98 325
>
> 4 1.62650  1.0  1665G  1205G   460G 72.35 0.99 341
>
> 5 1.62650  1.0  1665G  1381G   283G 82.95 1.13 364
>
> 6 1.62650  1.0  1665G  1069G   595G 64.22 0.88 322
>
> 7 1.62650  1.0  1665G  1222G   443G 73.38 1.00 337
>
> 8 1.62650  1.0  1665G  1120G   544G 67.29 0.92 312
>
> 9 1.62650  1.0  1665G  1166G   498G 70.04 0.96 336
>
> 10 1.62650  1.0  1665G  1254G   411G 75.31 1.03 348
>
> 11 1.62650  1.0  1665G  1352G   313G 81.19 1.11 341
>
> 12 1.62650  1.0  1665G  1174G   490G 70.52 0.96 328
>
> 13 1.62650  1.0  1665G  1281G   383G 76.95 1.05 345
>
> 14 1.62650  1.0  1665G  1147G   518G 68.88 0.94 339
>
> 15 1.62650  1.0  1665G  1236G   429G 74.24 1.01 334
>
> 20 1.62650  1.0  1665G  1166G   499G 70.03 0.96 325
>
> 21 1.62650  1.0  1665G  1371G   293G 82.35 1.13 377
>
> 22 1.62650  1.0  1665G  1110G   555G 66.67 0.91 341
>
> 23 1.62650  1.0  1665G  1221G   443G 73.36 1.00 327
>
> 16 1.62650  1.0  1665G  1354G   310G 81.34 1.11 352
>
> 17 1.62650  1.0  1665G  1250G   415G 75.06 1.03 341
>
> 18 1.62650  1.0  1665G  1179G   486G 70.80 0.97 316
>
> 19 1.62650  1.0  1665G  1236G   428G 74.26 1.01 333
>
> 24 1.62650  1.0  1665G  1146G   518G 68.86 0.94 325
>
> 25 1.62650  1.0  1665G  1033G   632G 62.02 0.85 309
>
> 26 1.62650  1.0  1665G  1234G   431G 74.11 1.01 334
>
> 27 1.62650  1.0  1665G  1342G   322G 80.62 1.10 352
>
>   TOTAL 46635G 34135G 12500G 73.20
>
> MIN/MAX VAR: 0.85/1.13  STDDEV: 5.28
>
>
>
> From: Cary [mailto:dynamic.c...@gmail.com]
> Sent: December 26, 2017 11:40
> To: 周 威 <cho...@msn.cn>
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Can't delete file in cephfs with "No space left on
> device"
>
>
>
> Could you post the output of “ceph osd df”?
>
>
> On Dec 25, 2017, at 19:46, 周 威 <cho...@msn.cn> wrote:
>
> Hi all:
>
>
>
> Ceph version:
> ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
>
>
>
> Ceph df:
>
> GLOBAL:
>
> SIZE   AVAIL  RAW USED %RAW USED
>
> 46635G 12500G   34135G 73.19
>
>
>
> rm d
>
> rm: cannot remove `d': No space left on device
>
>
>
> and mds_cache:
>
> {
>
> "mds_cache": {
>
> "num_strays": 999713,
>
> "num_strays_purging": 0,
>
> "num_strays_delayed": 0,
>
> "num_purge_ops": 0,
>
> "strays_created": 999723,
>
> "strays_purged": 10,
>
> "strays_reintegrated": 0,
>
> "strays_migrated": 0,
>
> "num_recovering_processing": 0,
>
> "num_recovering_enqueued": 0,
>
> "num_recovering_prioritized": 0,
>
> "recovery_started": 107,
>
> "recovery_completed": 107
>
> }
>
> }
>
>
>
> It seems the strays num is stuck, what should I do?
>
> Thanks all.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Re: Can't delete file in cephfs with "No space left on device"

2017-12-25 Thread Cary
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-October/013646.html
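
The stray counters in the quoted message below come from the MDS admin
socket; on the MDS host, something like this will dump them (substitute
your own MDS id for the placeholder):

ceph daemon mds.<id> perf dump | grep num_strays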

On Tue, Dec 26, 2017 at 6:07 AM, Cary <dynamic.c...@gmail.com> wrote:
> Are you using hardlinks in cephfs?
>
>
> On Tue, Dec 26, 2017 at 3:42 AM, 周 威 <cho...@msn.cn> wrote:
>> The output of ceph osd df
>>
>>
>>
>> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
>>
>> 0 1.62650  1.0  1665G  1279G   386G 76.82 1.05 343
>>
>> 1 1.62650  1.0  1665G  1148G   516G 68.97 0.94 336
>>
>> 2 1.62650  1.0  1665G  1253G   411G 75.27 1.03 325
>>
>> 3 1.62650  1.0  1665G  1192G   472G 71.60 0.98 325
>>
>> 4 1.62650  1.0  1665G  1205G   460G 72.35 0.99 341
>>
>> 5 1.62650  1.0  1665G  1381G   283G 82.95 1.13 364
>>
>> 6 1.62650  1.0  1665G  1069G   595G 64.22 0.88 322
>>
>> 7 1.62650  1.0  1665G  1222G   443G 73.38 1.00 337
>>
>> 8 1.62650  1.0  1665G  1120G   544G 67.29 0.92 312
>>
>> 9 1.62650  1.0  1665G  1166G   498G 70.04 0.96 336
>>
>> 10 1.62650  1.0  1665G  1254G   411G 75.31 1.03 348
>>
>> 11 1.62650  1.0  1665G  1352G   313G 81.19 1.11 341
>>
>> 12 1.62650  1.0  1665G  1174G   490G 70.52 0.96 328
>>
>> 13 1.62650  1.0  1665G  1281G   383G 76.95 1.05 345
>>
>> 14 1.62650  1.0  1665G  1147G   518G 68.88 0.94 339
>>
>> 15 1.62650  1.0  1665G  1236G   429G 74.24 1.01 334
>>
>> 20 1.62650  1.0  1665G  1166G   499G 70.03 0.96 325
>>
>> 21 1.62650  1.0  1665G  1371G   293G 82.35 1.13 377
>>
>> 22 1.62650  1.0  1665G  1110G   555G 66.67 0.91 341
>>
>> 23 1.62650  1.0  1665G  1221G   443G 73.36 1.00 327
>>
>> 16 1.62650  1.0  1665G  1354G   310G 81.34 1.11 352
>>
>> 17 1.62650  1.0  1665G  1250G   415G 75.06 1.03 341
>>
>> 18 1.62650  1.0  1665G  1179G   486G 70.80 0.97 316
>>
>> 19 1.62650  1.0  1665G  1236G   428G 74.26 1.01 333
>>
>> 24 1.62650  1.0  1665G  1146G   518G 68.86 0.94 325
>>
>> 25 1.62650  1.0  1665G  1033G   632G 62.02 0.85 309
>>
>> 26 1.62650  1.0  1665G  1234G   431G 74.11 1.01 334
>>
>> 27 1.62650  1.0  1665G  1342G   322G 80.62 1.10 352
>>
>>   TOTAL 46635G 34135G 12500G 73.20
>>
>> MIN/MAX VAR: 0.85/1.13  STDDEV: 5.28
>>
>>
>>
>> From: Cary [mailto:dynamic.c...@gmail.com]
>> Sent: December 26, 2017 11:40
>> To: 周 威 <cho...@msn.cn>
>> Cc: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] Can't delete file in cephfs with "No space left on
>> device"
>>
>>
>>
>> Could you post the output of “ceph osd df”?
>>
>>
>> On Dec 25, 2017, at 19:46, 周 威 <cho...@msn.cn> wrote:
>>
>> Hi all:
>>
>>
>>
>> Ceph version:
>> ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
>>
>>
>>
>> Ceph df:
>>
>> GLOBAL:
>>
>> SIZE   AVAIL  RAW USED %RAW USED
>>
>> 46635G 12500G   34135G 73.19
>>
>>
>>
>> rm d
>>
>> rm: cannot remove `d': No space left on device
>>
>>
>>
>> and mds_cache:
>>
>> {
>>
>> "mds_cache": {
>>
>> "num_strays": 999713,
>>
>> "num_strays_purging": 0,
>>
>> "num_strays_delayed": 0,
>>
>> "num_purge_ops": 0,
>>
>> "strays_created": 999723,
>>
>> "strays_purged": 10,
>>
>> "strays_reintegrated": 0,
>>
>> "strays_migrated": 0,
>>
>> "num_recovering_processing": 0,
>>
>> "num_recovering_enqueued": 0,
>>
>> "num_recovering_prioritized": 0,
>>
>> "recovery_started": 107,
>>
>> "recovery_completed": 107
>>
>> }
>>
>> }
>>
>>
>>
>> It seems the strays num is stuck, what should I do?
>>
>> Thanks all.
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can't delete file in cephfs with "No space left on device"

2017-12-25 Thread Cary
Could you post the output of “ceph osd df”?

On Dec 25, 2017, at 19:46, 周 威 wrote:

Hi all:
 
Ceph version:
ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
 
Ceph df:
GLOBAL:
SIZE   AVAIL  RAW USED %RAW USED
46635G 12500G   34135G 73.19
 
rm d
rm: cannot remove `d': No space left on device
 
and mds_cache:
{
"mds_cache": {
"num_strays": 999713,
"num_strays_purging": 0,
"num_strays_delayed": 0,
"num_purge_ops": 0,
"strays_created": 999723,
"strays_purged": 10,
"strays_reintegrated": 0,
"strays_migrated": 0,
"num_recovering_processing": 0,
"num_recovering_enqueued": 0,
"num_recovering_prioritized": 0,
"recovery_started": 107,
"recovery_completed": 107
}
}
 
It seems the strays num is stuck, what should I do?
Thanks all.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] POOL_NEARFULL

2017-12-19 Thread Cary
Karun,

 You can check how much data each OSD has with "ceph osd df"

ID CLASS WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
 1   hdd   1.84000  1.0 1885G   769G  1115G  40.84   0.97  101
 3   hdd   4.64000  1.0 4679G   2613G 2065G 55.86 1.33 275
 4   hdd   4.64000  1.0 4674G   1914G 2759G 40.96 0.97 193
 5   hdd   4.64000  1.0 4668G   1434G 3234G 30.72 0.73 148
 8   hdd   1.84000  1.0 1874G   742G  1131G 39.61 0.94  74
 0   hdd   4.64000  1.0 4668G   2331G 2337G 49.94 1.19 268
 2   hdd   1.84000  1.0 4668G   868G  3800G 18.60 0.44  99
 6   hdd   4.64000  1.0 4668G   2580G 2087G 55.28 1.32 275
 7   hdd   1.84000  1.0 1874G   888G   985G 47.43 1.13 107
TOTAL 33661G 14144G 19516G 42.02
MIN/MAX VAR: 0.44/1.33  STDDEV: 11.27

 The "%USE" column shows how much space is used on each OSD. You may
need to change the weight of some of the OSDs so the data balances out
correctly with "ceph osd crush reweight osd.N W".Change the N to the
number of OSD and W to the new weight.

 As you can see above, even though the weight on my 4.6TB OSDs is the
same for all of them, they have different %USE. So I could lower the
weight of the OSDs with more data, and Ceph will rebalance the cluster.
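
For example, to nudge data off one of the fuller 4.6TB OSDs above
(the new weight here is only an illustration):

ceph osd crush reweight osd.3 4.5

Ceph will then backfill to rebalance the PGs.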

 I am not too sure why this happens.

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008623.html

Cary
-Dynamic

On Tue, Dec 19, 2017 at 3:57 PM, Jean-Charles Lopez <jelo...@redhat.com> wrote:
> Hi
>
> did you set quotas on these pools?
>
> See this page for explanation of most error messages:
> http://docs.ceph.com/docs/master/rados/operations/health-checks/#pool-near-full
>
> JC
>
> On Dec 19, 2017, at 01:48, Karun Josy <karunjo...@gmail.com> wrote:
>
> Hello,
>
> In one of our clusters, health is showing these warnings :
> -
> OSD_NEARFULL 1 nearfull osd(s)
> osd.22 is near full
> POOL_NEARFULL 3 pool(s) nearfull
> pool 'templates' is nearfull
> pool 'cvm' is nearfull
> pool 'ecpool' is nearfull
> 
>
> One osd is above 85% used, which I know caused the OSD_Nearfull flag.
> But what does pool(s) nearfull mean ?
> And how can I correct it ?
>
> ]$ ceph df
> GLOBAL:
> SIZE   AVAIL  RAW USED %RAW USED
> 31742G 11147G   20594G 64.88
> POOLS:
> NAME       ID   USED  %USED MAX AVAIL  OBJECTS
> templates   5   196G  23.28      645G    50202
> cvm         6   6528      0     1076G      770
> ecpool      7 10260G  83.56     2018G  3004031
>
>
>
> Karun
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

2017-12-18 Thread Cary
James,

If your replication factor is 3, then for every 1GB added, your
available space will decrease by 3GB.
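
As a rough worked example with the numbers from your message below:
13680 GB of raw space divided by 3 replicas is about 4560 GB of usable
space once everything is stored with 3 copies.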


Cary
-Dynamic

On Mon, Dec 18, 2017 at 6:18 PM, James Okken <james.ok...@dialogic.com> wrote:
> Thanks David.
> Thanks again Cary.
>
> If I have
> 682 GB used, 12998 GB / 13680 GB avail,
> then I still need to divide 13680/3 (my replication setting) to get what my 
> total storage really is, right?
>
> Thanks!
>
>
> James Okken
> Lab Manager
> Dialogic Research Inc.
> 4 Gatehall Drive
> Parsippany
> NJ 07054
> USA
>
> Tel:   973 967 5179
> Email:   james.ok...@dialogic.com
> Web:www.dialogic.com – The Network Fuel Company
>
> This e-mail is intended only for the named recipient(s) and may contain 
> information that is privileged, confidential and/or exempt from disclosure 
> under applicable law. No waiver of privilege, confidence or otherwise is 
> intended by virtue of communication via the internet. Any unauthorized use, 
> dissemination or copying is strictly prohibited. If you have received this 
> e-mail in error, or are not named as a recipient, please immediately notify 
> the sender and destroy all copies of this e-mail.
>
>
> -Original Message-
> From: Cary [mailto:dynamic.c...@gmail.com]
> Sent: Friday, December 15, 2017 5:56 PM
> To: David Turner
> Cc: James Okken; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)
>
> James,
>
> You can set these values in ceph.conf.
>
> [global]
> ...
> osd pool default size = 3
> osd pool default min size  = 2
> ...
>
> New pools that are created will use those values.
>
> If you run "ceph -s" and look at the "usage" line, it shows how much space
> is (1) used, (2) available, and (3) total, i.e.
>
> usage:   19465 GB used, 60113 GB / 79578 GB avail
>
> We choose to use Openstack with Ceph in this decade and do the other things, 
> not because they are easy, but because they are hard...;-p
>
>
> Cary
> -Dynamic
>
> On Fri, Dec 15, 2017 at 10:12 PM, David Turner <drakonst...@gmail.com> wrote:
>> In conjunction with increasing the pool size to 3, also increase the
>> pool min_size to 2.  `ceph df` and `ceph osd df` will eventually show
>> the full size in use in your cluster.  In particular the output of
>> `ceph df` with available size in a pool takes into account the pools 
>> replication size.
>> Continue watching ceph -s or ceph -w to see when the backfilling for
>> your change to replication size finishes.
>>
>> On Fri, Dec 15, 2017 at 5:06 PM James Okken <james.ok...@dialogic.com>
>> wrote:
>>>
>>> This whole effort went extremely well, thanks to Cary, and I'm not
>>> used to that with CEPH so far. (And OpenStack ever.) Thank you
>>> Cary.
>>>
>>> I've upped the replication factor and now I see "replicated size 3" in
>>> each of my pools. Is this the only place to check replication level?
>>> Is there a Global setting or only a setting per Pool?
>>>
>>> ceph osd pool ls detail
>>> pool 0 'rbd' replicated size 3..
>>> pool 1 'images' replicated size 3...
>>> ...
>>>
>>> One last question!
>>> At this replication level how can I tell how much total space I
>>> actually have now?
>>> Do I just 1/3 the Global size?
>>>
>>> ceph df
>>> GLOBAL:
>>> SIZE   AVAIL  RAW USED %RAW USED
>>> 13680G 12998G 682G  4.99
>>> POOLS:
>>> NAMEID USED %USED MAX AVAIL OBJECTS
>>> rbd 0 0 0 6448G   0
>>> images  1  216G  3.24 6448G   27745
>>> backups 2 0 0 6448G   0
>>> volumes 3  117G  1.79 6448G   30441
>>> compute 4 0 0 6448G   0
>>>
>>> ceph osd df
>>> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE VAR  PGS
>>>  0 0.81689  1.0   836G 36549M   800G 4.27 0.86  67
>>>  4 3.7  1.0  3723G   170G  3553G 4.58 0.92 270
>>>  1 0.81689  1.0   836G 49612M   788G 5.79 1.16  56
>>>  5 3.7  1.0  3723G   192G  3531G 5.17 1.04 282
>>>  2 0.81689  1.0   836G 33639M   803G 3.93 0.79  58
>>>  3 3.7  1.0  3723G   202G  3521G 5.43 1.09 291
>>>   TOTAL 13680G   682G 12998G 4.99
>>> MIN/MAX VAR: 0.79/1.16  STDDEV: 0.67
>>>
>>> Thanks!
>>>
>

Re: [ceph-users] Migrating to new pools (RBD, CephFS)

2017-12-18 Thread Cary
A possible option. They do not recommend using cppool.

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-July/011460.html

**COMPLETELY UNTESTED AND DANGEROUS**

stop all MDS daemons
delete your filesystem (but leave the pools)
use "rados export" and "rados import" to do a full copy of the
metadata to a new pool (*not* cppool, it doesn't copy OMAP data)
use "ceph fs new" to create a new filesystem that uses your new metadata pool
use "ceph fs reset" to skip the creating phase of the new filesystem
start MDS daemons

**COMPLETELY UNTESTED AND DANGEROUS**
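
At the command level that sketch might look something like this (the
pool and filesystem names are placeholders; again, untested, so treat it
as a starting point only):

# stop all MDS daemons first
ceph fs rm cephfs --yes-i-really-mean-it     # removes the fs, keeps the pools
rados -p cephfs_metadata export /tmp/metadata.dump
rados -p cephfs_metadata_new import /tmp/metadata.dump
ceph fs new cephfs cephfs_metadata_new cephfs_data
ceph fs reset cephfs --yes-i-really-mean-it
# start the MDS daemons again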


On Mon, Dec 18, 2017 at 1:18 PM, Jens-U. Mozdzen  wrote:
> Hi *,
>
> facing the problem to reduce the number of PGs for a pool, I've found
> various information and suggestions, but no "definite guide" to handle pool
> migration with Ceph 12.2.x. This seems to be a fairly common problem when
> having to deal with "teen-age clusters", so consolidated information would
> be a real help. I'm willing to start writing things up, but don't want to
> duplicate information. So:
>
> Are there any documented "operational procedures" on how to migrate
>
> - an RBD pool (with snapshots created by Openstack)
>
> - a CephFS data pool
>
> - a CephFS metadata pool
>
> to a different volume, in order to be able to utilize pool settings that
> cannot be changed on an existing pool?
>
> ---
>
> RBD pools: From what I've read, RBD snapshots are "broken" after using
> "rados cppool" to move the content of an "RBD pool" to a new pool.
>
> ---
>
> CephFS data pool: I know I can add additional pools to a CephFS instance
> ("ceph fs add_data_pool"), and have newly created files to be placed in the
> new pool ("file layouts"). But according to the docs, a small amount of
> metadata is kept in the primary data pool for all files, so I cannot remove
> the original pool.
>
> I couldn't identify how CephFS (MDS) identifies it's current data pool (or
> "default data pool" in case of multiple pools - the one named in "ceph fs
> new"), so "rados cppool"-moving the data to a new pool and then
> reconfiguring CephFS to use the new pool (while MDS are stopped, of course)
> is not yet an option? And there might be references to the pool id hiding in
> CephFS metadata, too, invalidating this approach altogether.
>
> Of course, dumping the current content of the CephFS to external storage and
> recreating the CephFS instance with new pools is a potential option, but may
> required a substantial amount of extra storage ;)
>
> ---
>
> CephFS metadata pool: I've not seen any indication of a procedure to swap
> metadata pools.
>
>
> I couldn't identify how CephFS (MDS) identifies it's current metadata pool,
> so "rados cppool"-moving the metadata to a new pool and then reconfiguring
> CephFS to use the new pool (while MDS are stopped, of course) is not yet an
> option?
>
> Of course, dumping the current content of the CephFS to external storage and
> recreating the CephFS instance with new pools is a potential option, but may
> required a substantial amount of extra storage ;)
>
> ---
>
> http://cephnotes.ksperis.com/blog/2015/04/15/ceph-pool-migration describes
> an interesting approach to migrate all pool contents by making the current
> pool a cache tier to the new pool and then migrate the "cache tier content"
> to the (new) base pool. But I'm not yet able to judge the approach and will
> have to conduct tests. Can anyone already make an educated guess if
> especially the "snapshot" problem for RBD pools will be circumvented this
> way and how CephFS will react to this approach? This "cache tier" approach,
> if feasible, would be a nice way to circumvent downtime and extra space
> requirements.
>
> Thank you for any ideas, insight and experience you can share!
>
> Regards,
> J
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG active+clean+remapped status

2017-12-16 Thread Cary
Karun,

 Could you paste in the output from "ceph health detail"? Which OSD
was just added?

Cary
-Dynamic

On Sun, Dec 17, 2017 at 4:59 AM, Karun Josy <karunjo...@gmail.com> wrote:
> Any help would be appreciated!
>
> Karun Josy
>
> On Sat, Dec 16, 2017 at 11:04 PM, Karun Josy <karunjo...@gmail.com> wrote:
>>
>> Hi,
>>
>> Repair didnt fix the issue.
>>
>> In the pg dump details, I notice this None. Seems pg is missing from one
>> of the OSD
>>
>> [0,2,NONE,4,12,10,5,1]
>> [0,2,1,4,12,10,5,1]
>>
>> There is no way Ceph corrects this automatically ? I have to edit/
>> troubleshoot it manually ?
>>
>> Karun
>>
>> On Sat, Dec 16, 2017 at 10:44 PM, Cary <dynamic.c...@gmail.com> wrote:
>>>
>>> Karun,
>>>
>>>  Running ceph pg repair should not cause any problems. It may not fix
>>> the issue though. If that does not help, there is more information at
>>> the link below.
>>> http://ceph.com/geen-categorie/ceph-manually-repair-object/
>>>
>>> I recommend not rebooting, or restarting while Ceph is repairing or
>>> recovering. If possible, wait until the cluster is in a healthy state
>>> first.
>>>
>>> Cary
>>> -Dynamic
>>>
>>> On Sat, Dec 16, 2017 at 2:05 PM, Karun Josy <karunjo...@gmail.com> wrote:
>>> > Hi Cary,
>>> >
>>> > No, I didnt try to repair it.
>>> > I am comparatively new in ceph. Is it okay to try to repair it ?
>>> > Or should I take any precautions while doing it ?
>>> >
>>> > Karun Josy
>>> >
>>> > On Sat, Dec 16, 2017 at 2:08 PM, Cary <dynamic.c...@gmail.com> wrote:
>>> >>
>>> >> Karun,
>>> >>
>>> >>  Did you attempt a "ceph pg repair <ID>"? Replace <ID> with the pg
>>> >> ID that needs to be repaired, 3.4.
>>> >>
>>> >> Cary
>>> >> -D123
>>> >>
>>> >> On Sat, Dec 16, 2017 at 8:24 AM, Karun Josy <karunjo...@gmail.com>
>>> >> wrote:
>>> >> > Hello,
>>> >> >
>>> >> > I added 1 disk to the cluster and after rebalancing, it shows 1 PG
>>> >> > is in
>>> >> > remapped state. How can I correct it ?
>>> >> >
>>> >> > (I had to restart some osds during the rebalancing as there were
>>> >> > some
>>> >> > slow
>>> >> > requests)
>>> >> >
>>> >> > $ ceph pg dump | grep remapped
>>> >> > dumped all
>>> >> > 3.4 981  00 0   0
>>> >> > 2655009792
>>> >> > 1535 1535 active+clean+remapped 2017-12-15 22:07:21.663964
>>> >> > 2824'785115
>>> >> > 2824:2297888 [0,2,NONE,4,12,10,5,1]  0   [0,2,1,4,12,10,5,1]
>>> >> > 0  2288'767367 2017-12-14 11:00:15.576741  417'518549 2017-12-08
>>> >> > 03:56:14.006982
>>> >> >
>>> >> > That PG belongs to an erasure pool with k=5, m =3 profile, failure
>>> >> > domain is
>>> >> > host.
>>> >> >
>>> >> > ===
>>> >> >
>>> >> > $ ceph osd tree
>>> >> > ID  CLASS WEIGHT   TYPE NAME     STATUS REWEIGHT PRI-AFF
>>> >> >  -1   16.94565 root default
>>> >> >  -3    2.73788 host ceph-a1
>>> >> >   0   ssd  1.86469 osd.0    up  1.0 1.0
>>> >> >  14   ssd  0.87320 osd.14   up  1.0 1.0
>>> >> >  -5    2.73788 host ceph-a2
>>> >> >   1   ssd  1.86469 osd.1    up  1.0 1.0
>>> >> >  15   ssd  0.87320 osd.15   up  1.0 1.0
>>> >> >  -7    1.86469 host ceph-a3
>>> >> >   2   ssd  1.86469 osd.2    up  1.0 1.0
>>> >> >  -9    1.74640 host ceph-a4
>>> >> >   3   ssd  0.87320 osd.3    up  1.0 1.0
>>> >> >   4   ssd  0.87320 osd.4    up  1.0 1.0
>>> >> > -11    1.74640 host ceph-a5
>>> >> >   5   ssd  0.87320 osd.5    up  1.0 1.0
>>> >> >   6   ssd  0.87320 osd.6    up  1.0 1.0
>>> >> > -13    1.74640 host ceph-a6
>>> >> >   7   ssd  0.87320 osd.7    up  1.0 1.0
>>> >> >   8   ssd  0.87320 osd.8    up  1.0 1.0
>>> >> > -15    1.74640 host ceph-a7
>>> >> >   9   ssd  0.87320 osd.9    up  1.0 1.0
>>> >> >  10   ssd  0.87320 osd.10   up  1.0 1.0
>>> >> > -17    2.61960 host ceph-a8
>>> >> >  11   ssd  0.87320 osd.11   up  1.0 1.0
>>> >> >  12   ssd  0.87320 osd.12   up  1.0 1.0
>>> >> >  13   ssd  0.87320 osd.13   up  1.0 1.0
>>> >> >
>>> >> >
>>> >> >
>>> >> > Karun
>>> >> >
>>> >> > ___
>>> >> > ceph-users mailing list
>>> >> > ceph-users@lists.ceph.com
>>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >> >
>>> >
>>> >
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG active+clean+remapped status

2017-12-16 Thread Cary
Karun,

 Running ceph pg repair should not cause any problems. It may not fix
the issue though. If that does not help, there is more information at
the link below.
http://ceph.com/geen-categorie/ceph-manually-repair-object/

I recommend not rebooting, or restarting while Ceph is repairing or
recovering. If possible, wait until the cluster is in a healthy state
first.

Cary
-Dynamic

On Sat, Dec 16, 2017 at 2:05 PM, Karun Josy <karunjo...@gmail.com> wrote:
> Hi Cary,
>
> No, I didnt try to repair it.
> I am comparatively new in ceph. Is it okay to try to repair it ?
> Or should I take any precautions while doing it ?
>
> Karun Josy
>
> On Sat, Dec 16, 2017 at 2:08 PM, Cary <dynamic.c...@gmail.com> wrote:
>>
>> Karun,
>>
>>  Did you attempt a "ceph pg repair <ID>"? Replace <ID> with the pg
>> ID that needs to be repaired, 3.4.
>>
>> Cary
>> -D123
>>
>> On Sat, Dec 16, 2017 at 8:24 AM, Karun Josy <karunjo...@gmail.com> wrote:
>> > Hello,
>> >
>> > I added 1 disk to the cluster and after rebalancing, it shows 1 PG is in
>> > remapped state. How can I correct it ?
>> >
>> > (I had to restart some osds during the rebalancing as there were some
>> > slow
>> > requests)
>> >
>> > $ ceph pg dump | grep remapped
>> > dumped all
>> > 3.4 981  00 0   0 2655009792
>> > 1535 1535 active+clean+remapped 2017-12-15 22:07:21.663964
>> > 2824'785115
>> > 2824:2297888 [0,2,NONE,4,12,10,5,1]  0   [0,2,1,4,12,10,5,1]
>> > 0  2288'767367 2017-12-14 11:00:15.576741  417'518549 2017-12-08
>> > 03:56:14.006982
>> >
>> > That PG belongs to an erasure pool with k=5, m =3 profile, failure
>> > domain is
>> > host.
>> >
>> > ===
>> >
>> > $ ceph osd tree
>> > ID  CLASS WEIGHT   TYPE NAME     STATUS REWEIGHT PRI-AFF
>> >  -1   16.94565 root default
>> >  -3    2.73788 host ceph-a1
>> >   0   ssd  1.86469 osd.0    up  1.0 1.0
>> >  14   ssd  0.87320 osd.14   up  1.0 1.0
>> >  -5    2.73788 host ceph-a2
>> >   1   ssd  1.86469 osd.1    up  1.0 1.0
>> >  15   ssd  0.87320 osd.15   up  1.0 1.0
>> >  -7    1.86469 host ceph-a3
>> >   2   ssd  1.86469 osd.2    up  1.0 1.0
>> >  -9    1.74640 host ceph-a4
>> >   3   ssd  0.87320 osd.3    up  1.0 1.0
>> >   4   ssd  0.87320 osd.4    up  1.0 1.0
>> > -11    1.74640 host ceph-a5
>> >   5   ssd  0.87320 osd.5    up  1.0 1.0
>> >   6   ssd  0.87320 osd.6    up  1.0 1.0
>> > -13    1.74640 host ceph-a6
>> >   7   ssd  0.87320 osd.7    up  1.0 1.0
>> >   8   ssd  0.87320 osd.8    up  1.0 1.0
>> > -15    1.74640 host ceph-a7
>> >   9   ssd  0.87320 osd.9    up  1.0 1.0
>> >  10   ssd  0.87320 osd.10   up  1.0 1.0
>> > -17    2.61960 host ceph-a8
>> >  11   ssd  0.87320 osd.11   up  1.0 1.0
>> >  12   ssd  0.87320 osd.12   up  1.0 1.0
>> >  13   ssd  0.87320 osd.13   up  1.0 1.0
>> >
>> >
>> >
>> > Karun
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG active+clean+remapped status

2017-12-16 Thread Cary
Karun,

 Did you attempt a "ceph pg repair <ID>"? Replace <ID> with the pg
ID that needs to be repaired, 3.4.
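
In your case that would be something like:

ceph pg repair 3.4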

Cary
-D123

On Sat, Dec 16, 2017 at 8:24 AM, Karun Josy <karunjo...@gmail.com> wrote:
> Hello,
>
> I added 1 disk to the cluster and after rebalancing, it shows 1 PG is in
> remapped state. How can I correct it ?
>
> (I had to restart some osds during the rebalancing as there were some slow
> requests)
>
> $ ceph pg dump | grep remapped
> dumped all
> 3.4 981  00 0   0 2655009792
> 1535 1535 active+clean+remapped 2017-12-15 22:07:21.663964  2824'785115
> 2824:2297888 [0,2,NONE,4,12,10,5,1]  0   [0,2,1,4,12,10,5,1]
> 0  2288'767367 2017-12-14 11:00:15.576741  417'518549 2017-12-08
> 03:56:14.006982
>
> That PG belongs to an erasure pool with k=5, m =3 profile, failure domain is
> host.
>
> ===
>
> $ ceph osd tree
> ID  CLASS WEIGHT   TYPE NAME     STATUS REWEIGHT PRI-AFF
>  -1   16.94565 root default
>  -3    2.73788 host ceph-a1
>   0   ssd  1.86469 osd.0    up  1.0 1.0
>  14   ssd  0.87320 osd.14   up  1.0 1.0
>  -5    2.73788 host ceph-a2
>   1   ssd  1.86469 osd.1    up  1.0 1.0
>  15   ssd  0.87320 osd.15   up  1.0 1.0
>  -7    1.86469 host ceph-a3
>   2   ssd  1.86469 osd.2    up  1.0 1.0
>  -9    1.74640 host ceph-a4
>   3   ssd  0.87320 osd.3    up  1.0 1.0
>   4   ssd  0.87320 osd.4    up  1.0 1.0
> -11    1.74640 host ceph-a5
>   5   ssd  0.87320 osd.5    up  1.0 1.0
>   6   ssd  0.87320 osd.6    up  1.0 1.0
> -13    1.74640 host ceph-a6
>   7   ssd  0.87320 osd.7    up  1.0 1.0
>   8   ssd  0.87320 osd.8    up  1.0 1.0
> -15    1.74640 host ceph-a7
>   9   ssd  0.87320 osd.9    up  1.0 1.0
>  10   ssd  0.87320 osd.10   up  1.0 1.0
> -17    2.61960 host ceph-a8
>  11   ssd  0.87320 osd.11   up  1.0 1.0
>  12   ssd  0.87320 osd.12   up  1.0 1.0
>  13   ssd  0.87320 osd.13   up  1.0 1.0
>
>
>
> Karun
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

2017-12-15 Thread Cary
James,

You can set these values in ceph.conf.

[global]
...
osd pool default size = 3
osd pool default min size  = 2
...

New pools that are created will use those values.
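
For pools that already exist, the same thing can be changed at runtime
with something like (the pool name here is just an example):

ceph osd pool set rbd size 3
ceph osd pool set rbd min_size 2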

If you run "ceph -s" and look at the "usage" line, it shows how
much space is (1) used, (2) available, and (3) total, i.e.

usage:   19465 GB used, 60113 GB / 79578 GB avail

We choose to use Openstack with Ceph in this decade and do the other
things, not because they are easy, but because they are hard...;-p


Cary
-Dynamic

On Fri, Dec 15, 2017 at 10:12 PM, David Turner <drakonst...@gmail.com> wrote:
> In conjunction with increasing the pool size to 3, also increase the pool
> min_size to 2.  `ceph df` and `ceph osd df` will eventually show the full
> size in use in your cluster.  In particular the output of `ceph df` with
> available size in a pool takes into account the pools replication size.
> Continue watching ceph -s or ceph -w to see when the backfilling for your
> change to replication size finishes.
>
> On Fri, Dec 15, 2017 at 5:06 PM James Okken <james.ok...@dialogic.com>
> wrote:
>>
>> This whole effort went extremely well, thanks to Cary, and I'm not used to
>> that with CEPH so far. (And OpenStack ever.)
>> Thank you Cary.
>>
>> I've upped the replication factor and now I see "replicated size 3" in each
>> of my pools. Is this the only place to check replication level? Is there a
>> Global setting or only a setting per Pool?
>>
>> ceph osd pool ls detail
>> pool 0 'rbd' replicated size 3..
>> pool 1 'images' replicated size 3...
>> ...
>>
>> One last question!
>> At this replication level how can I tell how much total space I actually
>> have now?
>> Do I just 1/3 the Global size?
>>
>> ceph df
>> GLOBAL:
>> SIZE   AVAIL  RAW USED %RAW USED
>> 13680G 12998G 682G  4.99
>> POOLS:
>> NAMEID USED %USED MAX AVAIL OBJECTS
>> rbd 0 0 0 6448G   0
>> images  1  216G  3.24 6448G   27745
>> backups 2 0 0 6448G   0
>> volumes 3  117G  1.79 6448G   30441
>> compute 4 0 0 6448G   0
>>
>> ceph osd df
>> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE VAR  PGS
>>  0 0.81689  1.0   836G 36549M   800G 4.27 0.86  67
>>  4 3.7  1.0  3723G   170G  3553G 4.58 0.92 270
>>  1 0.81689  1.0   836G 49612M   788G 5.79 1.16  56
>>  5 3.7  1.0  3723G   192G  3531G 5.17 1.04 282
>>  2 0.81689  1.0   836G 33639M   803G 3.93 0.79  58
>>  3 3.7  1.0  3723G   202G  3521G 5.43 1.09 291
>>   TOTAL 13680G   682G 12998G 4.99
>> MIN/MAX VAR: 0.79/1.16  STDDEV: 0.67
>>
>> Thanks!
>>
>> -Original Message-
>> From: Cary [mailto:dynamic.c...@gmail.com]
>> Sent: Friday, December 15, 2017 4:05 PM
>> To: James Okken
>> Cc: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] add hard drives to 3 CEPH servers (3 server
>> cluster)
>>
>> James,
>>
>>  Those errors are normal. Ceph creates the missing files. You can check
>> "/var/lib/ceph/osd/ceph-6", before and after you run those commands to see
>> what files are added there.
>>
>>  Make sure you get the replication factor set.
>>
>>
>> Cary
>> -Dynamic
>>
>> On Fri, Dec 15, 2017 at 6:11 PM, James Okken <james.ok...@dialogic.com>
>> wrote:
>> > Thanks again Cary,
>> >
>> > Yes, once all the backfilling was done I was back to a Healthy cluster.
>> > I moved on to the same steps for the next server in the cluster, it is
>> > backfilling now.
>> > Once that is done I will do the last server in the cluster, and then I
>> > think I am done!
>> >
>> > Just checking on one thing. I get these messages when running this
>> > command. I assume this is OK, right?
>> > root@node-54:~# ceph-osd -i 4 --mkfs --mkkey --osd-uuid
>> > 25c21708-f756-4593-bc9e-c5506622cf07
>> > 2017-12-15 17:28:22.849534 7fd2f9e928c0 -1 journal FileJournal::_open:
>> > disabling aio for non-block journal.  Use journal_force_aio to force
>> > use of aio anyway
>> > 2017-12-15 17:28:22.855838 7fd2f9e928c0 -1 journal FileJournal::_open:
>> > disabling aio for non-block journal.  Use journal_force_aio to force
>> > use of aio anyway
>> > 2017-12-15 17:28:22.856444 7fd2f9e928c0 -1
>> > file

Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

2017-12-15 Thread Cary
James,

 Those errors are normal. Ceph creates the missing files. You can
check "/var/lib/ceph/osd/ceph-6" before and after you run those
commands to see what files are added there.

 Make sure you get the replication factor set.


Cary
-Dynamic

On Fri, Dec 15, 2017 at 6:11 PM, James Okken <james.ok...@dialogic.com> wrote:
> Thanks again Cary,
>
> Yes, once all the backfilling was done I was back to a Healthy cluster.
> I moved on to the same steps for the next server in the cluster, it is 
> backfilling now.
> Once that is done I will do the last server in the cluster, and then I think 
> I am done!
>
> Just checking on one thing. I get these messages when running this command. I 
> assume this is OK, right?
> root@node-54:~# ceph-osd -i 4 --mkfs --mkkey --osd-uuid 
> 25c21708-f756-4593-bc9e-c5506622cf07
> 2017-12-15 17:28:22.849534 7fd2f9e928c0 -1 journal FileJournal::_open: 
> disabling aio for non-block journal.  Use journal_force_aio to force use of 
> aio anyway
> 2017-12-15 17:28:22.855838 7fd2f9e928c0 -1 journal FileJournal::_open: 
> disabling aio for non-block journal.  Use journal_force_aio to force use of 
> aio anyway
> 2017-12-15 17:28:22.856444 7fd2f9e928c0 -1 
> filestore(/var/lib/ceph/osd/ceph-4) could not find 
> #-1:7b3f43c4:::osd_superblock:0# in index: (2) No such file or directory
> 2017-12-15 17:28:22.893443 7fd2f9e928c0 -1 created object store 
> /var/lib/ceph/osd/ceph-4 for osd.4 fsid 2b9f7957-d0db-481e-923e-89972f6c594f
> 2017-12-15 17:28:22.893484 7fd2f9e928c0 -1 auth: error reading file: 
> /var/lib/ceph/osd/ceph-4/keyring: can't open 
> /var/lib/ceph/osd/ceph-4/keyring: (2) No such file or directory
> 2017-12-15 17:28:22.893662 7fd2f9e928c0 -1 created new key in keyring 
> /var/lib/ceph/osd/ceph-4/keyring
>
> thanks
>
> -Original Message-
> From: Cary [mailto:dynamic.c...@gmail.com]
> Sent: Thursday, December 14, 2017 7:13 PM
> To: James Okken
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)
>
> James,
>
>  Usually, once the misplaced data has balanced out, the cluster should reach a
> healthy state. If you run "ceph health detail", Ceph will show you some more
> detail about what is happening. Is Ceph still recovering, or has it stalled?
> Has the "objects misplaced (62.511%)" figure changed to a lower %?
>
> Cary
> -Dynamic
>
> On Thu, Dec 14, 2017 at 10:52 PM, James Okken <james.ok...@dialogic.com> 
> wrote:
>> Thanks Cary!
>>
>> Your directions worked on my first server (once I found the missing carriage
>> return in your list of commands; the email must have messed it up).
>>
>> For anyone else:
>> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4 ceph auth add osd.4 osd
>> 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring really is 
>> 2 commands:
>> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4  and ceph auth add osd.4
>> osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring
>>
>> Cary, what am I looking for in ceph -w and ceph -s to show the status of the 
>> data moving?
>> Seems like the data is moving and that I have some issue...
>>
>> root@node-53:~# ceph -w
>> cluster 2b9f7957-d0db-481e-923e-89972f6c594f
>>  health HEALTH_WARN
>> 176 pgs backfill_wait
>> 1 pgs backfilling
>> 27 pgs degraded
>> 1 pgs recovering
>> 26 pgs recovery_wait
>> 27 pgs stuck degraded
>> 204 pgs stuck unclean
>> recovery 10322/84644 objects degraded (12.195%)
>> recovery 52912/84644 objects misplaced (62.511%)
>>  monmap e3: 3 mons at 
>> {node-43=192.168.1.7:6789/0,node-44=192.168.1.5:6789/0,node-45=192.168.1.3:6789/0}
>> election epoch 138, quorum 0,1,2 node-45,node-44,node-43
>>  osdmap e206: 4 osds: 4 up, 4 in; 177 remapped pgs
>> flags sortbitwise,require_jewel_osds
>>   pgmap v3936175: 512 pgs, 5 pools, 333 GB data, 58184 objects
>> 370 GB used, 5862 GB / 6233 GB avail
>> 10322/84644 objects degraded (12.195%)
>> 52912/84644 objects misplaced (62.511%)
>>  308 active+clean
>>  176 active+remapped+wait_backfill
>>   26 active+recovery_wait+degraded
>>1 active+remapped+backfilling
>>1 active+recovering+degraded
>> recovery io 100605 kB/s, 14 objects/s
>>   client io 0 B/s rd, 92788 B/s wr, 50 op/s rd, 11 op/s wr
>>
>> 2017-12-14 22:45:57.459846 mon.0 [INF] pgm

Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

2017-12-14 Thread Cary
James,

 Usually, once the misplaced data has balanced out, the cluster should
reach a healthy state. If you run "ceph health detail", Ceph will show
you some more detail about what is happening. Is Ceph still recovering,
or has it stalled? Has the "objects misplaced (62.511%)" figure changed
to a lower %?
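
(A rough sketch of tracking that from the CLI; the grep pattern is just one
way of pulling out the relevant counters:)

ceph health detail
ceph -s | grep -E 'degraded|misplaced'
# or keep watching it:
watch -n 10 ceph -s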

Cary
-Dynamic

On Thu, Dec 14, 2017 at 10:52 PM, James Okken <james.ok...@dialogic.com> wrote:
> Thanks Cary!
>
> Your directions worked on my first server (once I found the missing carriage
> return in your list of commands; the email must have messed it up).
>
> For anyone else:
> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4 ceph auth add osd.4 osd 'allow *' 
> mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring
> really is 2 commands:
> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4
>  and
> ceph auth add osd.4 osd 'allow *' mon 'allow profile osd' -i 
> /etc/ceph/ceph.osd.4.keyring
>
> Cary, what am I looking for in ceph -w and ceph -s to show the status of the 
> data moving?
> Seems like the data is moving and that I have some issue...
>
> root@node-53:~# ceph -w
> cluster 2b9f7957-d0db-481e-923e-89972f6c594f
>  health HEALTH_WARN
> 176 pgs backfill_wait
> 1 pgs backfilling
> 27 pgs degraded
> 1 pgs recovering
> 26 pgs recovery_wait
> 27 pgs stuck degraded
> 204 pgs stuck unclean
> recovery 10322/84644 objects degraded (12.195%)
> recovery 52912/84644 objects misplaced (62.511%)
>  monmap e3: 3 mons at 
> {node-43=192.168.1.7:6789/0,node-44=192.168.1.5:6789/0,node-45=192.168.1.3:6789/0}
> election epoch 138, quorum 0,1,2 node-45,node-44,node-43
>  osdmap e206: 4 osds: 4 up, 4 in; 177 remapped pgs
> flags sortbitwise,require_jewel_osds
>   pgmap v3936175: 512 pgs, 5 pools, 333 GB data, 58184 objects
> 370 GB used, 5862 GB / 6233 GB avail
> 10322/84644 objects degraded (12.195%)
> 52912/84644 objects misplaced (62.511%)
>  308 active+clean
>  176 active+remapped+wait_backfill
>   26 active+recovery_wait+degraded
>1 active+remapped+backfilling
>1 active+recovering+degraded
> recovery io 100605 kB/s, 14 objects/s
>   client io 0 B/s rd, 92788 B/s wr, 50 op/s rd, 11 op/s wr
>
> 2017-12-14 22:45:57.459846 mon.0 [INF] pgmap v3936174: 512 pgs: 1 activating, 
> 1 active+recovering+degraded, 26 active+recovery_wait+degraded, 1 
> active+remapped+backfilling, 307 active+clean, 176 
> active+remapped+wait_backfill; 333 GB data, 369 GB used, 5863 GB / 6233 GB 
> avail; 0 B/s rd, 101107 B/s wr, 19 op/s; 10354/84644 objects degraded 
> (12.232%); 52912/84644 objects misplaced (62.511%); 12224 kB/s, 2 objects/s 
> recovering
> 2017-12-14 22:45:58.466736 mon.0 [INF] pgmap v3936175: 512 pgs: 1 
> active+recovering+degraded, 26 active+recovery_wait+degraded, 1 
> active+remapped+backfilling, 308 active+clean, 176 
> active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB / 6233 GB 
> avail; 0 B/s rd, 92788 B/s wr, 61 op/s; 10322/84644 objects degraded 
> (12.195%); 52912/84644 objects misplaced (62.511%); 100605 kB/s, 14 objects/s 
> recovering
> 2017-12-14 22:46:00.474335 mon.0 [INF] pgmap v3936176: 512 pgs: 1 
> active+recovering+degraded, 26 active+recovery_wait+degraded, 1 
> active+remapped+backfilling, 308 active+clean, 176 
> active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB / 6233 GB 
> avail; 0 B/s rd, 434 kB/s wr, 45 op/s; 10322/84644 objects degraded 
> (12.195%); 52912/84644 objects misplaced (62.511%); 84234 kB/s, 10 objects/s 
> recovering
> 2017-12-14 22:46:02.482228 mon.0 [INF] pgmap v3936177: 512 pgs: 1 
> active+recovering+degraded, 26 active+recovery_wait+degraded, 1 
> active+remapped+backfilling, 308 active+clean, 176 
> active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB / 6233 GB 
> avail; 0 B/s rd, 334 kB/s wr
>
>
> -Original Message-
> From: Cary [mailto:dynamic.c...@gmail.com]
> Sent: Thursday, December 14, 2017 4:21 PM
> To: James Okken
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)
>
> Jim,
>
> I am not an expert, but I believe I can assist.
>
>  Normally you will only have 1 OSD per drive. I have heard discussions about
> using multiple OSDs per disk when using SSDs, though.
>
>  Once your drives have been installed you will have to format them, unless 
> you are using Bluestore. My steps for formatting are below.
> Replace the sXX with your drive name.
>
> parted -a optimal /dev/sXX
> print
> mklabel gpt
>

Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

2017-12-14 Thread Cary
Jim,

I am not an expert, but I believe I can assist.

 Normally you will only have 1 OSD per drive. I have heard discussions
about using multiple OSDs per disk when using SSDs, though.

 Once your drives have been installed you will have to format them,
unless you are using Bluestore. My steps for formatting are below.
Replace the sXX with your drive name.

parted -a optimal /dev/sXX
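# The commands below, through "quit", are typed at the parted prompt.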
print
mklabel gpt
unit mib
mkpart OSD4sdd1 1 -1
quit
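# Format the new partition with XFS.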
mkfs.xfs -f /dev/sXX1

# Run blkid, and copy the UUID for the newly formatted drive.
blkid
# Add the mount point/UUID to fstab. The mount point will be created later.
vi /etc/fstab
# For example
UUID=6386bac4-7fef-3cd2-7d64-13db51d83b12 /var/lib/ceph/osd/ceph-4 xfs rw,noatime,inode64,logbufs=8 0 0


# You can then add the OSD to the cluster.

uuidgen
# Replace the UUID below with the UUID that was created with uuidgen.
ceph osd create 23e734d7-96d8-4327-a2b9-0fbdc72ed8f1

# Note which OSD number it creates; it is usually the lowest available OSD number.

# Add osd.4 to ceph.conf on all Ceph nodes.
vi /etc/ceph/ceph.conf
...
[osd.4]
public addr = 172.1.3.1
cluster addr = 10.1.3.1
...

# Now add the mount point.
mkdir -p /var/lib/ceph/osd/ceph-4
chown -R ceph:ceph /var/lib/ceph/osd/ceph-4

# The command below mounts everything in fstab.
mount -a
# The number after -i below needs to be changed to the correct OSD ID, and
# the osd-uuid needs to be changed to the UUID created with uuidgen above.
# Your keyring location may be different and need to be changed as well.
ceph-osd -i 4 --mkfs --mkkey --osd-uuid 23e734d7-96d8-4327-a2b9-0fbdc72ed8f1
chown -R ceph:ceph /var/lib/ceph/osd/ceph-4
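# Register the new OSD's key and capabilities with the cluster.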
ceph auth add osd.4 osd 'allow *' mon 'allow profile osd' -i
/etc/ceph/ceph.osd.4.keyring

# Add the new OSD to its host in the crush map.
ceph osd crush add osd.4 .0 host=YOURhostNAME

# Since the weight used in the previous step was .0, you will need to
# increase it. I use 1 for a 1TB drive and 5 for a 5TB drive. The
# command below will reweight osd.4 to 1. You may need to slowly ramp up
# this number, e.g. .10, then .20, etc.
ceph osd crush reweight osd.4 1

You should now be able to start the OSD. You can watch the data move
to the new drive with "ceph -w". Once the data has migrated, start on
the next drive.
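
(A minimal sketch of that last step, assuming osd.4 on a systemd-based host;
the service name will differ on other init systems:)

systemctl start ceph-osd@4
# Watch the backfill until the cluster is healthy again, then repeat for the
# next OSD.
ceph -w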

Cary
-Dynamic

On Thu, Dec 14, 2017 at 5:34 PM, James Okken <james.ok...@dialogic.com> wrote:
> Hi all,
>
> Please let me know if I am missing steps or using the wrong steps
>
> I'm hoping to expand my small CEPH cluster by adding 4TB hard drives to each 
> of the 3 servers in the cluster.
>
> I also need to change my replication factor from 1 to 3.
> This is part of an Openstack environment deployed by Fuel and I had foolishly 
> set my replication factor to 1 in the Fuel settings before deploy. I know 
> this would have been done better at the beginning. I do want to keep the 
> current cluster and not start over. I know this is going to thrash my cluster
> for a while replicating, but there isn't too much data on it yet.
>
>
> To start I need to safely turn off each CEPH server and add in the 4TB drive:
> To do that I am going to run:
> ceph osd set noout
> systemctl stop ceph-osd@1 (or 2 or 3 on the other servers)
> ceph osd tree (to verify it is down)
> poweroff, install the 4TB drive, bootup again
> ceph osd unset noout
>
>
>
> Next step would be to get CEPH to use the 4TB drives. Each CEPH server
> already has an 836GB OSD.
>
> ceph> osd df
> ID WEIGHT  REWEIGHT SIZE  USE  AVAIL %USE  VAR  PGS
>  0 0.81689  1.0  836G 101G  734G 12.16 0.90 167
>  1 0.81689  1.0  836G 115G  721G 13.76 1.02 166
>  2 0.81689  1.0  836G 121G  715G 14.49 1.08 179
>   TOTAL 2509G 338G 2171G 13.47
> MIN/MAX VAR: 0.90/1.08  STDDEV: 0.97
>
> ceph> df
> GLOBAL:
> SIZE  AVAIL RAW USED %RAW USED
> 2509G 2171G 338G 13.47
> POOLS:
> NAME    ID USED %USED MAX AVAIL OBJECTS
> rbd 0 0 0 2145G   0
> images  1  216G  9.15 2145G   27745
> backups 2 0 0 2145G   0
> volumes 3  114G  5.07 2145G   29717
> compute 4 0 0 2145G   0
>
>
> Once I get the 4TB drive into each CEPH server, should I look at increasing
> the current OSD (i.e., to 4836GB)?
> Or create a second 4000GB OSD on each CEPH server?
> If I am going to create a second OSD on each CEPH server I hope to use this 
> doc:
> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/
>
>
>
> As far as changing the replication factor from 1 to 3:
> Here are my pools now:
>
> ceph osd pool ls detail
> pool 0 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash 
> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspo

[ceph-users] Upgrade from Jewel to Luminous. REQUIRE_JEWEL OSDMap

2017-11-27 Thread Cary
Hello,

 Could someone please help me complete my botched upgrade from Jewel
10.2.3-r1 to Luminous 12.2.1. I have 9 Gentoo servers, 4 of which have 2
OSDs each.

 My OSD servers were accidentally rebooted before the monitor servers,
causing them to be running Luminous before the monitors. All services have
been restarted, and running "ceph versions" gives the following:

# ceph versions
2017-11-27 21:27:24.356940 7fed67efe700 -1 WARNING: the following dangerous
and experimental features are enabled: btrfs
2017-11-27 21:27:24.368469 7fed67efe700 -1 WARNING: the following dangerous
and experimental features are enabled: btrfs
{
    "mon": {
        "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 4
    },
    "mgr": {
        "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
    },
    "osd": {},
    "mds": {
        "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 1
    },
    "overall": {
        "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 8
    }
}

For some reason the OSDs do not show what version they are running, and
"ceph osd tree" shows all of the OSDs as down.

 # ceph osd tree
2017-11-27 21:32:51.969335 7f483d9c2700 -1 WARNING: the following dangerous
and experimental features are enabled: btrfs
2017-11-27 21:32:51.980976 7f483d9c2700 -1 WARNING: the following dangerous
and experimental features are enabled: btrfs
ID CLASS WEIGHT   TYPE NAME              STATUS REWEIGHT PRI-AFF
-1       27.77998 root default
-3       27.77998     datacenter DC1
-6       27.77998         rack 1B06
-5        6.48000             host ceph3
 1        1.84000                 osd.1    down        0     1.0
 3        4.64000                 osd.3    down        0     1.0
-2        5.53999             host ceph4
 5        4.64000                 osd.5    down        0     1.0
 8        0.8                     osd.8    down        0     1.0
-4        9.28000             host ceph6
 0        4.64000                 osd.0    down        0     1.0
 2        4.64000                 osd.2    down        0     1.0
-7        6.48000             host ceph7
 6        4.64000                 osd.6    down        0     1.0
 7        1.84000                 osd.7    down        0     1.0

The OSD logs all have this message:

20235 osdmap REQUIRE_JEWEL OSDMap flag is NOT set; please set it

When I try to set it with "ceph osd set require_jewel_osds" I get this
error:

Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_JEWEL feature
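
(For anyone following along, a few commands that can cross-check what each
daemon actually reports; osd.1 is just an example ID, and "ceph tell" only
reaches OSDs the monitors consider up:)

ceph osd dump | grep require
ceph tell osd.* version
# On an OSD host, via the admin socket, while the daemon process is running:
ceph daemon osd.1 version
ceph-osd --version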

A "ceph features" returns:

"mon": {
"group": {
"features": "0x1ffddff8eea4fffb",
"release": "luminous",
"num": 4
}
},
"mds": {
"group": {
"features": "0x1ffddff8eea4fffb",
"release": "luminous",
"num": 1
}
},
"osd": {
"group": {
"features": "0x1ffddff8eea4fffb",
    "release": "luminous",
"num": 8
}
},
"client": {
"group": {
"features": "0x1ffddff8eea4fffb",
"release": "luminous",
"num": 3

Is there any way I can get these OSDs to join the cluster now, or recover
my data?

Cary