Re: [ceph-users] OSD stuck in booting state

2019-03-25 Thread PHARABOT Vincent
Hello folks,

Is there nobody who can give me a hint?

Communication and authentication with the mon are OK:

2019-03-25 14:16:25.342 7fa3af260700 1 -- 10.8.33.158:6789/0 <== osd.0 
10.8.33.183:6800/293177 184  auth(proto 2 2 bytes epoch 0) v1  32+0+0 
(2260890001 0 0) 0x559759ffd680 con 0x559755487000
2019-03-25 14:16:25.342 7fa3af260700 10 mon.2@1(peon).auth v146 
preprocess_query auth(proto 2 2 bytes epoch 0) v1 from osd.0 
10.8.33.183:6800/293177
2019-03-25 14:16:25.342 7fa3af260700 10 mon.2@1(peon).auth v146 prep_auth() 
blob_size=2
2019-03-25 14:16:25.342 7fa3af260700 2 mon.2@1(peon) e1 send_reply 
0x55976b3bf320 0x559754bb1200 auth_reply(proto 2 0 (0) Success) v1
2019-03-25 14:16:25.342 7fa3af260700 1 -- 10.8.33.158:6789/0 --> 
10.8.33.183:6800/293177 -- auth_reply(proto 2 0 (0) Success) v1 -- 
0x559754bb1200 con 0

But the OSD is still in the booting state.

The FSID seems correct... so I'm lost here.
There is nothing in the OSD logs (even with debug set to 20), except some complaints that the mgr rejects the OSD report because the OSD metadata is not complete (I guess due to the OSD still being in the booting state).

One thing to note: I got into this state after redeploying the VMs hosting the Ceph cluster, so the IP addresses have changed.
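
For reference, here is roughly how I am double-checking which addresses the cluster has recorded (standard ceph CLI; osd.0 is just an example id, and the config show command is run on the OSD host):

# ceph mon dump                                                       # mon addresses currently in the monmap
# ceph osd dump | grep osd.0                                          # address (if any) the osdmap has recorded for osd.0
# ceph daemon osd.0 config show | grep -E 'public_addr|cluster_addr'  # what the OSD itself is using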

Can somebody help?

# ceph osd dump
epoch 15
fsid 5267611a-48f7-4979-823e-84531e104d63
created 2019-03-20 18:14:24.296267
modified 2019-03-22 14:38:45.816422
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 5
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release mimic
max_osd 3
osd.0 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) 
- - - - exists 32d92b43-6333-4c5c-8153-af373ce12e62
osd.1 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) 
- - - - exists 07b03870-1bd9-42f9-ac61-9e9be3b30e73
osd.2 down in weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) 
- - - - exists b77f8ae8-82cf-4e31-9e36-f510698abf8e

Thank you !!
Vincent

From: PHARABOT Vincent
Sent: Friday, March 22, 2019 10:45
To: 'ceph-users@lists.ceph.com'
Subject: OSD stuck in booting state

[ceph-users] OSD stuck in booting state

2019-03-22 Thread PHARABOT Vincent
Hello cephers,

I would need your help once again… (still a Ceph beginner, sorry).

In a cluster I have 3 OSDs which are never seen as up; they stay stuck in the down state. The OSD processes are running, of course.

On the OSD side, the daemon has been stuck in the booting state for a long time.
It doesn't look like a network or communication issue between the OSDs and the mons.

I guess something is wrong on the OSD side, but I could not figure out what so far…

Thanks a lot for your help!

# ceph -s
cluster:
id: 5267611a-48f7-4979-823e-84531e104d63
health: HEALTH_WARN
3 slow ops, oldest one blocked for 134780 sec, daemons [mon.1,mon.2] have slow 
ops.

services:
mon: 3 daemons, quorum 1,2,0
mgr: mgr.2(active), standbys: mgr.0, mgr.1
osd: 3 osds: 0 up, 0 in

data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:

# ceph health detail
HEALTH_WARN 3 slow ops, oldest one blocked for 134795 sec, daemons 
[mon.1,mon.2] have slow ops.
SLOW_OPS 3 slow ops, oldest one blocked for 134795 sec, daemons [mon.1,mon.2] 
have slow ops.

# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 2.92978 root default
-4 0.97659 host ip-10-8-33-183
0 hdd 0.97659 osd.0 down 0 1.0
-3 0.97659 host ip-10-8-64-158
2 0.97659 osd.2 down 0 1.0
-2 0.97659 host ip-10-8-85-231

# ceph osd dump
epoch 7
fsid 5267611a-48f7-4979-823e-84531e104d63
created 2019-03-20 18:14:24.296267
modified 2019-03-21 09:26:58.920300
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 5
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release mimic
max_osd 3
osd.0 down out weight 0 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) 
- - - - exists,new 32d92b43-6333-4c5c-8153-af373ce12e62
osd.1 down out weight 0 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) 
- - - - exists,new 07b03870-1bd9-42f9-ac61-9e9be3b30e73
osd.2 down out weight 0 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) 
- - - - exists,new b77f8ae8-82cf-4e31-9e36-f510698abf8e

"ops": [
{
"description": "osd_boot(osd.0 booted 0 features 4611087854031142907 v17)",
"initiated_at": "2019-03-22 08:47:20.243710",
"age": 405.638170,
"duration": 405.638185,
"type_data": {
"events": [
{
"time": "2019-03-22 08:47:20.243710",
"event": "initiated"
},
{
"time": "2019-03-22 08:47:20.243710",
"event": "header_read"
},
{
"time": "2019-03-22 08:47:20.243713",
"event": "throttled"
},
{
"time": "2019-03-22 08:47:20.243766",
"event": "all_read"
},
{
"time": "2019-03-22 08:47:20.243821",
"event": "dispatched"
},
{
"time": "2019-03-22 08:47:20.243826",
"event": "mon:_ms_dispatch"
},
{
"time": "2019-03-22 08:47:20.243827",
"event": "mon:dispatch_op"
},
{
"time": "2019-03-22 08:47:20.243827",
"event": "psvc:dispatch"
},
{
"time": "2019-03-22 08:47:20.243828",
"event": "osdmap:wait_for_readable"
},
{
"time": "2019-03-22 08:47:20.243829",
"event": "osdmap:wait_for_finished_proposal"
},
{
"time": "2019-03-22 08:47:21.064088",
"event": "callback retry"
},
{
"time": "2019-03-22 08:47:21.064090",
"event": "psvc:dispatch"
},
{
"time": "2019-03-22 08:47:21.064091",
"event": "osdmap:wait_for_readable"
},
{



OSD side:
[root@ip-10-8-33-183 ~]# ceph daemon osd.0 status
{
"cluster_fsid": "5267611a-48f7-4979-823e-84531e104d63",
"osd_fsid": "32d92b43-6333-4c5c-8153-af373ce12e62",
"whoami": 0,
"state": "booting",
"oldest_map": 1,
"newest_map": 17,
"num_pgs": 200
}

Vincent



Re: [ceph-users] Ceph OSD: how to keep files after umount or reboot vs tempfs ?

2019-02-19 Thread PHARABOT Vincent
OK, thank you for the confirmation, Burkhard.

I'm trying this.

Vincent

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of Burkhard Linke
Sent: Tuesday, February 19, 2019 13:20
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph OSD: how to keep files after umount or reboot vs tempfs ?


Hi,
On 2/19/19 11:52 AM, PHARABOT Vincent wrote:

Those files are generated on OSD activation from information stored in the LVM metadata. You do not need any extra external storage for that information any more.
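
For reference, a rough sketch of the activation commands that regenerate those files on the tmpfs mount, assuming a ceph-volume LVM bluestore deployment (the id/fsid placeholders come from the lvm list output):

# ceph-volume lvm list                           # shows each OSD's id and osd fsid stored in the LVM tags
# ceph-volume lvm activate <osd-id> <osd-fsid>   # recreates /var/lib/ceph/osd/ceph-<osd-id> on tmpfs
# ceph-volume lvm activate --all                 # or simply activate every OSD found in the LVM metadata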



Regards,

Burkhard Linke




[ceph-users] Ceph OSD: how to keep files after umount or reboot vs tempfs ?

2019-02-19 Thread PHARABOT Vincent
Hello Cephers,

I have an issue with the OSD data directory being mounted on tmpfs with bluestore.
On some occasions, I need to keep the files on the tiny bluestore filesystem of a working OSD (especially the keyring, and maybe other files the OSD needs to work).
Since the OSD partition is mounted as tmpfs, these files are deleted once the VM is rebooted, or even on umount.

Is there a way to have persistent storage for those files instead of tmpfs?
I could copy them to another location and copy them back after a reboot, but that seems very odd.

Maybe I need to keep the keyring and use the ceph-volume command (lvm activate?) to recover the files.

Do you have any best practice for this use case?

Thanks a lot for your help!

Vincent



Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-31 Thread PHARABOT Vincent
I tried from the monitor node itself.
Yes, the Dashboard module is enabled.

# ceph mgr services
{
"dashboard": "https://ip-10-8-36-16.internal:8443/;,
"restful": "https://ip-10-8-36-16.internal:8003/;
}

# curl -k https://ip-10-8-36-16.eu-west-2.compute.internal:8443/api/health
{"status": "404 Not Found", "version": "3.2.2", "detail": "The path '/api/health' was not found.",
"traceback": "Traceback (most recent call last):\n File \"/usr/lib/python2.7/site-packages/cherrypy/_cprequest.py\", line 656, in respond\n
response.body = self.handler()\n File \"/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py\", line 188, in __call__\n
self.body = self.oldhandler(*args, **kwargs)\n File \"/usr/lib/python2.7/site-packages/cherrypy/_cperror.py\", line 386, in __call__\n
raise self\nNotFound: (404, \"The path '/api/health' was not found.\")\n"}

# curl -k https://ip-10-8-36-16.eu-west-2.compute.internal:8443/api/health/minimal
{"status": "404 Not Found", "version": "3.2.2", "detail": "The path '/api/health/minimal' was not found.",
"traceback": "Traceback (most recent call last):\n File \"/usr/lib/python2.7/site-packages/cherrypy/_cprequest.py\", line 656, in respond\n
response.body = self.handler()\n File \"/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py\", line 188, in __call__\n
self.body = self.oldhandler(*args, **kwargs)\n File \"/usr/lib/python2.7/site-packages/cherrypy/_cperror.py\", line 386, in __call__\n
raise self\nNotFound: (404, \"The path '/api/health/minimal' was not found.\")\n"}

Vincent

-----Original Message-----
From: Lenz Grimmer [mailto:lgrim...@suse.com]
Sent: Thursday, January 31, 2019 00:36
To: PHARABOT Vincent ; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Simple API to have cluster healthcheck ?



On January 30, 2019 19:33:14 CET, PHARABOT Vincent wrote:

>Thanks for the info
>But, nope, on Mimic (13.2.4) /api/health ends in 404 (/api/health/full,
>/api/health/minimal also...)

On which node did you try to access the API? Did you enable the Dashboard 
module in Ceph manager?

Lenz

--
This message was sent from my Android device with K-9 Mail.


Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread PHARABOT Vincent
Hello

Thanks for the info
But, nope, on Mimic (13.2.4) /api/health ends in 404 (/api/health/full, 
/api/health/minimal also...)

Vincent

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of Lenz Grimmer
Sent: Wednesday, January 30, 2019 16:26
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Simple API to have cluster healthcheck ?

Hi,

On 1/30/19 2:02 PM, PHARABOT Vincent wrote:

> I have my cluster set up correctly now (thank you again for the help)

What version of Ceph is this?

> I am seeking now a way to get cluster health thru API (REST) with curl
> command.
>
> I had a look at manager / RESTful and Dashboard but none seems to
> provide simple way to get cluster health
>
> RESTful module do a lot of things but I didn’t find the simple health
> check result – moreover I don’t want monitoring user to be able to do
> all the command in this module.
>
> Dashboard is a dashboard so could not get health thru curl

Hmm, the Mimic dashboard's REST API should expose an "/api/health"
endpoint. Have you tried that one?

For Nautilus, this seems to have been split into /api/health/full and /api/health/minimal, to reduce the overhead.

Lenz

--
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany) GF:Felix 
Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)



Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread PHARABOT Vincent
Hi

Yes it could do the job in the meantime

Thank you !
Vincent

-----Original Message-----
From: Alexandru Cucu [mailto:m...@alexcucu.ro]
Sent: Wednesday, January 30, 2019 14:31
To: PHARABOT Vincent 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Simple API to have cluster healthcheck ?

Hello,

Not exactly what you were looking for, but you could use the Prometheus plugin 
for ceph-mgr and get the health status from the metrics.

curl -s http://ceph-mgr-node:9283/metrics | grep ^ceph_health_status
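
If the Prometheus module is not enabled yet, it can be switched on with the standard mgr command (9283 is the plugin's default port, and ceph-mgr-node above is just a placeholder for the active mgr host):

ceph mgr module enable prometheus

The ceph_health_status metric maps directly onto a healthcheck: 0 for HEALTH_OK, 1 for HEALTH_WARN, 2 for HEALTH_ERR.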


On Wed, Jan 30, 2019 at 3:04 PM PHARABOT Vincent wrote:


[ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread PHARABOT Vincent
Hello,

I have my cluster set up correctly now (thank you again for the help)

I am now looking for a way to get the cluster health through a REST API with a curl command.
I had a look at the manager RESTful and Dashboard modules, but neither seems to provide a simple way to get the cluster health.
The RESTful module does a lot of things, but I didn't find a simple health check result; moreover, I don't want the monitoring user to be able to run every command this module offers.
The Dashboard is a dashboard, so I could not get the health through curl.

It seems this was possible with "ceph-rest-api", but it looks like this tool is no longer available in ceph-common…

Is there a simple way to get this? (Without writing a Python mgr module, which would take a lot of time.)

Thank you
Vincent




Re: [ceph-users] Bright new cluster get all pgs stuck in inactive

2019-01-29 Thread PHARABOT Vincent
Sorry JC, here is the correct osd crush rule dump (type=chassis instead of host)

# ceph osd crush rule dump
[
{
"rule_id": 0,
"rule_name": "replicated_rule",
"ruleset": 0,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "chassis"
},
{
"op": "emit"
}
]
}
]

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of PHARABOT Vincent
Sent: Tuesday, January 29, 2019 19:33
To: Jean-Charles Lopez 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bright new cluster get all pgs stuck in inactive


From: Jean-Charles Lopez [mailto:jelo...@redhat.com]
Sent: Tuesday, January 29, 2019 19:30
To: PHARABOT Vincent <vincent.phara...@3ds.com>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bright new cluster get all pgs stuck in inactive

Hi,

I suspect your generated CRUSH rule is incorrect because of osd_crush_chooseleaf_type=2; by default, chassis buckets are not created.

Changing the bucket type to host (osd_crush_chooseleaf_type=1, which is the default when using the old ceph-deploy or ceph-ansible) for your deployment should fix the problem.
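
If you don't want to redeploy, a rough sketch of one way to repair this on the live cluster is to create a new replicated rule with a host failure domain and point the pools at it (the pool name below is just a placeholder):

# ceph osd crush rule create-replicated replicated_host default host
# ceph osd pool set <pool-name> crush_rule replicated_host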

Could you show the output of ceph osd crush rule dump to verify how the rule was built?

JC

On Jan 29, 2019, at 10:08, PHARABOT Vincent <vincent.phara...@3ds.com> wrote:


Re: [ceph-users] Bright new cluster get all pgs stuck in inactive

2019-01-29 Thread PHARABOT Vincent
Thanks for the quick reply

Here is the result

# ceph osd crush rule dump
[
{
"rule_id": 0,
"rule_name": "replicated_rule",
"ruleset": 0,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
    },
{
   "op": "emit"
}
]
}
]

From: Jean-Charles Lopez [mailto:jelo...@redhat.com]
Sent: Tuesday, January 29, 2019 19:30
To: PHARABOT Vincent 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bright new cluster get all pgs stuck in inactive

Hi,

I suspect your generated CRUSH rule is incorrect because of osd_crush_chooseleaf_type=2; by default, chassis buckets are not created.

Changing the bucket type to host (osd_crush_chooseleaf_type=1, which is the default when using the old ceph-deploy or ceph-ansible) for your deployment should fix the problem.

Could you show the output of ceph osd crush rule dump to verify how the rule was built?

JC

On Jan 29, 2019, at 10:08, PHARABOT Vincent <vincent.phara...@3ds.com> wrote:


[ceph-users] Bright new cluster get all pgs stuck in inactive

2019-01-29 Thread PHARABOT Vincent
Hello,

I have a brand new cluster with 2 pools, but the cluster keeps all pgs in the inactive state.
I have 3 OSDs and 1 mon… everything seems OK except that I cannot get the pgs into the active+clean state!

I might be missing something obvious, but I really don't know what…. Could someone help me?
I tried to seek answers among the list mail threads, but no luck; the other situations seem different.

Thanks a lot for your help.

Vincent

# ceph -v
ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)

# ceph -s
cluster:
id: ff4c91fb-3c29-4d9f-a26f-467d6b6a712e
health: HEALTH_WARN
Reduced data availability: 200 pgs inactive

services:
mon: 1 daemons, quorum ip-10-8-66-123.eu-west-2.compute.internal
mgr: ip-10-8-66-123.eu-west-2.compute.internal(active)
osd: 3 osds: 3 up, 3 in

data:
pools: 2 pools, 200 pgs
objects: 0 objects, 0 B
usage: 3.0 GiB used, 2.9 TiB / 2.9 TiB avail
pgs: 100.000% pgs unknown
200 unknown

# ceph osd tree -f json-pretty

{
"nodes": [
{
"id": -1,
"name": "default",
"type": "root",
"type_id": 10,
"children": [
-3,
-5,
-7
]
},
{
"id": -7,
"name": "ip-10-8-10-108",
"type": "host",
"type_id": 1,
"pool_weights": {},
"children": [
2
]
},
{
"id": 2,
"device_class": "hdd",
"name": "osd.2",
"type": "osd",
"type_id": 0,
"crush_weight": 0.976593,
"depth": 2,
"pool_weights": {},
"exists": 1,
"status": "up",
"reweight": 1.00,
"primary_affinity": 1.00
},
{
"id": -5,
"name": "ip-10-8-22-148",
"type": "host",
"type_id": 1,
"pool_weights": {},
"children": [
1
]
},
{
"id": 1,
"device_class": "hdd",
"name": "osd.1",
"type": "osd",
"type_id": 0,
"crush_weight": 0.976593,
"depth": 2,
"pool_weights": {},
"exists": 1,
"status": "up",
"reweight": 1.00,
"primary_affinity": 1.00
},
{
"id": -3,
"name": "ip-10-8-5-246",
"type": "host",
"type_id": 1,
"pool_weights": {},
"children": [
0
]
},
{
"id": 0,
"device_class": "hdd",
"name": "osd.0",
"type": "osd",
"type_id": 0,
"crush_weight": 0.976593,
   "depth": 2,
"pool_weights": {},
"exists": 1,
"status": "up",
"reweight": 1.00,
"primary_affinity": 1.00
}
],
"stray": []
}

# cat /etc/ceph/ceph.conf
[global]
fsid = ff4c91fb-3c29-4d9f-a26f-467d6b6a712e
mon initial members = ip-10-8-66-123
mon host = 10.8.66.123
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
pid file = /var/run/$cluster/$type.pid


#Choose reasonable numbers for number of replicas and placement groups.
osd pool default size = 3 # Write an object 3 times
osd pool default min size = 2 # Allow writing 2 copy in a degraded state
osd pool default pg num = 100
osd pool default pgp num = 100

#Choose a reasonable crush leaf type
#0 for a 1-node cluster.
#1 for a multi node cluster in a single rack
#2 for a multi node, multi chassis cluster with multiple hosts in a chassis
#3 for a multi node cluster with hosts across racks, etc.
osd crush chooseleaf type = 2

[mon]
debug mon = 20

# ceph health detail
HEALTH_WARN Reduced data availability: 200 pgs inactive
PG_AVAILABILITY Reduced data availability: 200 pgs inactive
pg 1.46 is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.47 is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.48 is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.49 is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.4a is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.4b is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.4c is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.4d is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.4e is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.4f is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.50 is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.51 is stuck inactive for