Re: [ceph-users] ceph block - volume with RAID#0

2019-01-31 Thread Janne Johansson
Den fre 1 feb. 2019 kl 06:30 skrev M Ranga Swami Reddy :

> Here user requirement is - less write and more reads...so not much
> worried on performance .
>

So why go for raid0 at all?
It is the least secure way to store data.


-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v12.2.11 Luminous released

2019-01-31 Thread Wido den Hollander


On 2/1/19 8:44 AM, Abhishek wrote:
> We are glad to announce the eleventh bug fix release of the Luminous
> v12.2.x long term stable release series. We recommend that all users
> upgrade to this release. Please note the following precautions while
> upgrading.
> 
> Notable Changes
> ---
> 
> * This release fixes the pg log hard limit bug that was introduced in
>   12.2.9, https://tracker.ceph.com/issues/36686.  A flag called
>   `pglog_hardlimit` has been introduced, which is off by default. Enabling
>   this flag will limit the length of the pg log.  In order to enable
>   that, the flag must be set by running `ceph osd set pglog_hardlimit`
>   after completely upgrading to 12.2.11. Once the cluster has this flag
>   set, the length of the pg log will be capped by a hard limit. Once set,
>   this flag *must not* be unset anymore.
> 
> * There have been fixes to RGW dynamic and manual resharding, which no
> longer
>   leaves behind stale bucket instances to be removed manually. For
> finding and
>   cleaning up older instances from a reshard a radosgw-admin command
> `reshard
>   stale-instances list` and `reshard stale-instances rm` should do the
> necessary
>   cleanup.
> 

Great news! I hope this works! This has been biting a lot of people in
the last year. I have helped a lot of people to manually clean this up,
but it's great that this is now available as a regular command.

Wido

> For the complete changelog, please refer to the release blog entry at
> https://ceph.com/releases/v12-2-11-luminous-released/
> 
> Getting ceph:
> 
> * Git at git://github.com/ceph/ceph.git
> * Tarball at http://download.ceph.com/tarballs/ceph-12.2.11.tar.gz
> * For packages, see http://docs.ceph.com/docs/master/install/get-packages/
> * Release git sha1: 26dc3775efc7bb286a1d6d66faee0ba30ea23eee
> 
> Best,
> Abhishek
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v12.2.11 Luminous released

2019-01-31 Thread Abhishek
We are glad to announce the eleventh bug fix release of the Luminous 
v12.2.x long term stable release series. We recommend that all users 
upgrade to this release. Please note the following precautions while 
upgrading.


Notable Changes
---

* This release fixes the pg log hard limit bug that was introduced in
  12.2.9, https://tracker.ceph.com/issues/36686.  A flag called
  `pglog_hardlimit` has been introduced, which is off by default. Enabling
  this flag will limit the length of the pg log.  In order to enable
  that, the flag must be set by running `ceph osd set pglog_hardlimit`
  after completely upgrading to 12.2.11. Once the cluster has this flag
  set, the length of the pg log will be capped by a hard limit. Once set,
  this flag *must not* be unset anymore.

* There have been fixes to RGW dynamic and manual resharding, which no longer
  leaves behind stale bucket instances to be removed manually. For finding and
  cleaning up older instances from a reshard, the radosgw-admin commands
  `reshard stale-instances list` and `reshard stale-instances rm` should do
  the necessary cleanup.
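
For convenience, the corresponding commands look roughly like this (the
`ceph versions` check and the `ceph osd dump` grep are only suggested
verification steps, not part of the release notes):

# pg log hard limit: only after every daemon reports 12.2.11
ceph versions
ceph osd set pglog_hardlimit
ceph osd dump | grep pglog_hardlimit   # the flag should now be listed

# stale bucket instances left behind by earlier reshards
radosgw-admin reshard stale-instances list
radosgw-admin reshard stale-instances rm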

For the complete changelog, please refer to the release blog entry at 
https://ceph.com/releases/v12-2-11-luminous-released/


Getting ceph:

* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-12.2.11.tar.gz
* For packages, see 
http://docs.ceph.com/docs/master/install/get-packages/

* Release git sha1: 26dc3775efc7bb286a1d6d66faee0ba30ea23eee

Best,
Abhishek
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Explanation of perf dump of rbd

2019-01-31 Thread Sinan Polat
Thanks for the clarification!

Great that the next release will include the feature. We are running on Red Hat 
Ceph, so we might have to wait longer before having the feature available.

Another related (simple) question:
We are using
/var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
in ceph.conf, can we include the volume name in the path?

Sinan

> On 1 Feb 2019, at 00:44, Jason Dillaman wrote:
> 
>> On Thu, Jan 31, 2019 at 12:16 PM Paul Emmerich  
>> wrote:
>> 
>> "perf schema" has a description field that may or may not contain
>> additional information.
>> 
>> My best guess for these fields would be bytes read/written since
>> startup of this particular librbd instance. (Based on how these
>> counters usually work)
> 
> Correct -- they should be strictly increasing while the image is
> in-use. If you periodically scrape the values (along w/ the current
> timestamp), you can convert these values to the rates between the
> current and previous metrics.
> 
> On a semi-related subject: the forthcoming Nautilus release will
> include new "rbd perf image iotop" and "rbd perf image iostat"
> commands to monitor metrics by RBD image.
> 
>> Paul
>> 
>> --
>> Paul Emmerich
>> 
>> Looking for help with your Ceph cluster? Contact us at https://croit.io
>> 
>> croit GmbH
>> Freseniusstr. 31h
>> 81247 München
>> www.croit.io
>> Tel: +49 89 1896585 90
>> 
>>> On Thu, Jan 31, 2019 at 3:41 PM Sinan Polat  wrote:
>>> 
>>> Hi,
>>> 
>>> I finally figured out how to measure the statistics of a specific RBD 
>>> volume;
>>> 
>>> $ ceph --admin-daemon  perf dump
>>> 
>>> 
>>> It outputs a lot, but I don't know what it means, is there any 
>>> documentation about the output?
>>> 
>>> For now the most important values are:
>>> 
>>> - bytes read
>>> 
>>> - bytes written
>>> 
>>> 
>>> I think I need to look at this:
>>> 
>>> {
>>> "rd": 1043,
>>> "rd_bytes": 28242432,
>>> "rd_latency": {
>>> "avgcount": 1768,
>>> "sum": 2.375461133,
>>> "avgtime": 0.001343586
>>> },
>>> "wr": 76,
>>> "wr_bytes": 247808,
>>> "wr_latency": {
>>> "avgcount": 76,
>>> "sum": 0.970222300,
>>> "avgtime": 0.012766082
>>> }
>>> }
>>> 
>>> 
>>> But what is 28242432 (rd_bytes) and 247808 (wr_bytes). Is that 28242432 
>>> bytes read and 247808 bytes written during the last minute/hour/day? Or is 
>>> it since mounted, or...?
>>> 
>>> 
>>> Thanks!
>>> 
>>> 
>>> Sinan
>>> 
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> -- 
> Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question regarding client-network

2019-01-31 Thread Buchberger, Carsten
Thank you - we were expecting that, but wanted to be sure.
By the way - we are running our clusters on IPv6-BGP, to achieve massive 
scalability and load-balancing ;-)

Kind regards,
Carsten Buchberger



WiTCOM Wiesbadener Informations-
und Telekommunikations GmbH

Carsten Buchberger
Engineering
Networks & Systems
___

fon +49 611-26244-211
fax +49 611-26244-262
c.buchber...@witcom.de
www.witcom.de

Konradinerallee 25
65189 Wiesbaden

Technical hotline:
08000-948266 (08000-WiTCOM)

HRB 10344 Wiesbaden
Tax number: 43 248 1943 6
Managing Director: Dipl.-Ing. Ralf Jung
Chairman of the Supervisory Board: Mayor Dr. Oliver Franz

A company of ESWE Versorgungs AG



WiTCOM is connecting all of Wiesbaden's business parks to the fibre network! Do you
have questions about the roll-out? Then call us at 0611-26244-135
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph block - volume with RAID#0

2019-01-31 Thread M Ranga Swami Reddy
Here the user requirement is less writes and more reads... so we are not too
worried about performance.

Thanks
Swami

On Thu, Jan 31, 2019 at 1:55 PM Piotr Dałek  wrote:
>
> On 2019-01-31 6:05 a.m., M Ranga Swami Reddy wrote:
> > My thought was - Ceph block volume with raid#0 (means I mounted a ceph
> > block volumes to an instance/VM, there I would like to configure this
> > volume with RAID0).
> >
> > Just to know, if anyone doing the same as above, if yes what are the
> > constraints?
>
> Exclusive lock on RBD images will kill any (theoretical) performance gains.
> Without exclusive lock, you lose some of RBD features.
>
> Plus, using 2+ clients with single images doesn't sound like a good idea.
>
> --
> Piotr Dałek
> piotr.da...@corp.ovh.com
> https://www.ovhcloud.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW multipart objects

2019-01-31 Thread Niels Maumenee
We have a public object storage cluster running the Ceph RADOS Gateway on Luminous
12.2.4, which we plan to update soon.
My question concerns some multipart objects that appear to upload
successfully, but when retrieving the object the client can only get 4 MB.
An example would be
radosgw-admin object stat --bucket=vidstash
--object='fan/BeautifulLife.Supergirl.AceOfBase.elipie.mp4'
{
"name": "fan/BeautifulLife.Supergirl.AceOfBase.elipie.mp4",
"size": 30673927,
"policy": {
"acl": {
"acl_user_map": [
{
"user": "vidstash",
"acl": 15
}
],
"acl_group_map": [],
"grant_map": [
{
"id": "vidstash",
"grant": {
"type": {
"type": 0
},
"id": "vidstash",
"email": "",
"permission": {
"flags": 15
},
"name": "vidstash",
"group": 0,
"url_spec": ""
}
}
]
},
"owner": {
"id": "vidstash",
"display_name": "vidstash"
}
},
"etag": "b8bfe5bfea50250ecea84ee7db398b85",
"tag": "us-east-1-iad1.85989885.1098349",
"manifest": {
"objs": [],
"obj_size": 30673927,
"explicit_objs": "false",
"head_size": 4194304,
"max_head_size": 4194304,
"prefix": ".???w?\u007f_",
"rules": [
{
"key": 0,
"val": {
"start_part_num": 0,
"start_ofs": 4194304,
"part_size": 0,
"stripe_max_size": 4194304,
"override_prefix": ""
}
}
],
"tail_instance": "",
"tail_placement": {
"bucket": {
"name": "vidstash",
"marker": "us-east-1-iad1.35606604.10460",
"bucket_id": "us-east-1-iad1.35606604.10460",
"tenant": "",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
}
},
"placement_rule": "default-placement"
}
},
"attrs": {
"user.rgw.content_type": "video/mp4",
"user.rgw.pg_ver": "\u000d\u0017",
"user.rgw.source_zone": "",
"user.rgw.tail_tag": "us-east-1-iad1.85989885.1098349"
}
}

Two things jump out at me that I think indicate a problem with the object:

1. In the manifest section, I find:
"prefix": ".???w?\u007f_"
which seems to be a bunch of characters and then the Unicode character
DELETE (u007f) at the end,
instead of
"prefix":
"fan/BeautifulLife.Supergirl.AceOfBase.elipie.mp4.2~tvzPU7oBwFS5AiH_1ytDcbxUvBy-92A",
for a good object.

2. Also in the manifest section, I most often find that multipart objects
have more than one element in the rules array, but that is not always
indicative of a bad object.
"rules": [
{
"key": 0,
"val": {
"start_part_num": 0,
"start_ofs": 4194304,
"part_size": 0,
"stripe_max_size": 4194304,
"override_prefix": ""
}
}
],
instead of
"rules": [
{
"key": 0,
"val": {
"start_part_num": 1,
"start_ofs": 0,
"part_size": 15728640,
"stripe_max_size": 4194304,
"override_prefix": ""
}
},
{
"key": 15728640,
"val": {
"start_part_num": 2,
"start_ofs": 15728640,
"part_size": 14945287,
"stripe_max_size": 4194304,
"override_prefix": ""
}
}
],

The only way I have found to fix a bad object is to re-upload the object.
Does anyone have any idea what might be going on when these bad objects
appear to be available but in reality are not completely there?
And/or is there a better way to identify incomplete objects?
Finally, is there another way to fix these objects, besides uploading them
again?
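
(For reference, a rough way to scan a bucket for such manifests - the "2~"
test is only a heuristic based on the good object above, so treat it as an
assumption:)

BUCKET=vidstash
radosgw-admin bucket list --bucket="$BUCKET" --max-entries=1000000 |
  jq -r '.[].name' |
  while read -r obj; do
    # healthy multipart objects seem to carry a "<name>.2~<upload-id>" prefix
    prefix=$(radosgw-admin object stat --bucket="$BUCKET" --object="$obj" |
               jq -r '.manifest.prefix // empty')
    if [ -n "$prefix" ] && [[ "$prefix" != *"2~"* ]]; then
      echo "suspect: $obj (prefix: $prefix)"
    fi
  done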

Thanks!!
Niels
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Explanation of perf dump of rbd

2019-01-31 Thread Jason Dillaman
On Thu, Jan 31, 2019 at 12:16 PM Paul Emmerich  wrote:
>
> "perf schema" has a description field that may or may not contain
> additional information.
>
> My best guess for these fields would be bytes read/written since
> startup of this particular librbd instance. (Based on how these
> counters usually work)

Correct -- they should be strictly increasing while the image is
in-use. If you periodically scrape the values (along w/ the current
timestamp), you can convert these values to the rates between the
current and previous metrics.

On a semi-related subject: the forthcoming Nautilus release will
include new "rbd perf image iotop" and "rbd perf image iostat"
commands to monitor metrics by RBD image.
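
A rough sketch of that scrape-and-diff approach (the socket path is a
placeholder, and the jq filter assumes the librbd counter section name starts
with "librbd"):

SOCK=/var/run/ceph/ceph-client.admin.12345.asok   # placeholder
get_rd_bytes() {
  ceph --admin-daemon "$SOCK" perf dump |
    jq '[to_entries[] | select(.key | startswith("librbd")) | .value.rd_bytes][0]'
}
A=$(get_rd_bytes); sleep 60; B=$(get_rd_bytes)
echo "average read bytes/s over the last minute: $(( (B - A) / 60 ))"

The same works for wr_bytes, and the rd/wr op counters give you IOPS.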

> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Thu, Jan 31, 2019 at 3:41 PM Sinan Polat  wrote:
> >
> > Hi,
> >
> > I finally figured out how to measure the statistics of a specific RBD 
> > volume;
> >
> > $ ceph --admin-daemon  perf dump
> >
> >
> > It outputs a lot, but I don't know what it means, is there any 
> > documentation about the output?
> >
> > For now the most important values are:
> >
> > - bytes read
> >
> > - bytes written
> >
> >
> > I think I need to look at this:
> >
> > {
> > "rd": 1043,
> > "rd_bytes": 28242432,
> > "rd_latency": {
> > "avgcount": 1768,
> > "sum": 2.375461133,
> > "avgtime": 0.001343586
> > },
> > "wr": 76,
> > "wr_bytes": 247808,
> > "wr_latency": {
> > "avgcount": 76,
> > "sum": 0.970222300,
> > "avgtime": 0.012766082
> > }
> > }
> >
> >
> > But what is 28242432 (rd_bytes) and 247808 (wr_bytes). Is that 28242432 
> > bytes read and 247808 bytes written during the last minute/hour/day? Or is 
> > it since mounted, or...?
> >
> >
> > Thanks!
> >
> >
> > Sinan
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] DockerSwarm and CephFS

2019-01-31 Thread Carlos Mogas da Silva

On 31/01/2019 18:51, Jacob DeGlopper wrote:

Hi Carlos - just a guess, but you might need your credentials from /etc/ceph on 
the host mounted inside the container.

     -- jacob


Hi Jacob!

That's not the case afaik. The Docker daemon itself mounts the target, so it's still the host doing the mount here, and it then bind-mounts it into the container.
It's not the container itself that mounts the target.


Thanks anyway ;)
Carlos Mogas da Silva
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-ansible - where to ask questions?

2019-01-31 Thread Martin Palma
Hi Will,

there is a dedicated mailing list for ceph-ansible:
http://lists.ceph.com/listinfo.cgi/ceph-ansible-ceph.com

Best,
Martin

On Thu, Jan 31, 2019 at 5:07 PM Will Dennis  wrote:
>
> Hi all,
>
>
>
> Trying to utilize the ‘ceph-ansible’ project 
> (https://github.com/ceph/ceph-ansible ) to deploy some Ceph servers in a 
> Vagrant testbed; hitting some issues with some of the plays – where is the 
> right (best) venue to ask questions about this?
>
>
>
> Thanks,
>
> Will
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cephalocon Barcelona 2019 CFP ends tomorrow!

2019-01-31 Thread Mike Perez
Hey everyone,

Just a last minute reminder if you're considering presenting at
Cephalocon Barcelona 2019, the CFP will be ending tomorrow.

Early bird ticket rate ends February 15.

https://ceph.com/cephalocon/barcelona-2019/

--
Mike Perez (thingee)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] DockerSwarm and CephFS

2019-01-31 Thread Jacob DeGlopper
Hi Carlos - just a guess, but you might need your credentials from 
/etc/ceph on the host mounted inside the container.


    -- jacob

Hey guys!

First post to the list and new Ceph user so I might say/ask some 
stupid stuff ;)


I've setup a Ceph Storage (and crashed it 2 days after), with 2 
ceph-mon, 2 ceph-ods (same host), 2 ceph-mgr and 1 ceph-mgs. 
Everything is up and running and works great.
Now I'm trying to integrate the CephFS functionality with my Docker 
Swarm (the rbd part is already working great). I can mount the CephFS 
on the docker host without any problem with a specific client created 
for the effect (client.dockerfs). It also works great if creating a 
volume with "docker volume create" and then use that volume on a 
container. With a stack (defined as docker-compose.yml), it simply 
doesn't mount the CephFS share, and the ceph-mon daemons log this kind 
of msgs:
2019-01-30 21:44:56.595 7fed6daf9700  0 cephx server client.dockerfs:  
unexpected key: req.key=cb19d6f224e3099 expected_key=aa096575fa04aa68
2019-01-30 21:45:02.295 7fed6daf9700  0 cephx server client.dockerfs:  
unexpected key: req.key=8a87e7949a095e50 expected_key=1c3fd3ad47398e0a
2019-01-30 21:45:13.711 7fed6daf9700  0 cephx server client.dockerfs:  
unexpected key: req.key=93933c29c40e9b05 expected_key=5b1a8d4f4f0e8dd1


While on the docker host trying to start the container shows this:
Jan 30 23:57:57 docker02 kernel: libceph: auth method 'x' error -1

This is the mount command I use on the docker host to mount the CephFS 
share:
mount -t ceph  ceph-mon:/znc tmp -o 
mds_namespace=dockerfs,name=dockerfs,secret=`ceph auth print-key 
client.dockerfs`


And this is the volume part of the docker-compose.yml file:
volumes:
    data:
    driver: n0r1skcom/docker-volume-cephfs
    driver_opts:
    name: dockerfs
    secret: # Same output as the command above produces
    path: /znc
    monitors: ceph-mon
    mds_namespace: dockerfs


I must be doing something wrong with this because it looks really 
simple to do but, somehow, it isn't working.


Can someone shed any light plz?

Thanks,
Carlos Mogas da Silva
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] DockerSwarm and CephFS

2019-01-31 Thread Carlos Mogas da Silva

Hey guys!

First post to the list and new Ceph user so I might say/ask some stupid stuff ;)

I've set up a Ceph storage cluster (and crashed it 2 days after), with 2 ceph-mon, 2 ceph-osd (same host), 2 ceph-mgr and 1 ceph-mds. Everything is up 
and running and works great.
Now I'm trying to integrate the CephFS functionality with my Docker Swarm (the rbd part is already working great). I can mount the CephFS on the 
docker host without any problem with a specific client created for the effect (client.dockerfs). It also works great if creating a volume with 
"docker volume create" and then use that volume on a container. With a stack (defined as docker-compose.yml), it simply doesn't mount the CephFS 
share, and the ceph-mon daemons log this kind of msgs:

2019-01-30 21:44:56.595 7fed6daf9700  0 cephx server client.dockerfs:  
unexpected key: req.key=cb19d6f224e3099 expected_key=aa096575fa04aa68
2019-01-30 21:45:02.295 7fed6daf9700  0 cephx server client.dockerfs:  
unexpected key: req.key=8a87e7949a095e50 expected_key=1c3fd3ad47398e0a
2019-01-30 21:45:13.711 7fed6daf9700  0 cephx server client.dockerfs:  
unexpected key: req.key=93933c29c40e9b05 expected_key=5b1a8d4f4f0e8dd1

While on the docker host trying to start the container shows this:
Jan 30 23:57:57 docker02 kernel: libceph: auth method 'x' error -1

This is the mount command I use on the docker host to mount the CephFS share:
mount -t ceph  ceph-mon:/znc tmp -o 
mds_namespace=dockerfs,name=dockerfs,secret=`ceph auth print-key 
client.dockerfs`

And this is the volume part of the docker-compose.yml file:
volumes:
  data:
    driver: n0r1skcom/docker-volume-cephfs
    driver_opts:
      name: dockerfs
      secret: # Same output as the command above produces
      path: /znc
      monitors: ceph-mon
      mds_namespace: dockerfs
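
For comparison, the manual path that does work can be reproduced by hand with
the same options (rough sketch - the option names are simply taken from the
compose file above, and "znc-test" is a throwaway volume name):

docker volume create -d n0r1skcom/docker-volume-cephfs \
  -o name=dockerfs \
  -o secret="$(ceph auth print-key client.dockerfs)" \
  -o path=/znc \
  -o monitors=ceph-mon \
  -o mds_namespace=dockerfs \
  znc-test
docker run --rm -v znc-test:/mnt alpine ls /mnt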


I must be doing something wrong with this because it looks really simple to do 
but, somehow, it isn't working.

Can someone shed any light plz?

Thanks,
Carlos Mogas da Silva
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Self serve / automated S3 key creation?

2019-01-31 Thread Jack
Hi,

There is an admin API for RGW :
http://docs.ceph.com/docs/master/radosgw/adminops/

You can check out rgwadmin¹ to see how to use it
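
If a full REST integration is more than you need to start with, even a thin
wrapper around the CLI can sit behind a self-serve form; a rough sketch (the
per-project uid scheme is a made-up example, not a recommendation):

#!/bin/sh
# create-s3-key.sh <openstack-project>
PROJECT="$1"
# create the RGW user for the project if it does not exist yet
radosgw-admin user info --uid="$PROJECT" >/dev/null 2>&1 ||
  radosgw-admin user create --uid="$PROJECT" --display-name="OpenStack $PROJECT"
# generate a fresh S3 access/secret key pair and print the keys
radosgw-admin key create --uid="$PROJECT" --key-type=s3 \
  --gen-access-key --gen-secret | jq '.keys'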

Best regards,

[1] https://github.com/UMIACS/rgwadmin

On 01/31/2019 06:11 PM, shubjero wrote:
> Has anyone automated the ability to generate S3 keys for OpenStack users in
> Ceph? Right now we take in a users request manually (Hey we need an S3 API
> key for our OpenStack project 'X', can you help?). We as cloud/ceph admins
> just use radosgw-admin to create them an access/secret key pair for their
> specific OpenStack project and provide it to them manually. Was just
> wondering if there was a self-serve way to do that. Curious to hear what
> others have done in regards to this.
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Self serve / automated S3 key creation?

2019-01-31 Thread shubjero
Has anyone automated the ability to generate S3 keys for OpenStack users in
Ceph? Right now we take in a user's request manually ("Hey, we need an S3 API
key for our OpenStack project 'X', can you help?"). We as cloud/Ceph admins
just use radosgw-admin to create them an access/secret key pair for their
specific OpenStack project and provide it to them manually. Was just
wondering if there was a self-serve way to do that. Curious to hear what
others have done in regards to this.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Explanation of perf dump of rbd

2019-01-31 Thread Paul Emmerich
"perf schema" has a description field that may or may not contain
additional information.

My best guess for these fields would be bytes read/written since
startup of this particular librbd instance. (Based on how these
counters usually work)

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Thu, Jan 31, 2019 at 3:41 PM Sinan Polat  wrote:
>
> Hi,
>
> I finally figured out how to measure the statistics of a specific RBD volume;
>
> $ ceph --admin-daemon  perf dump
>
>
> It outputs a lot, but I don't know what it means, is there any documentation 
> about the output?
>
> For now the most important values are:
>
> - bytes read
>
> - bytes written
>
>
> I think I need to look at this:
>
> {
> "rd": 1043,
> "rd_bytes": 28242432,
> "rd_latency": {
> "avgcount": 1768,
> "sum": 2.375461133,
> "avgtime": 0.001343586
> },
> "wr": 76,
> "wr_bytes": 247808,
> "wr_latency": {
> "avgcount": 76,
> "sum": 0.970222300,
> "avgtime": 0.012766082
> }
> }
>
>
> But what is 28242432 (rd_bytes) and 247808 (wr_bytes). Is that 28242432 bytes 
> read and 247808 bytes written during the last minute/hour/day? Or is it since 
> mounted, or...?
>
>
> Thanks!
>
>
> Sinan
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] pgs inactive after setting a new crush rule (Re: backfill_toofull after adding new OSDs)

2019-01-31 Thread Jan Kasprzak
Jan Kasprzak wrote:
:   OKay, now I changed the crush rule also on a pool with
: the real data, and it seems all the client i/o on that pool has stopped.
: The recovery continues, but things like qemu I/O, "rbd ls", and so on
: are just stuck doing nothing.
: 
:   Can I unstuck it somehow (faster than waiting for all the recovery
: to finish)? Thanks.

I was able to briefly reduce the "1721 pgs inactive" number
by restarting some of the original filestore OSDs, but after some time
the number increased back to 1721. Then the data recovery finished,
and 1721 PGs remained inactive (and, of course this pool I/O was stuck,
both qemu and "rbd ls").

So I have returned the original crush rule, the data started
to migrate back to the original OSDs, and the client I/O got unstuck
(even though the data relocation is still in progress).

	Where could the problem be? Could it be that I am hitting the limit
on the number of PGs per OSD or something? I had 60 OSDs before, and want
to move it all to 20 new OSDs instead. The pool in question has 2048 PGs.
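
(If it is the per-OSD PG limit, something like the following might confirm or
rule it out - the 400 is just an arbitrary test value, and I am not certain
the change takes effect without restarting the OSDs:)

# how many PGs each OSD currently holds (PGS column)
ceph osd df tree
# the suspected limit: mon_max_pg_per_osd (250 by default, I believe) times
# osd_max_pg_per_osd_hard_ratio - raising the former temporarily is one test
ceph config set global mon_max_pg_per_osd 400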

Thanks,

-Yenya
: 
: # ceph -s
:   cluster:
: id: ... my-uuid ...
: health: HEALTH_ERR
: 3308311/3803892 objects misplaced (86.972%)
: Reduced data availability: 1721 pgs inactive
: Degraded data redundancy: 85361/3803892 objects degraded 
(2.244%), 1
: 39 pgs degraded, 139 pgs undersized
: Degraded data redundancy (low space): 25 pgs backfill_toofull
: 
:   services:
: mon: 3 daemons, quorum mon1,mon2,mon3
: mgr: mon2(active), standbys: mon1, mon3
: osd: 80 osds: 80 up, 80 in; 1868 remapped pgs
: rgw: 1 daemon active
: 
:   data:
: pools:   13 pools, 5056 pgs
: objects: 1.27 M objects, 4.8 TiB
: usage:   15 TiB used, 208 TiB / 224 TiB avail
: pgs: 34.039% pgs not active
:  85361/3803892 objects degraded (2.244%)
:  3308311/3803892 objects misplaced (86.972%)
:  3188 active+clean
:  1582 activating+remapped
:  139  activating+undersized+degraded+remapped
:  93   active+remapped+backfill_wait
:  29   active+remapped+backfilling
:  25   active+remapped+backfill_wait+backfill_toofull
: 
:   io:
: recovery: 174 MiB/s, 43 objects/s
: 
: 
: -Yenya
: 
: 
: Jan Kasprzak wrote:
: : : - Original Message -
: : : From: "Caspar Smit" 
: : : To: "Jan Kasprzak" 
: : : Cc: "ceph-users" 
: : : Sent: Thursday, 31 January, 2019 15:43:07
: : : Subject: Re: [ceph-users] backfill_toofull after adding new OSDs
: : : 
: : : Hi Jan, 
: : : 
: : : You might be hitting the same issue as Wido here: 
: : : 
: : : https://www.spinics.net/lists/ceph-users/msg50603.html 
: : : 
: : : Kind regards, 
: : : Caspar 
: : : 
: : : On Thu, 31 Jan 2019 at 14:36, Jan Kasprzak <k...@fi.muni.cz> wrote: 
: : : 
: : : 
: : : Hello, ceph users, 
: : : 
: : : I see the following HEALTH_ERR during cluster rebalance: 
: : : 
: : : Degraded data redundancy (low space): 8 pgs backfill_toofull 
: : : 
: : : Detailed description: 
: : : I have upgraded my cluster to mimic and added 16 new bluestore OSDs 
: : : on 4 hosts. The hosts are in a separate region in my crush map, and crush 
: : : rules prevented data to be moved on the new OSDs. Now I want to move 
: : : all data to the new OSDs (and possibly decomission the old filestore 
OSDs). 
: : : I have created the following rule: 
: : : 
: : : # ceph osd crush rule create-replicated on-newhosts newhostsroot host 
: : : 
: : : after this, I am slowly moving the pools one-by-one to this new rule: 
: : : 
: : : # ceph osd pool set test-hdd-pool crush_rule on-newhosts 
: : : 
: : : When I do this, I get the above error. This is misleading, because 
: : : ceph osd df does not suggest the OSDs are getting full (the most full 
: : : OSD is about 41 % full). After rebalancing is done, the HEALTH_ERR 
: : : disappears. Why am I getting this error? 
: : : 
: : : # ceph -s 
: : : cluster: 
: : : id: ...my UUID... 
: : : health: HEALTH_ERR 
: : : 1271/3803223 objects misplaced (0.033%) 
: : : Degraded data redundancy: 40124/3803223 objects degraded (1.055%), 65 pgs 
degraded, 67 pgs undersized 
: : : Degraded data redundancy (low space): 8 pgs backfill_toofull 
: : : 
: : : services: 
: : : mon: 3 daemons, quorum mon1,mon2,mon3 
: : : mgr: mon2(active), standbys: mon1, mon3 
: : : osd: 80 osds: 80 up, 80 in; 90 remapped pgs 
: : : rgw: 1 daemon active 
: : : 
: : : data: 
: : : pools: 13 pools, 5056 pgs 
: : : objects: 1.27 M objects, 4.8 TiB 
: : : usage: 15 TiB used, 208 TiB / 224 TiB avail 
: : : pgs: 40124/3803223 objects degraded (1.055%) 
: : : 1271/3803223 objects misplaced (0.033%) 
: : : 4963 active+clean 
: : : 41 active+recovery_wait+undersized+degraded+remapped 
: : : 21 active+recovery_wait+undersized+degraded 
: : : 17 

Re: [ceph-users] Spec for Ceph Mon+Mgr?

2019-01-31 Thread Jesper Krogh


> : We're currently co-locating our mons with the head node of our Hadoop
> : installation. That may be giving us some problems, we dont know yet, but
> : thus I'm speculation about moving them to dedicated hardware.

Would it be OK to run them on KVM VMs - of course not backed by Ceph?

Jesper
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-ansible - where to ask questions? [EXT]

2019-01-31 Thread Matthew Vernon

Hi,

On 31/01/2019 16:06, Will Dennis wrote:


Trying to utilize the ‘ceph-ansible’ project 
(https://github.com/ceph/ceph-ansible)
to deploy some Ceph servers in a Vagrant testbed; hitting some issues 
with some of the plays – where is the right (best) venue to ask 
questions about this?


There's a list for ceph-ansible: ceph-ansi...@lists.ceph.com /
http://lists.ceph.com/listinfo.cgi/ceph-ansible-ceph.com

HTH,

Matthew


--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] backfill_toofull after adding new OSDs

2019-01-31 Thread Jan Kasprzak
	Okay, now I changed the crush rule also on a pool with
the real data, and it seems all the client I/O on that pool has stopped.
The recovery continues, but things like qemu I/O, "rbd ls", and so on
are just stuck doing nothing.

	Can I unstick it somehow (faster than waiting for all the recovery
to finish)? Thanks.

# ceph -s
  cluster:
id: ... my-uuid ...
health: HEALTH_ERR
3308311/3803892 objects misplaced (86.972%)
Reduced data availability: 1721 pgs inactive
Degraded data redundancy: 85361/3803892 objects degraded (2.244%), 1
39 pgs degraded, 139 pgs undersized
Degraded data redundancy (low space): 25 pgs backfill_toofull

  services:
mon: 3 daemons, quorum mon1,mon2,mon3
mgr: mon2(active), standbys: mon1, mon3
osd: 80 osds: 80 up, 80 in; 1868 remapped pgs
rgw: 1 daemon active

  data:
pools:   13 pools, 5056 pgs
objects: 1.27 M objects, 4.8 TiB
usage:   15 TiB used, 208 TiB / 224 TiB avail
pgs: 34.039% pgs not active
 85361/3803892 objects degraded (2.244%)
 3308311/3803892 objects misplaced (86.972%)
 3188 active+clean
 1582 activating+remapped
 139  activating+undersized+degraded+remapped
 93   active+remapped+backfill_wait
 29   active+remapped+backfilling
 25   active+remapped+backfill_wait+backfill_toofull

  io:
recovery: 174 MiB/s, 43 objects/s


-Yenya


Jan Kasprzak wrote:
: : - Original Message -
: : From: "Caspar Smit" 
: : To: "Jan Kasprzak" 
: : Cc: "ceph-users" 
: : Sent: Thursday, 31 January, 2019 15:43:07
: : Subject: Re: [ceph-users] backfill_toofull after adding new OSDs
: : 
: : Hi Jan, 
: : 
: : You might be hitting the same issue as Wido here: 
: : 
: : https://www.spinics.net/lists/ceph-users/msg50603.html 
: : 
: : Kind regards, 
: : Caspar 
: : 
: : On Thu, 31 Jan 2019 at 14:36, Jan Kasprzak <k...@fi.muni.cz> wrote: 
: : 
: : 
: : Hello, ceph users, 
: : 
: : I see the following HEALTH_ERR during cluster rebalance: 
: : 
: : Degraded data redundancy (low space): 8 pgs backfill_toofull 
: : 
: : Detailed description: 
: : I have upgraded my cluster to mimic and added 16 new bluestore OSDs 
: : on 4 hosts. The hosts are in a separate region in my crush map, and crush 
: : rules prevented data to be moved on the new OSDs. Now I want to move 
: : all data to the new OSDs (and possibly decomission the old filestore OSDs). 
: : I have created the following rule: 
: : 
: : # ceph osd crush rule create-replicated on-newhosts newhostsroot host 
: : 
: : after this, I am slowly moving the pools one-by-one to this new rule: 
: : 
: : # ceph osd pool set test-hdd-pool crush_rule on-newhosts 
: : 
: : When I do this, I get the above error. This is misleading, because 
: : ceph osd df does not suggest the OSDs are getting full (the most full 
: : OSD is about 41 % full). After rebalancing is done, the HEALTH_ERR 
: : disappears. Why am I getting this error? 
: : 
: : # ceph -s 
: : cluster: 
: : id: ...my UUID... 
: : health: HEALTH_ERR 
: : 1271/3803223 objects misplaced (0.033%) 
: : Degraded data redundancy: 40124/3803223 objects degraded (1.055%), 65 pgs 
degraded, 67 pgs undersized 
: : Degraded data redundancy (low space): 8 pgs backfill_toofull 
: : 
: : services: 
: : mon: 3 daemons, quorum mon1,mon2,mon3 
: : mgr: mon2(active), standbys: mon1, mon3 
: : osd: 80 osds: 80 up, 80 in; 90 remapped pgs 
: : rgw: 1 daemon active 
: : 
: : data: 
: : pools: 13 pools, 5056 pgs 
: : objects: 1.27 M objects, 4.8 TiB 
: : usage: 15 TiB used, 208 TiB / 224 TiB avail 
: : pgs: 40124/3803223 objects degraded (1.055%) 
: : 1271/3803223 objects misplaced (0.033%) 
: : 4963 active+clean 
: : 41 active+recovery_wait+undersized+degraded+remapped 
: : 21 active+recovery_wait+undersized+degraded 
: : 17 active+remapped+backfill_wait 
: : 5 active+remapped+backfill_wait+backfill_toofull 
: : 3 active+remapped+backfill_toofull 
: : 2 active+recovering+undersized+remapped 
: : 2 active+recovering+undersized+degraded+remapped 
: : 1 active+clean+remapped 
: : 1 active+recovering+undersized+degraded 
: : 
: : io: 
: : client: 6.6 MiB/s rd, 2.7 MiB/s wr, 75 op/s rd, 89 op/s wr 
: : recovery: 2.0 MiB/s, 92 objects/s 
: : 
: : Thanks for any hint, 
: : 
: : -Yenya 
: : 
: : -- 
: : | Jan "Yenya" Kasprzak <{fi.muni.cz - work | yenya.net - private}> | 
: : | http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 | 
: : This is the world we live in: the way to deal with computers is to google 
: : the symptoms, and hope that you don't have to watch a video. --P. Zaitcev 
: : ___ 
: : ceph-users mailing list 
: : ceph-users@lists.ceph.com 
: : http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

[ceph-users] ceph-ansible - where to ask questions?

2019-01-31 Thread Will Dennis
Hi all,

Trying to utilize the 'ceph-ansible' project 
(https://github.com/ceph/ceph-ansible ) to deploy some Ceph servers in a 
Vagrant testbed; hitting some issues with some of the plays - where is the 
right (best) venue to ask questions about this?

Thanks,
Will
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] backfill_toofull after adding new OSDs

2019-01-31 Thread Jan Kasprzak
Fyodor Ustinov wrote:
: Hi!
: 
: I saw the same several times when I added a new osd to the cluster. One-two 
pg in "backfill_toofull" state.
: 
: In all versions of mimic.

Yep. In my case it is not (only) after adding the new OSDs.
An hour or so ago my cluster reached the HEALTH_OK state, so I moved
another pool to the new hosts with "crush_rule on-newhosts". The result
was immediate backfill_toofull on two PGs for about five minutes,
and then it reached the HEALTH_OK again.

So the PGs are not stuck in that state forever, they are there
only during the data reshuffle.

13.2.4 on CentOS 7.
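
(For anyone who wants to watch it while the reshuffle runs, these two are what
I keep an eye on - the grep pattern is just my guess at the relevant
thresholds:)

# which PGs are currently flagged
ceph pg ls backfill_toofull
# the ratios the flag is evaluated against
ceph osd dump | grep -E 'full_ratio|backfillfull_ratio|nearfull_ratio'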

-Yenya

: 
: - Original Message -
: From: "Caspar Smit" 
: To: "Jan Kasprzak" 
: Cc: "ceph-users" 
: Sent: Thursday, 31 January, 2019 15:43:07
: Subject: Re: [ceph-users] backfill_toofull after adding new OSDs
: 
: Hi Jan, 
: 
: You might be hitting the same issue as Wido here: 
: 
: : https://www.spinics.net/lists/ceph-users/msg50603.html 
: 
: Kind regards, 
: Caspar 
: 
: On Thu, 31 Jan 2019 at 14:36, Jan Kasprzak <k...@fi.muni.cz> wrote: 
: 
: 
: Hello, ceph users, 
: 
: I see the following HEALTH_ERR during cluster rebalance: 
: 
: Degraded data redundancy (low space): 8 pgs backfill_toofull 
: 
: Detailed description: 
: I have upgraded my cluster to mimic and added 16 new bluestore OSDs 
: on 4 hosts. The hosts are in a separate region in my crush map, and crush 
: rules prevented data to be moved on the new OSDs. Now I want to move 
: all data to the new OSDs (and possibly decomission the old filestore OSDs). 
: I have created the following rule: 
: 
: # ceph osd crush rule create-replicated on-newhosts newhostsroot host 
: 
: after this, I am slowly moving the pools one-by-one to this new rule: 
: 
: # ceph osd pool set test-hdd-pool crush_rule on-newhosts 
: 
: When I do this, I get the above error. This is misleading, because 
: ceph osd df does not suggest the OSDs are getting full (the most full 
: OSD is about 41 % full). After rebalancing is done, the HEALTH_ERR 
: disappears. Why am I getting this error? 
: 
: # ceph -s 
: cluster: 
: id: ...my UUID... 
: health: HEALTH_ERR 
: 1271/3803223 objects misplaced (0.033%) 
: Degraded data redundancy: 40124/3803223 objects degraded (1.055%), 65 pgs 
degraded, 67 pgs undersized 
: Degraded data redundancy (low space): 8 pgs backfill_toofull 
: 
: services: 
: mon: 3 daemons, quorum mon1,mon2,mon3 
: mgr: mon2(active), standbys: mon1, mon3 
: osd: 80 osds: 80 up, 80 in; 90 remapped pgs 
: rgw: 1 daemon active 
: 
: data: 
: pools: 13 pools, 5056 pgs 
: objects: 1.27 M objects, 4.8 TiB 
: usage: 15 TiB used, 208 TiB / 224 TiB avail 
: pgs: 40124/3803223 objects degraded (1.055%) 
: 1271/3803223 objects misplaced (0.033%) 
: 4963 active+clean 
: 41 active+recovery_wait+undersized+degraded+remapped 
: 21 active+recovery_wait+undersized+degraded 
: 17 active+remapped+backfill_wait 
: 5 active+remapped+backfill_wait+backfill_toofull 
: 3 active+remapped+backfill_toofull 
: 2 active+recovering+undersized+remapped 
: 2 active+recovering+undersized+degraded+remapped 
: 1 active+clean+remapped 
: 1 active+recovering+undersized+degraded 
: 
: io: 
: client: 6.6 MiB/s rd, 2.7 MiB/s wr, 75 op/s rd, 89 op/s wr 
: recovery: 2.0 MiB/s, 92 objects/s 
: 
: Thanks for any hint, 
: 
: -Yenya 
: 
: -- 
: | Jan "Yenya" Kasprzak <{fi.muni.cz - work | yenya.net - private}> | 
: | http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 | 
: This is the world we live in: the way to deal with computers is to google 
: the symptoms, and hope that you don't have to watch a video. --P. Zaitcev 
: ___ 
: ceph-users mailing list 
: ceph-users@lists.ceph.com 
: http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
: 
: ___
: ceph-users mailing list
: ceph-users@lists.ceph.com
: http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
| Jan "Yenya" Kasprzak  |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
 This is the world we live in: the way to deal with computers is to google
 the symptoms, and hope that you don't have to watch a video. --P. Zaitcev
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] backfill_toofull after adding new OSDs

2019-01-31 Thread Fyodor Ustinov
Hi!

I saw the same thing several times when I added a new OSD to the cluster - one or
two PGs in the "backfill_toofull" state.

In all versions of mimic.

- Original Message -
From: "Caspar Smit" 
To: "Jan Kasprzak" 
Cc: "ceph-users" 
Sent: Thursday, 31 January, 2019 15:43:07
Subject: Re: [ceph-users] backfill_toofull after adding new OSDs

Hi Jan, 

You might be hitting the same issue as Wido here: 

https://www.spinics.net/lists/ceph-users/msg50603.html 

Kind regards, 
Caspar 

On Thu, 31 Jan 2019 at 14:36, Jan Kasprzak <k...@fi.muni.cz> wrote: 


Hello, ceph users, 

I see the following HEALTH_ERR during cluster rebalance: 

Degraded data redundancy (low space): 8 pgs backfill_toofull 

Detailed description: 
I have upgraded my cluster to mimic and added 16 new bluestore OSDs 
on 4 hosts. The hosts are in a separate region in my crush map, and crush 
rules prevented data to be moved on the new OSDs. Now I want to move 
all data to the new OSDs (and possibly decomission the old filestore OSDs). 
I have created the following rule: 

# ceph osd crush rule create-replicated on-newhosts newhostsroot host 

after this, I am slowly moving the pools one-by-one to this new rule: 

# ceph osd pool set test-hdd-pool crush_rule on-newhosts 

When I do this, I get the above error. This is misleading, because 
ceph osd df does not suggest the OSDs are getting full (the most full 
OSD is about 41 % full). After rebalancing is done, the HEALTH_ERR 
disappears. Why am I getting this error? 

# ceph -s 
cluster: 
id: ...my UUID... 
health: HEALTH_ERR 
1271/3803223 objects misplaced (0.033%) 
Degraded data redundancy: 40124/3803223 objects degraded (1.055%), 65 pgs 
degraded, 67 pgs undersized 
Degraded data redundancy (low space): 8 pgs backfill_toofull 

services: 
mon: 3 daemons, quorum mon1,mon2,mon3 
mgr: mon2(active), standbys: mon1, mon3 
osd: 80 osds: 80 up, 80 in; 90 remapped pgs 
rgw: 1 daemon active 

data: 
pools: 13 pools, 5056 pgs 
objects: 1.27 M objects, 4.8 TiB 
usage: 15 TiB used, 208 TiB / 224 TiB avail 
pgs: 40124/3803223 objects degraded (1.055%) 
1271/3803223 objects misplaced (0.033%) 
4963 active+clean 
41 active+recovery_wait+undersized+degraded+remapped 
21 active+recovery_wait+undersized+degraded 
17 active+remapped+backfill_wait 
5 active+remapped+backfill_wait+backfill_toofull 
3 active+remapped+backfill_toofull 
2 active+recovering+undersized+remapped 
2 active+recovering+undersized+degraded+remapped 
1 active+clean+remapped 
1 active+recovering+undersized+degraded 

io: 
client: 6.6 MiB/s rd, 2.7 MiB/s wr, 75 op/s rd, 89 op/s wr 
recovery: 2.0 MiB/s, 92 objects/s 

Thanks for any hint, 

-Yenya 

-- 
| Jan "Yenya" Kasprzak http://fi.muni.cz/ | fi.muni.cz ] - work | [ 
http://yenya.net/ | yenya.net ] - private}> | 
| [ http://www.fi.muni.cz/~kas/ | http://www.fi.muni.cz/~kas/ ] GPG: 
4096R/A45477D5 | 
This is the world we live in: the way to deal with computers is to google 
the symptoms, and hope that you don't have to watch a video. --P. Zaitcev 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Explanation of perf dump of rbd

2019-01-31 Thread Sinan Polat
Hi,


I finally figured out how to measure the statistics of a specific RBD volume;

$ ceph --admin-daemon  perf dump


It outputs a lot, but I don't know what it means, is there any documentation
about the output?

For now the most important values are:

- bytes read

- bytes written


I think I need to look at this:

{
"rd": 1043,
"rd_bytes": 28242432,
"rd_latency": {
"avgcount": 1768,
"sum": 2.375461133,
"avgtime": 0.001343586
},
"wr": 76,
"wr_bytes": 247808,
"wr_latency": {
"avgcount": 76,
"sum": 0.970222300,
"avgtime": 0.012766082
}
}


But what is 28242432 (rd_bytes) and 247808 (wr_bytes). Is that 28242432 bytes
read and 247808 bytes written during the last minute/hour/day? Or is it since
mounted, or...?


Thanks!


Sinan___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-31 Thread Ben Kerr
"...Dashboard is a dashboard so could not get health thru curl..."

If I didn't miss the question, IMHO "dashboard" does this job adequately:

curl -s -XGET :7000/health_data | jq -C ".health.status"

ceph version 12.2.10
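
If you only need it from a shell on a node that has a client keyring, the
plain CLI gives the same field (the jq path is my assumption about the JSON
layout):

ceph status --format json | jq -r '.health.status'
# or simply
ceph health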

On Thu, 31 Jan 2019 at 11:02, PHARABOT Vincent <vincent.phara...@3ds.com> wrote:

> I tried to start on the Monitor node itself
> Yes Dashboard is enabled
>
> # ceph mgr services
> {
> "dashboard": "https://ip-10-8-36-16.internal:8443/;,
> "restful": "https://ip-10-8-36-16.internal:8003/;
> }
>
> # curl -k https://ip-10-8-36-16.eu-west-2.compute.internal:8443/api/health
> {"status": "404 Not Found", "version": "3.2.2", "detail": "The path
> '/api/health' was not found.", "traceback": "Traceback (most recent call
> last):\n File \"/usr/lib/python2.7/si
> te-packages/cherrypy/_cprequest.py\", line 656, in respond\n response.body
> = self.handler()\n File
> \"/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py\", line 188, in
> __call__\n self.body = self.oldhandler(*args, **kwargs)\n File
> \"/usr/lib/python2.7/site-packages/cherrypy/_cperror.py\", line 386, in
> __call__\n raise self\nNotFound: (404
> , \"The path '/api/health' was not found.\")\n"}
>
> # curl -k
> https://ip-10-8-36-16.eu-west-2.compute.internal:8443/api/health/minimal
> {"status": "404 Not Found", "version": "3.2.2", "detail": "The path
> '/api/health/minimal' was not found.", "traceback": "Traceback (most recent
> call last):\n File \"/usr/lib/pyth
> on2.7/site-packages/cherrypy/_cprequest.py\", line 656, in respond\n
> response.body = self.handler()\n File
> \"/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py\", line
> 188, in __call__\n self.body = self.oldhandler(*args, **kwargs)\n File
> \"/usr/lib/python2.7/site-packages/cherrypy/_cperror.py\", line 386, in
> __call__\n raise self\nNotFou
> nd: (404, \"The path '/api/health/minimal' was not found.\")\n"}
>
> Vincent
>
> -Original Message-
> From: Lenz Grimmer [mailto:lgrim...@suse.com]
> Sent: Thursday, 31 January 2019 00:36
> To: PHARABOT Vincent ; ceph-users@lists.ceph.com
> Subject: RE: [ceph-users] Simple API to have cluster healthcheck ?
>
>
>
> On 30 January 2019 19:33:14 CET, PHARABOT Vincent <vincent.phara...@3ds.com> wrote:
>
> >Thanks for the info
> >But, nope, on Mimic (13.2.4) /api/health ends in 404 (/api/health/full,
> >/api/health/minimal also...)
>
> On which node did you try to access the API? Did you enable the Dashboard
> module in Ceph manager?
>
> Lenz
>
> --
> This message was sent from my Android device with K-9 Mail.
> This email and any attachments are intended solely for the use of the
> individual or entity to whom it is addressed and may be confidential and/or
> privileged.
>
> If you are not one of the named recipients or have received this email in
> error,
>
> (i) you should not read, disclose, or copy it,
>
> (ii) please notify sender of your receipt by reply email and delete this
> email and all attachments,
>
> (iii) Dassault Systèmes does not accept or assume any liability or
> responsibility for any use of or reliance on this email.
>
>
> Please be informed that your personal data are processed according to our
> data privacy policy as described on our website. Should you have any
> questions related to personal data protection, please contact 3DS Data
> Protection Officer at 3ds.compliance-priv...@3ds.com 3ds.compliance-priv...@3ds.com>
>
>
> For other languages, go to https://www.3ds.com/terms/email-disclaimer
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] backfill_toofull after adding new OSDs

2019-01-31 Thread Caspar Smit
Hi Jan,

You might be hitting the same issue as Wido here:

https://www.spinics.net/lists/ceph-users/msg50603.html

Kind regards,
Caspar

On Thu, 31 Jan 2019 at 14:36, Jan Kasprzak  wrote:

> Hello, ceph users,
>
> I see the following HEALTH_ERR during cluster rebalance:
>
> Degraded data redundancy (low space): 8 pgs backfill_toofull
>
> Detailed description:
> I have upgraded my cluster to mimic and added 16 new bluestore OSDs
> on 4 hosts. The hosts are in a separate region in my crush map, and crush
> rules prevented data to be moved on the new OSDs. Now I want to move
> all data to the new OSDs (and possibly decomission the old filestore OSDs).
> I have created the following rule:
>
> # ceph osd crush rule create-replicated on-newhosts newhostsroot host
>
> after this, I am slowly moving the pools one-by-one to this new rule:
>
> # ceph osd pool set test-hdd-pool crush_rule on-newhosts
>
> When I do this, I get the above error. This is misleading, because
> ceph osd df does not suggest the OSDs are getting full (the most full
> OSD is about 41 % full). After rebalancing is done, the HEALTH_ERR
> disappears. Why am I getting this error?
>
> # ceph -s
>   cluster:
> id: ...my UUID...
> health: HEALTH_ERR
> 1271/3803223 objects misplaced (0.033%)
> Degraded data redundancy: 40124/3803223 objects degraded
> (1.055%), 65 pgs degraded, 67 pgs undersized
> Degraded data redundancy (low space): 8 pgs backfill_toofull
>
>   services:
> mon: 3 daemons, quorum mon1,mon2,mon3
> mgr: mon2(active), standbys: mon1, mon3
> osd: 80 osds: 80 up, 80 in; 90 remapped pgs
> rgw: 1 daemon active
>
>   data:
> pools:   13 pools, 5056 pgs
> objects: 1.27 M objects, 4.8 TiB
> usage:   15 TiB used, 208 TiB / 224 TiB avail
> pgs: 40124/3803223 objects degraded (1.055%)
>  1271/3803223 objects misplaced (0.033%)
>  4963 active+clean
>  41   active+recovery_wait+undersized+degraded+remapped
>  21   active+recovery_wait+undersized+degraded
>  17   active+remapped+backfill_wait
>  5active+remapped+backfill_wait+backfill_toofull
>  3active+remapped+backfill_toofull
>  2active+recovering+undersized+remapped
>  2active+recovering+undersized+degraded+remapped
>  1active+clean+remapped
>  1active+recovering+undersized+degraded
>
>   io:
> client:   6.6 MiB/s rd, 2.7 MiB/s wr, 75 op/s rd, 89 op/s wr
> recovery: 2.0 MiB/s, 92 objects/s
>
> Thanks for any hint,
>
> -Yenya
>
> --
> | Jan "Yenya" Kasprzak 
> |
> | http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5
> |
>  This is the world we live in: the way to deal with computers is to google
>  the symptoms, and hope that you don't have to watch a video. --P. Zaitcev
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] backfill_toofull after adding new OSDs

2019-01-31 Thread Jan Kasprzak
Hello, ceph users,

I see the following HEALTH_ERR during cluster rebalance:

Degraded data redundancy (low space): 8 pgs backfill_toofull

Detailed description:
I have upgraded my cluster to mimic and added 16 new bluestore OSDs
on 4 hosts. The hosts are in a separate region in my crush map, and crush
rules prevented data from being moved onto the new OSDs. Now I want to move
all data to the new OSDs (and possibly decommission the old filestore OSDs).
I have created the following rule:

# ceph osd crush rule create-replicated on-newhosts newhostsroot host

after this, I am slowly moving the pools one-by-one to this new rule:

# ceph osd pool set test-hdd-pool crush_rule on-newhosts

When I do this, I get the above error. This is misleading, because
ceph osd df does not suggest the OSDs are getting full (the most full
OSD is about 41 % full). After rebalancing is done, the HEALTH_ERR
disappears. Why am I getting this error?

# ceph -s
  cluster:
id: ...my UUID...
health: HEALTH_ERR
1271/3803223 objects misplaced (0.033%)
Degraded data redundancy: 40124/3803223 objects degraded (1.055%), 
65 pgs degraded, 67 pgs undersized
Degraded data redundancy (low space): 8 pgs backfill_toofull
 
  services:
mon: 3 daemons, quorum mon1,mon2,mon3
mgr: mon2(active), standbys: mon1, mon3
osd: 80 osds: 80 up, 80 in; 90 remapped pgs
rgw: 1 daemon active
 
  data:
pools:   13 pools, 5056 pgs
objects: 1.27 M objects, 4.8 TiB
usage:   15 TiB used, 208 TiB / 224 TiB avail
pgs: 40124/3803223 objects degraded (1.055%)
 1271/3803223 objects misplaced (0.033%)
 4963 active+clean
 41   active+recovery_wait+undersized+degraded+remapped
 21   active+recovery_wait+undersized+degraded
 17   active+remapped+backfill_wait
 5active+remapped+backfill_wait+backfill_toofull
 3active+remapped+backfill_toofull
 2active+recovering+undersized+remapped
 2active+recovering+undersized+degraded+remapped
 1active+clean+remapped
 1active+recovering+undersized+degraded
 
  io:
client:   6.6 MiB/s rd, 2.7 MiB/s wr, 75 op/s rd, 89 op/s wr
recovery: 2.0 MiB/s, 92 objects/s
 
Thanks for any hint,

-Yenya

-- 
| Jan "Yenya" Kasprzak  |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
 This is the world we live in: the way to deal with computers is to google
 the symptoms, and hope that you don't have to watch a video. --P. Zaitcev
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cluster Status:HEALTH_ERR for Full OSD

2019-01-31 Thread Fabio - NS3 srl

On 30/01/19 17:00, Paul Emmerich wrote:

Quick and dirty solution: take the full OSD down to issue the deletion
command ;)

Better solutions: temporarily increase the full limit (ceph osd
set-full-ratio) or reduce the OSD's reweight (ceph osd reweight)


Paul
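
(For the archive, a rough sketch of what those two workarounds look like as
commands - osd.12 and the 0.97/0.90 values are placeholders I picked, so
double-check them for your cluster:)

# temporarily raise the full threshold so the deletion can go through
ceph osd set-full-ratio 0.97
# ...delete the data, then restore the previous value (0.95 is the default)
ceph osd set-full-ratio 0.95
# or push data off the full OSD instead
ceph osd reweight 12 0.90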



Many thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cluster Status:HEALTH_ERR for Full OSD

2019-01-31 Thread Fabio - NS3 srl

On 30/01/19 17:04, Amit Ghadge wrote:
Better way is increase osd set-full-ratio slightly (.97) and then 
remove buckets.





Many thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-31 Thread PHARABOT Vincent
I tried to start on the Monitor node itself
Yes Dashboard is enabled

# ceph mgr services
{
"dashboard": "https://ip-10-8-36-16.internal:8443/;,
"restful": "https://ip-10-8-36-16.internal:8003/;
}

# curl -k https://ip-10-8-36-16.eu-west-2.compute.internal:8443/api/health
{"status": "404 Not Found", "version": "3.2.2", "detail": "The path 
'/api/health' was not found.", "traceback": "Traceback (most recent call 
last):\n File \"/usr/lib/python2.7/si
te-packages/cherrypy/_cprequest.py\", line 656, in respond\n response.body = 
self.handler()\n File 
\"/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py\", line 188, in
__call__\n self.body = self.oldhandler(*args, **kwargs)\n File 
\"/usr/lib/python2.7/site-packages/cherrypy/_cperror.py\", line 386, in 
__call__\n raise self\nNotFound: (404
, \"The path '/api/health' was not found.\")\n"}

# curl -k 
https://ip-10-8-36-16.eu-west-2.compute.internal:8443/api/health/minimal
{"status": "404 Not Found", "version": "3.2.2", "detail": "The path 
'/api/health/minimal' was not found.", "traceback": "Traceback (most recent 
call last):\n File \"/usr/lib/pyth
on2.7/site-packages/cherrypy/_cprequest.py\", line 656, in respond\n 
response.body = self.handler()\n File 
\"/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py\", line
188, in __call__\n self.body = self.oldhandler(*args, **kwargs)\n File 
\"/usr/lib/python2.7/site-packages/cherrypy/_cperror.py\", line 386, in 
__call__\n raise self\nNotFou
nd: (404, \"The path '/api/health/minimal' was not found.\")\n"}

Vincent

-Original Message-
From: Lenz Grimmer [mailto:lgrim...@suse.com]
Sent: Thursday, 31 January 2019 00:36
To: PHARABOT Vincent ; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Simple API to have cluster healthcheck ?



On 30 January 2019 19:33:14 CET, PHARABOT Vincent 
wrote:

>Thanks for the info
>But, nope, on Mimic (13.2.4) /api/health ends in 404 (/api/health/full,
>/api/health/minimal also...)

On which node did you try to access the API? Did you enable the Dashboard 
module in Ceph manager?

Lenz

--
This message was sent from my Android device with K-9 Mail.
This email and any attachments are intended solely for the use of the 
individual or entity to whom it is addressed and may be confidential and/or 
privileged.

If you are not one of the named recipients or have received this email in error,

(i) you should not read, disclose, or copy it,

(ii) please notify sender of your receipt by reply email and delete this email 
and all attachments,

(iii) Dassault Systèmes does not accept or assume any liability or 
responsibility for any use of or reliance on this email.


Please be informed that your personal data are processed according to our data 
privacy policy as described on our website. Should you have any questions 
related to personal data protection, please contact 3DS Data Protection Officer 
at 3ds.compliance-priv...@3ds.com


For other languages, go to https://www.3ds.com/terms/email-disclaimer
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph block - volume with RAID#0

2019-01-31 Thread Piotr Dałek

On 2019-01-31 6:05 a.m., M Ranga Swami Reddy wrote:

My thought was - Ceph block volume with raid#0 (means I mounted a ceph
block volumes to an instance/VM, there I would like to configure this
volume with RAID0).

Just to know, if anyone doing the same as above, if yes what are the
constraints?


Exclusive lock on RBD images will kill any (theoretical) performance gains. 
Without the exclusive lock, you lose some of the RBD features.


Plus, using 2+ clients with single images doesn't sound like a good idea.
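
(If you want to experiment anyway, roughly: check which features an image has
and drop the lock-dependent ones first - the image name is a placeholder, and
only features that are actually enabled need, or can, be disabled:)

rbd info rbd/vol01 | grep features
rbd feature disable rbd/vol01 object-map fast-diff
rbd feature disable rbd/vol01 exclusive-lock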

--
Piotr Dałek
piotr.da...@corp.ovh.com
https://www.ovhcloud.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com