Re: [ceph-users] 1 osd Segmentation fault in test cluster

2017-10-03 Thread Brad Hubbard
Looks like there is one already.

http://tracker.ceph.com/issues/21259

On Tue, Oct 3, 2017 at 1:15 AM, Gregory Farnum  wrote:
> Please file a tracker ticket with all the info you have for stuff like this.
> They’re a lot harder to lose than emails are. ;)
>
> On Sat, Sep 30, 2017 at 8:31 AM Marc Roos  wrote:
>>
>> Is this useful for someone?
>>
>>
>>
>> [Sat Sep 30 15:51:11 2017] libceph: osd5 192.168.10.113:6809 socket
>> closed (con state OPEN)
>> [Sat Sep 30 15:51:11 2017] libceph: osd5 192.168.10.113:6809 socket
>> closed (con state CONNECTING)
>> [Sat Sep 30 15:51:11 2017] libceph: osd5 down
>> [Sat Sep 30 15:51:11 2017] libceph: osd5 down
>> [Sat Sep 30 15:52:52 2017] libceph: osd5 up
>> [Sat Sep 30 15:52:52 2017] libceph: osd5 up
>>
>>
>>
>> 2017-09-30 15:48:08.542202 7f7623ce9700  0 log_channel(cluster) log
>> [WRN] : slow request 31.456482 seconds old, received at 2017-09-30
>> 15:47:37.085589: osd_op(mds.0.9227:1289186 20.2b 20.9af42b6b (undecoded)
>> ondisk+write+known_if_redirected+full_force e15675) currently
>> queued_for_pg
>> 2017-09-30 15:48:08.542207 7f7623ce9700  0 log_channel(cluster) log
>> [WRN] : slow request 31.456086 seconds old, received at 2017-09-30
>> 15:47:37.085984: osd_op(mds.0.9227:1289190 20.13 20.e44f3f53 (undecoded)
>> ondisk+write+known_if_redirected+full_force e15675) currently
>> queued_for_pg
>> 2017-09-30 15:48:08.542212 7f7623ce9700  0 log_channel(cluster) log
>> [WRN] : slow request 31.456005 seconds old, received at 2017-09-30
>> 15:47:37.086065: osd_op(mds.0.9227:1289194 20.2b 20.6733bdeb (undecoded)
>> ondisk+write+known_if_redirected+full_force e15675) currently
>> queued_for_pg
>> 2017-09-30 15:51:12.592490 7f7611cc5700  0 log_channel(cluster) log
>> [DBG] : 20.3f scrub starts
>> 2017-09-30 15:51:24.514602 7f76214e4700 -1 *** Caught signal
>> (Segmentation fault) **
>>  in thread 7f76214e4700 thread_name:bstore_mempool
>>
>>  ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous
>> (stable)
>>  1: (()+0xa29511) [0x7f762e5b2511]
>>  2: (()+0xf370) [0x7f762afa5370]
>>  3: (BlueStore::TwoQCache::_trim(unsigned long, unsigned long)+0x2df)
>> [0x7f762e481a2f]
>>  4: (BlueStore::Cache::trim(unsigned long, float, float, float)+0x1d1)
>> [0x7f762e4543e1]
>>  5: (BlueStore::MempoolThread::entry()+0x14d) [0x7f762e45a71d]
>>  6: (()+0x7dc5) [0x7f762af9ddc5]
>>  7: (clone()+0x6d) [0x7f762a09176d]
>>  NOTE: a copy of the executable, or `objdump -rdS ` is
>> needed to interpret this.
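The note above asks for the executable or its disassembly. One way to make use of the raw frame offsets is addr2line; a sketch only, with assumed paths (`/usr/bin/ceph-osd` and the offset `0xa29511` from frame 1 above — the binary and debuginfo must match the crashing 12.2.1 build):

```shell
# Sketch: resolve a backtrace offset against the (assumed) OSD binary.
# The command is echoed rather than executed, since it needs the matching
# ceph-osd binary and its debuginfo package installed.
BIN=/usr/bin/ceph-osd     # assumed install path
for addr in 0xa29511; do  # frame 1 offset from the trace above
  echo addr2line -C -f -e "$BIN" "$addr"
done
```

Dropping the `echo` on a node with debuginfo prints the function name and source line for each offset.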
>>
>> --- begin dump of recent events ---
>> -10000> 2017-09-30 15:51:05.105915 7f76284ac700  5 --
>> 192.168.10.113:0/27661 >> 192.168.10.111:6810/6617 conn(0x7f766b736000
>> :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=19 cs=1 l=1). rx
>> osd.0 seq 19546 0x7f76a2daf000 osd_ping(ping_reply e15675 stamp
>> 2017-09-30 15:51:05.105439) v4
>>  -9999> 2017-09-30 15:51:05.105963 7f760fcc1700  1 -- 10.0.0.13:0/27661
>> --> 10.0.0.11:6805/6491 -- osd_ping(ping e15675 stamp 2017-09-30
>> 15:51:05.105439) v4 -- 0x7f7683e98a00 con 0
>>  -9998> 2017-09-30 15:51:05.105960 7f76284ac700  1 --
>> 192.168.10.113:0/27661 <== osd.0 192.168.10.111:6810/6617 19546 
>> osd_ping(ping_reply e15675 stamp 2017-09-30 15:51:05.105439) v4 
>> 2004+0+0 (1212154800 0 0) 0x7f76a2daf000 con 0x7f766b736000
>>  -9997> 2017-09-30 15:51:05.105961 7f76274aa700  5 -- 10.0.0.13:0/27661
>> >> 10.0.0.11:6808/6646 conn(0x7f766b745800 :-1
>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=24 cs=1 l=1). rx osd.3
>> seq 19546 0x7f769b95f200 osd_ping(ping_reply e15675 stamp 2017-09-30
>> 15:51:05.105439) v4
>>  -9996> 2017-09-30 15:51:05.105983 7f760fcc1700  1 --
>> 192.168.10.113:0/27661 --> 192.168.10.111:6805/6491 -- osd_ping(ping
>> e15675 stamp 2017-09-30 15:51:05.105439) v4 -- 0x7f7683e97600 con 0
>>  -9995> 2017-09-30 15:51:05.106001 7f76274aa700  1 -- 10.0.0.13:0/27661
>> <== osd.3 10.0.0.11:6808/6646 19546  osd_ping(ping_reply e15675
>> stamp 2017-09-30 15:51:05.105439) v4  2004+0+0 (1212154800 0 0)
>> 0x7f769b95f200 con 0x7f766b745800
>>  -9994> 2017-09-30 15:51:05.106015 7f760fcc1700  1 -- 10.0.0.13:0/27661
>> --> 10.0.0.11:6807/6470 -- osd_ping(ping e15675 stamp 2017-09-30
>> 15:51:05.105439) v4 -- 0x7f7683e99800 con 0
>>  -9993> 2017-09-30 15:51:05.106035 7f760fcc1700  1 --
>> 192.168.10.113:0/27661 --> 192.168.10.111:6808/6470 -- osd_ping(ping
>> e15675 stamp 2017-09-30 15:51:05.105439) v4 -- 0x7f763b72a200 con 0
>>  -9992> 2017-09-30 15:51:05.106072 7f760fcc1700  1 -- 10.0.0.13:0/27661
>> --> 10.0.0.11:6809/6710 -- osd_ping(ping e15675 stamp 2017-09-30
>> 15:51:05.105439) v4 -- 0x7f768633dc00 con 0
>>  -9991> 2017-09-30 15:51:05.106093 7f760fcc1700  1 --
>> 192.168.10.113:0/27661 --> 192.168.10.111:6804/6710 -- osd_ping(ping
>> e15675 stamp 2017-09-30 15:51:05.105439) v4 -- 0x7f76667d3600 con 0
>>  -9990> 2017-09-30 15:51:05.106114 7f760fcc1700  1 -- 10.0.0.13:0/27661
>> --> 10.0.0.12:6805/1949 -- osd_ping(ping e15675 stamp 

Re: [ceph-users] How to use rados_aio_write correctly?

2017-10-03 Thread Gregory Farnum
On Tue, Oct 3, 2017 at 3:15 AM Alexander Kushnirenko 
wrote:

> Hello,
>
> I'm working on third-party code (the Bareos storage daemon) which gives very
> low write speeds for Ceph.  The code was written to demonstrate that it is
> possible, but the speed is about 3-9 MB/s, which is too slow.  I modified
> the routine to use rados_aio_write instead of rados_write, and was able to
> backup/restore data successfully at about 30 MB/s, which is what I
> would expect on a 1GB/s network and from rados bench results.  I studied
> examples in the documents and on GitHub, but I'm still afraid that my code
> is working merely by accident.  Could someone comment on the following questions:
>
> Q1. Storage daemon sends write requests of 64K size, so current code works
> like this:
>
> rados_write(., buffer, len=64K, offset=0)
> rados_write(., buffer, len=64K, offset=64K)
> rados_write(., buffer, len=64K, offset=128K)
> ... and so on ...
>
> What is the correct way to use AIO (should I use one completion, or several)?
> Version 1:
>
> rados_aio_create_completion(NULL, NULL, NULL, &comp);
> rados_aio_write(., comp, buffer, len=64K, offset=0)
> rados_aio_write(., comp, buffer, len=64K, offset=64K)
> rados_aio_write(., comp, buffer, len=64K, offset=128K)
> rados_aio_wait_for_complete(comp);// wait for Async IO in memory
> rados_aio_wait_for_safe(comp);// and on disk
> rados_aio_release(comp);
>
> Version 2:
> rados_aio_create_completion(NULL, NULL, NULL, &comp1);
> rados_aio_create_completion(NULL, NULL, NULL, &comp2);
> rados_aio_create_completion(NULL, NULL, NULL, &comp3);
> rados_aio_write(., comp1, buffer, len=64K, offset=0)
> rados_aio_write(., comp2, buffer, len=64K, offset=64K)
> rados_aio_write(., comp3, buffer, len=64K, offset=128K)
> rados_aio_wait_for_complete(comp1);
> rados_aio_wait_for_complete(comp2);
> rados_aio_wait_for_complete(comp3);
> rados_aio_write(., comp1, buffer, len=64K, offset=192K)
> rados_aio_write(., comp2, buffer, len=64K, offset=256K)
> rados_aio_write(., comp3, buffer, len=64K, offset=320K)
> .
>

Each operation needs its own completion. If you give them the same one,
things will go very badly.


>
> Q2.  The problem of maximum object size.  When I use rados_write I get an
> error when I exceed the maximum object size (132MB in luminous).  But when I
> use rados_aio_write it happily goes beyond the object size limit, yet
> actually writes nothing and does not report any error.  Is there a way to
> catch such a situation?
>
>
Each completion gets a response code included. You are apparently not
actually looking at them; you should do so!
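Both answers reduce to one pattern: a separate completion per in-flight write, and a status check on every one of them. The librados calls themselves need a live cluster, so here is a rough shell analogue of just the bookkeeping — `fake_write` and the 132MB limit are stand-ins for illustration, not the rados API:

```shell
# Shell analogue: one background job ("completion") per write, each
# waited on and checked individually. fake_write stands in for
# rados_aio_write and fails past an assumed 132MB object size limit.
max=$((132 * 1024 * 1024))
fake_write() {  # args: offset length; exit status is the "return code"
  [ $(( $1 + $2 )) -le "$max" ]
}
pids=""
for off in 0 $((max / 2)) "$max"; do
  fake_write "$off" 65536 &     # one job (completion) per write
  pids="$pids $!"
done
fails=0
for pid in $pids; do            # wait on each completion individually
  wait "$pid" || fails=$((fails + 1))
done
echo "failed writes: $fails"    # the write past the limit is caught here
```

The point mirrors Greg's advice: had all three writes shared one "completion" (one wait), the out-of-bounds write's failure would have been silently lost.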
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw notify on creation/deletion of file in bucket

2017-10-03 Thread Yehuda Sadeh-Weinraub
On Tue, Oct 3, 2017 at 8:59 AM, Sean Purdy  wrote:
> Hi,
>
>
> Is there any way that radosgw can ping something when a file is removed or 
> added to a bucket?
>

That depends on what exactly you're looking for. You can't get that
info as a user, but there is a mechanism for remote zones to detect
changes that happen on the zone.

> Or use its sync facility to sync files to AWS/Google buckets?
>

Not at the moment; it's in the works. Unless you want to write your own sync plugin.

Yehuda

> Just thinking about backups.  What do people use for backups?  Been looking 
> at rclone.
>
>
> Thanks,
>
> Sean
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph stuck creating pool

2017-10-03 Thread Guilherme Lima
Hi David,



Yes I can ping the host from the cluster network.

This is a test lab built in Hyper-V.

I think you are right, probably there is a problem with the cluster network.

I will check and let you know the results.



Thanks very much



Guilherme Lima

Systems Administrator



Main: +351 220 430 530

Fax: +351 253 424 739

Skype: guilherme.lima.farfetch.com



Farfetch

Rua da Lionesa, nr. 446

Edificio G12

4465-671 Leça do Balio

Porto – Portugal




400 Boutiques. 1 Address



http://farfetch.com

Twitter: https://twitter.com/farfetch

Facebook: https://www.facebook.com/Farfetch

Instagram: https://instagram.com/farfetch



This email and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are addressed.
If you are not the named addressee/intended recipient then please delete it
and notify the sender immediately.



*From:* David Turner [mailto:drakonst...@gmail.com]
*Sent:* Tuesday, October 3, 2017 17:53
*To:* Guilherme Lima ; Webert de Souza Lima <
webert.b...@gmail.com>
*Cc:* ceph-users 
*Subject:* Re: [ceph-users] Ceph stuck creating pool



My guess is a networking problem.  Do you have vlans, cluster network vs
public network in the ceph.conf, etc configured?  Can you ping between all
of your storage nodes on all of their IPs?



All of your OSDs communicate with the mons on the public network, but they
communicate with each other for peering on the cluster network.  My guess
is that your public network is working fine, but that your cluster network
might be having an issue causing the new PGs to never be able to peer.
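A quick way to test that theory is to probe every storage node on both networks. A sketch only — the IPs below are placeholders for this lab, and the commands are echoed so nothing is probed by accident; substitute each node's public- and cluster-network addresses from ceph.conf:

```shell
# Sketch: check reachability on the public and cluster networks.
# Placeholder addresses; drop the "echo" to actually probe.
for ip in 192.168.1.11 192.168.1.12 10.0.0.11 10.0.0.12; do
  echo ping -c 1 -W 1 "$ip"
done
```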



On Tue, Oct 3, 2017 at 11:12 AM Guilherme Lima 
wrote:

Here it is,



size: 3

min_size: 2

crush_rule: replicated_rule



[
    {
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]
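Nothing in that rule looks wrong for size=3: chooseleaf_firstn over type "host" with three hosts can place three replicas. The feasibility condition can be sanity-checked offline; a simplified sketch that just counts failure domains (hosts=3 per the osd tree quoted in this thread, and num=0 in the rule means "use the pool size"):

```shell
# Sketch: a replicated chooseleaf-over-host rule can satisfy "size"
# replicas only if size is within the rule's min_size/max_size bounds
# and there are at least "size" distinct hosts.
size=3; rule_min=1; rule_max=10
hosts=3   # osd1, osd2, osd3
if [ "$size" -ge "$rule_min" ] && [ "$size" -le "$rule_max" ] \
   && [ "$hosts" -ge "$size" ]; then
  echo "rule can satisfy size=$size"
else
  echo "rule cannot satisfy size=$size"
fi
```

Since the check passes here, the rule itself is not the blocker — which supports David's networking theory below.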





Thanks

Guilherme





*From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of
*Webert de Souza Lima
*Sent:* Tuesday, October 3, 2017 15:47
*To:* ceph-users 
*Subject:* Re: [ceph-users] Ceph stuck creating pool



This looks like something wrong with the crush rule.



What's the size, min_size and crush_rule of this pool?

 ceph osd pool get POOLNAME size

 ceph osd pool get POOLNAME min_size

 ceph osd pool get POOLNAME crush_ruleset



How is the crush rule?

 ceph osd crush rule dump




Regards,



Webert Lima

DevOps Engineer at MAV Tecnologia

*Belo Horizonte - Brasil*



On Tue, Oct 3, 2017 at 11:22 AM, Guilherme Lima 
wrote:

Hi,



I have installed a virtual Ceph cluster lab. I am using Ceph Luminous v12.2.1.

It consists of 3 mon + 3 osd nodes.

Each node has 3 x 250GB OSDs.



My osd tree:



ID CLASS WEIGHT  TYPE NAME  STATUS REWEIGHT PRI-AFF

-1   2.19589 root default

-3   0.73196 host osd1

0   hdd 0.24399 osd.0  up  1.0 1.0

6   hdd 0.24399 osd.6  up  1.0 1.0

9   hdd 0.24399 osd.9  up  1.0 1.0

-5   0.73196 host osd2

1   hdd 0.24399 osd.1  up  1.0 1.0

7   hdd 0.24399 osd.7  up  1.0 1.0

10   hdd 0.24399 osd.10 up  1.0 1.0

-7   0.73196 host osd3

2   hdd 0.24399 osd.2  up  1.0 1.0

8   hdd 0.24399 osd.8  up  1.0 1.0

11   hdd 0.24399 osd.11 up  1.0 1.0



After creating a new pool, it is stuck in creating+peering and
creating+activating.



  cluster:

id: d20fdc12-f8bf-45c1-a276-c36dfcc788bc

health: HEALTH_WARN

Reduced data availability: 256 pgs inactive, 143 pgs peering

Degraded data redundancy: 256 pgs unclean



  services:

mon: 3 daemons, quorum mon2,mon3,mon1

mgr: mon1(active), standbys: mon2, mon3

osd: 9 osds: 9 up, 9 in



  data:

pools:   1 pools, 256 pgs

objects: 0 objects, 0 bytes

usage:   10202 MB used, 2239 GB / 2249 GB avail

pgs: 100.000% pgs not active

 143 creating+peering

 113 creating+activating



Can anyone help to find the issue?



Thanks

Guilherme











This email and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are addressed.
If you have received this email in error please notify the system 

Re: [ceph-users] radosgw notify on creation/deletion of file in bucket

2017-10-03 Thread David Turner
Just to make sure you're not confusing redundancy with backups.  Having
your data in another site does not back up your data, but makes it more
redundant.  For instance if an object/file is accidentally deleted from RGW
and you're syncing those files to AWS, Google buckets, or a second RGW
cluster in another datacenter... the file is still deleted on the second
site and you can't use the second site to restore the file.

If you're looking for a second site for redundancy in case the first site
goes down, I personally use a second ceph cluster in another datacenter
using RGW multisite.  That way I can easily change my public LBs to point
to a single datacenter while performing upgrades or testing settings.

There is the ability to snapshot pools and objects for backups, but read
up on that before jumping on the bandwagon, to make sure that your
configuration and use case won't hit the pain points of snapshots.

On Tue, Oct 3, 2017 at 12:00 PM Sean Purdy  wrote:

> Hi,
>
>
> Is there any way that radosgw can ping something when a file is removed or
> added to a bucket?
>
> Or use its sync facility to sync files to AWS/Google buckets?
>
> Just thinking about backups.  What do people use for backups?  Been
> looking at rclone.
>
>
> Thanks,
>
> Sean
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph stuck creating pool

2017-10-03 Thread David Turner
My guess is a networking problem.  Do you have vlans, cluster network vs
public network in the ceph.conf, etc configured?  Can you ping between all
of your storage nodes on all of their IPs?

All of your OSDs communicate with the mons on the public network, but they
communicate with each other for peering on the cluster network.  My guess
is that your public network is working fine, but that your cluster network
might be having an issue causing the new PGs to never be able to peer.

On Tue, Oct 3, 2017 at 11:12 AM Guilherme Lima 
wrote:

> Here it is,
>
>
>
> size: 3
>
> min_size: 2
>
> crush_rule: replicated_rule
>
>
>
> [
>
> {
>
> "rule_id": 0,
>
> "rule_name": "replicated_rule",
>
> "ruleset": 0,
>
> "type": 1,
>
> "min_size": 1,
>
> "max_size": 10,
>
> "steps": [
>
> {
>
> "op": "take",
>
> "item": -1,
>
> "item_name": "default"
>
> },
>
> {
>
> "op": "chooseleaf_firstn",
>
> "num": 0,
>
> "type": "host"
>
> },
>
> {
>
> "op": "emit"
>
> }
>
> ]
>
> }
>
> ]
>
>
>
>
>
> Thanks
>
> Guilherme
>
>
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Webert de Souza Lima
> *Sent:* Tuesday, October 3, 2017 15:47
> *To:* ceph-users 
> *Subject:* Re: [ceph-users] Ceph stuck creating pool
>
>
>
> This looks like something wrong with the crush rule.
>
>
>
> What's the size, min_size and crush_rule of this pool?
>
>  ceph osd pool get POOLNAME size
>
>  ceph osd pool get POOLNAME min_size
>
>  ceph osd pool get POOLNAME crush_ruleset
>
>
>
> How is the crush rule?
>
>  ceph osd crush rule dump
>
>
>
>
> Regards,
>
>
>
> Webert Lima
>
> DevOps Engineer at MAV Tecnologia
>
> *Belo Horizonte - Brasil*
>
>
>
> On Tue, Oct 3, 2017 at 11:22 AM, Guilherme Lima <
> guilherme.l...@farfetch.com> wrote:
>
> Hi,
>
>
>
> I have installed a virtual Ceph Cluster lab. I using Ceph Luminous v12.2.1
>
> It consist in 3 mon + 3 osd nodes.
>
> Each node have 3 x 250GB OSD.
>
>
>
> My osd tree:
>
>
>
> ID CLASS WEIGHT  TYPE NAME  STATUS REWEIGHT PRI-AFF
>
> -1   2.19589 root default
>
> -3   0.73196 host osd1
>
> 0   hdd 0.24399 osd.0  up  1.0 1.0
>
> 6   hdd 0.24399 osd.6  up  1.0 1.0
>
> 9   hdd 0.24399 osd.9  up  1.0 1.0
>
> -5   0.73196 host osd2
>
> 1   hdd 0.24399 osd.1  up  1.0 1.0
>
> 7   hdd 0.24399 osd.7  up  1.0 1.0
>
> 10   hdd 0.24399 osd.10 up  1.0 1.0
>
> -7   0.73196 host osd3
>
> 2   hdd 0.24399 osd.2  up  1.0 1.0
>
> 8   hdd 0.24399 osd.8  up  1.0 1.0
>
> 11   hdd 0.24399 osd.11 up  1.0 1.0
>
>
>
> After create a new pool it is stuck on creating+peering and
> creating+activating.
>
>
>
>   cluster:
>
> id: d20fdc12-f8bf-45c1-a276-c36dfcc788bc
>
> health: HEALTH_WARN
>
> Reduced data availability: 256 pgs inactive, 143 pgs peering
>
> Degraded data redundancy: 256 pgs unclean
>
>
>
>   services:
>
> mon: 3 daemons, quorum mon2,mon3,mon1
>
> mgr: mon1(active), standbys: mon2, mon3
>
> osd: 9 osds: 9 up, 9 in
>
>
>
>   data:
>
> pools:   1 pools, 256 pgs
>
> objects: 0 objects, 0 bytes
>
> usage:   10202 MB used, 2239 GB / 2249 GB avail
>
> pgs: 100.000% pgs not active
>
>  143 creating+peering
>
>  113 creating+activating
>
>
>
> Can anyone help to find the issue?
>
>
>
> Thanks
>
> Guilherme
>
>
>
>
>
>
>
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>

[ceph-users] radosgw notify on creation/deletion of file in bucket

2017-10-03 Thread Sean Purdy
Hi,


Is there any way that radosgw can ping something when a file is removed or 
added to a bucket?

Or use its sync facility to sync files to AWS/Google buckets?

Just thinking about backups.  What do people use for backups?  Been looking at 
rclone.


Thanks,

Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] tunable question

2017-10-03 Thread lists

Thanks Jake, for your extensive reply. :-)

MJ

On 3-10-2017 15:21, Jake Young wrote:


On Tue, Oct 3, 2017 at 8:38 AM lists > wrote:


Hi,

What would make the decision easier: if we knew that we could easily
revert the
  > "ceph osd crush tunables optimal"
once it has begun rebalancing data?

Meaning: if we notice that impact is too high, or it will take too long,
that we could simply again say
  > "ceph osd crush tunables hammer"
and the cluster would calm down again?


Yes you can revert the tunables back; but it will then move all the data 
back where it was, so be prepared for that.


Verify you have the following values in ceph.conf. Note that these are 
the defaults in Jewel, so if they aren’t defined, you’re probably good:

osd_max_backfills=1
osd_recovery_threads=1

You can try to set these at runtime (using ceph injectargs) if you notice
a large impact on your client performance:

osd_recovery_op_priority=1
osd_recovery_max_active=1
osd_recovery_threads=1
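The runtime injection Jake mentions is normally done with `ceph tell ... injectargs`. A sketch only — the commands are echoed rather than executed since they need a live cluster, and the option names should be verified against your Ceph release before applying:

```shell
# Sketch: throttle recovery/backfill impact at runtime. Echoed only;
# drop the "echo" to apply for real. Option names are the ones from
# the mail above; check them against your release's docs.
for opt in osd_recovery_op_priority=1 osd_recovery_max_active=1 osd_max_backfills=1; do
  echo ceph tell 'osd.*' injectargs "--${opt}"
done
```

Injected values do not persist across OSD restarts; put the final values in ceph.conf as well.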

I recall this tunables change when we went from hammer to jewel last
year. It took over 24 hours to rebalance 122TB on our 110-OSD cluster.


Jake



MJ

On 2-10-2017 9:41, Manuel Lausch wrote:
 > Hi,
 >
 > We have similar issues.
 > After upgrading from hammer to jewel the tunable "chooseleaf stable"
 > was introduced. If we activate it, nearly all data will be moved. The
 > cluster has 2400 OSDs on 40 nodes over two datacenters and is filled
 > with 2.5 PB of data.
 >
 > We tried to enable it, but the backfilling traffic is too high to be
 > handled without impacting other services on the network.
 >
 > Does someone know if it is necessary to enable this tunable? And could
 > it be a problem in the future if we want to upgrade to newer versions
 > without it enabled?
 >
 > Regards,
 > Manuel Lausch
 >
___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph stuck creating pool

2017-10-03 Thread Guilherme Lima
Here it is,



size: 3

min_size: 2

crush_rule: replicated_rule



[

{

"rule_id": 0,

"rule_name": "replicated_rule",

"ruleset": 0,

"type": 1,

"min_size": 1,

"max_size": 10,

"steps": [

{

"op": "take",

"item": -1,

"item_name": "default"

},

{

"op": "chooseleaf_firstn",

"num": 0,

"type": "host"

},

{

"op": "emit"

}

]

}

]





Thanks

Guilherme





*From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of
*Webert de Souza Lima
*Sent:* Tuesday, October 3, 2017 15:47
*To:* ceph-users 
*Subject:* Re: [ceph-users] Ceph stuck creating pool



This looks like something wrong with the crush rule.



What's the size, min_size and crush_rule of this pool?

 ceph osd pool get POOLNAME size

 ceph osd pool get POOLNAME min_size

 ceph osd pool get POOLNAME crush_ruleset



How is the crush rule?

 ceph osd crush rule dump




Regards,



Webert Lima

DevOps Engineer at MAV Tecnologia

*Belo Horizonte - Brasil*



On Tue, Oct 3, 2017 at 11:22 AM, Guilherme Lima 
wrote:

Hi,



I have installed a virtual Ceph Cluster lab. I using Ceph Luminous v12.2.1

It consist in 3 mon + 3 osd nodes.

Each node have 3 x 250GB OSD.



My osd tree:



ID CLASS WEIGHT  TYPE NAME  STATUS REWEIGHT PRI-AFF

-1   2.19589 root default

-3   0.73196 host osd1

0   hdd 0.24399 osd.0  up  1.0 1.0

6   hdd 0.24399 osd.6  up  1.0 1.0

9   hdd 0.24399 osd.9  up  1.0 1.0

-5   0.73196 host osd2

1   hdd 0.24399 osd.1  up  1.0 1.0

7   hdd 0.24399 osd.7  up  1.0 1.0

10   hdd 0.24399 osd.10 up  1.0 1.0

-7   0.73196 host osd3

2   hdd 0.24399 osd.2  up  1.0 1.0

8   hdd 0.24399 osd.8  up  1.0 1.0

11   hdd 0.24399 osd.11 up  1.0 1.0



After create a new pool it is stuck on creating+peering and
creating+activating.



  cluster:

id: d20fdc12-f8bf-45c1-a276-c36dfcc788bc

health: HEALTH_WARN

Reduced data availability: 256 pgs inactive, 143 pgs peering

Degraded data redundancy: 256 pgs unclean



  services:

mon: 3 daemons, quorum mon2,mon3,mon1

mgr: mon1(active), standbys: mon2, mon3

osd: 9 osds: 9 up, 9 in



  data:

pools:   1 pools, 256 pgs

objects: 0 objects, 0 bytes

usage:   10202 MB used, 2239 GB / 2249 GB avail

pgs: 100.000% pgs not active

 143 creating+peering

 113 creating+activating



Can anyone help to find the issue?



Thanks

Guilherme













___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph stuck creating pool

2017-10-03 Thread Webert de Souza Lima
This looks like something wrong with the crush rule.

What's the size, min_size and crush_rule of this pool?
 ceph osd pool get POOLNAME size
 ceph osd pool get POOLNAME min_size
 ceph osd pool get POOLNAME crush_ruleset

How is the crush rule?
 ceph osd crush rule dump


Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*

On Tue, Oct 3, 2017 at 11:22 AM, Guilherme Lima  wrote:

> Hi,
>
>
>
> I have installed a virtual Ceph Cluster lab. I using Ceph Luminous v12.2.1
>
> It consist in 3 mon + 3 osd nodes.
>
> Each node have 3 x 250GB OSD.
>
>
>
> My osd tree:
>
>
>
> ID CLASS WEIGHT  TYPE NAME  STATUS REWEIGHT PRI-AFF
>
> -1   2.19589 root default
>
> -3   0.73196 host osd1
>
> 0   hdd 0.24399 osd.0  up  1.0 1.0
>
> 6   hdd 0.24399 osd.6  up  1.0 1.0
>
> 9   hdd 0.24399 osd.9  up  1.0 1.0
>
> -5   0.73196 host osd2
>
> 1   hdd 0.24399 osd.1  up  1.0 1.0
>
> 7   hdd 0.24399 osd.7  up  1.0 1.0
>
> 10   hdd 0.24399 osd.10 up  1.0 1.0
>
> -7   0.73196 host osd3
>
> 2   hdd 0.24399 osd.2  up  1.0 1.0
>
> 8   hdd 0.24399 osd.8  up  1.0 1.0
>
> 11   hdd 0.24399 osd.11 up  1.0 1.0
>
>
>
> After create a new pool it is stuck on creating+peering and
> creating+activating.
>
>
>
>   cluster:
>
> id: d20fdc12-f8bf-45c1-a276-c36dfcc788bc
>
> health: HEALTH_WARN
>
> Reduced data availability: 256 pgs inactive, 143 pgs peering
>
> Degraded data redundancy: 256 pgs unclean
>
>
>
>   services:
>
> mon: 3 daemons, quorum mon2,mon3,mon1
>
> mgr: mon1(active), standbys: mon2, mon3
>
> osd: 9 osds: 9 up, 9 in
>
>
>
>   data:
>
> pools:   1 pools, 256 pgs
>
> objects: 0 objects, 0 bytes
>
> usage:   10202 MB used, 2239 GB / 2249 GB avail
>
> pgs: 100.000% pgs not active
>
>  143 creating+peering
>
>  113 creating+activating
>
>
>
> Can anyone help to find the issue?
>
>
>
> Thanks
>
> Guilherme
>
>
>
>
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph stuck creating pool

2017-10-03 Thread Guilherme Lima
Hi,



I have installed a virtual Ceph cluster lab. I am using Ceph Luminous v12.2.1.

It consists of 3 mon + 3 osd nodes.

Each node has 3 x 250GB OSDs.



My osd tree:



ID CLASS WEIGHT  TYPE NAME  STATUS REWEIGHT PRI-AFF

-1   2.19589 root default

-3   0.73196 host osd1

0   hdd 0.24399 osd.0  up  1.0 1.0

6   hdd 0.24399 osd.6  up  1.0 1.0

9   hdd 0.24399 osd.9  up  1.0 1.0

-5   0.73196 host osd2

1   hdd 0.24399 osd.1  up  1.0 1.0

7   hdd 0.24399 osd.7  up  1.0 1.0

10   hdd 0.24399 osd.10 up  1.0 1.0

-7   0.73196 host osd3

2   hdd 0.24399 osd.2  up  1.0 1.0

8   hdd 0.24399 osd.8  up  1.0 1.0

11   hdd 0.24399 osd.11 up  1.0 1.0



After creating a new pool, it is stuck in creating+peering and
creating+activating.



  cluster:

id: d20fdc12-f8bf-45c1-a276-c36dfcc788bc

health: HEALTH_WARN

Reduced data availability: 256 pgs inactive, 143 pgs peering

Degraded data redundancy: 256 pgs unclean



  services:

mon: 3 daemons, quorum mon2,mon3,mon1

mgr: mon1(active), standbys: mon2, mon3

osd: 9 osds: 9 up, 9 in



  data:

pools:   1 pools, 256 pgs

objects: 0 objects, 0 bytes

usage:   10202 MB used, 2239 GB / 2249 GB avail

pgs: 100.000% pgs not active

 143 creating+peering

 113 creating+activating



Can anyone help to find the issue?



Thanks

Guilherme

-- 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] BlueStore questions about workflow and performance

2017-10-03 Thread Alex Gorbachev
Hi Mark, great to hear from you!

On Tue, Oct 3, 2017 at 9:16 AM Mark Nelson  wrote:

>
>
> On 10/03/2017 07:59 AM, Alex Gorbachev wrote:
> > Hi Sam,
> >
> > On Mon, Oct 2, 2017 at 6:01 PM Sam Huracan  > > wrote:
> >
> > Anyone can help me?
> >
> > On Oct 2, 2017 17:56, "Sam Huracan"  > > wrote:
> >
> > Hi,
> >
> > I'm reading this document:
> >
> http://storageconference.us/2017/Presentations/CephObjectStore-slides.pdf
> >
> > I have 3 questions:
> >
> > 1. Does BlueStore write both data (to the raw block device) and
> > metadata (to RocksDB) simultaneously, or sequentially?
> >
> > 2. In my opinion, the performance of BlueStore cannot compare to
> > FileStore using an SSD journal, because the performance of a raw disk
> > is less than when using a buffer (this is the buffer's purpose). What
> > do you think?
> >
> > 3.  Does setting RocksDB and the RocksDB WAL on SSD only enhance
> > write performance, or read performance, or both?
> >
> > Hope your answer,
> >
> >
> > I am researching the same thing, but recommend you look
> > at http://ceph.com/community/new-luminous-bluestore
> >
> > And also search for Bluestore cache to answer some questions.  My test
> > Luminous cluster so far is not as performant as I would like, but I have
> > not yet put a serious effort into tuning it, and it does seem stable.
> >
> > Hth, Alex
>
> Hi Alex,
>
> If you see anything specific please let us know.  There are a couple of
> corner cases where bluestore is likely to be slower than filestore
> (specifically small sequential reads/writes with no client side cache or
> read ahead).  I've also seen some cases where filestore has higher read
> throughput potential (4MB seq reads with multiple NVMe drives per OSD
> node).  In many other cases bluestore is faster (and sometimes much
> faster) than filestore in our tests.  Writes in general tend to be
> faster and high volume object creation is much faster with much lower
> tail latencies (filestore really suffers in this test due to PG splitting).


I have two pretty well tuned filestore Jewel clusters running SATA HDDs on
dedicated hardware.  For the Luminous cluster, I wanted to do a POC on a
VMware fully meshed (trendy moniker: hyperconverged) setup, using only
SSDs, Luminous and Bluestore.  Our workloads are unusual in that RBDs are
exported via iSCSI or NFS back to VMware and consumed by e.g. Windows VMs
(we support healthcare and corporate business systems), or Linux VMs direct
from Ceph.

What I did so far is dedicate a hardware JBOD with an Areca HBA (you turned
me on to those a few years ago :) to each OSD VM. Using 6 Smartstorage SSD
OSDs per each OSD VM with 3 of these VMs total and 2x 20 Gb shared network
uplinks, I am getting about a third of the performance of my hardware Jewel
cluster with 24 Lenovo enterprise SATA drives, measured as 4k block reads
and writes in single and 32 multiple streams.

Definitely not apples to apples, so I plan to play with the Bluestore cache.
One question: does Bluestore distinguish between SSD and HDD based on CRUSH
class assignment?

I will check the effect of giving a lot of RAM and CPU cores to OSD VMs, as
well as increasing spindles and using different JBODs.

Thank you for reaching out.

Regards,
Alex


>
> Mark
>
> >
> >
> >
> >
> >
> > --
> > --
> > Alex Gorbachev
> > Storcium
> >
> >
>
-- 
--
Alex Gorbachev
Storcium


Re: [ceph-users] tunable question

2017-10-03 Thread Jake Young
On Tue, Oct 3, 2017 at 8:38 AM lists  wrote:

> Hi,
>
> What would make the decision easier: if we knew that we could easily
> revert the
>  > "ceph osd crush tunables optimal"
> once it has begun rebalancing data?
>
> Meaning: if we notice that impact is too high, or it will take too long,
> that we could simply again say
>  > "ceph osd crush tunables hammer"
> and the cluster would calm down again?


Yes, you can revert the tunables; but doing so will then move all the data
back where it was, so be prepared for that.

Verify you have the following values in ceph.conf. Note that these are the
defaults in Jewel, so if they aren’t defined, you’re probably good:
osd_max_backfills=1
osd_recovery_threads=1

You can try to set these at runtime (using ceph injectargs) if you notice a large impact
to your client performance:
osd_recovery_op_priority=1
osd_recovery_max_active=1
osd_recovery_threads=1

I recall this tunables change from when we went from hammer to jewel last
year. It took over 24 hours to rebalance 122TB on our 110-OSD cluster.

Jake


>
> MJ
>
> On 2-10-2017 9:41, Manuel Lausch wrote:
> > Hi,
> >
> > We have similar issues.
> > After upgrading from hammer to jewel, the tunable "chooseleaf_stable"
> > was introduced. If we activate it, nearly all data will be moved. The
> > cluster has 2400 OSDs on 40 nodes over two datacenters and is filled
> > with 2.5 PB of data.
> >
> > We tried to enable it, but the backfill traffic is too high to be
> > handled without impacting other services on the network.
> >
> > Does someone know if it is necessary to enable this tunable? And could
> > it be a problem in the future if we want to upgrade to newer versions
> > without it enabled?
> >
> > Regards,
> > Manuel Lausch
> >


Re: [ceph-users] BlueStore questions about workflow and performance

2017-10-03 Thread Mark Nelson



On 10/03/2017 07:59 AM, Alex Gorbachev wrote:

Hi Sam,

On Mon, Oct 2, 2017 at 6:01 PM Sam Huracan wrote:

Anyone can help me?

On Oct 2, 2017 17:56, "Sam Huracan" wrote:

Hi,

I'm reading this document:
 
http://storageconference.us/2017/Presentations/CephObjectStore-slides.pdf

I have 3 questions:

1. Does BlueStore write data (to the raw block device) and metadata
(to RocksDB) simultaneously, or sequentially?

2. In my opinion, BlueStore performance cannot match FileStore with an
SSD journal, because raw-disk performance is lower than writing through
a buffer (that is the journal's purpose). What do you think?

3. Does putting the RocksDB data and WAL on SSD enhance only write
performance, or read performance as well?

Hope your answer,


I am researching the same thing, but recommend you look
at http://ceph.com/community/new-luminous-bluestore

And also search for Bluestore cache to answer some questions.  My test
Luminous cluster so far is not as performant as I would like, but I have
not yet put a serious effort into tuning it, and it does seem stable.

Hth, Alex


Hi Alex,

If you see anything specific please let us know.  There are a couple of 
corner cases where bluestore is likely to be slower than filestore 
(specifically small sequential reads/writes with no client side cache or 
read ahead).  I've also seen some cases where filestore has higher read 
throughput potential (4MB seq reads with multiple NVMe drives per OSD 
node).  In many other cases bluestore is faster (and sometimes much 
faster) than filestore in our tests.  Writes in general tend to be 
faster and high volume object creation is much faster with much lower 
tail latencies (filestore really suffers in this test due to PG splitting).


Mark







--
--
Alex Gorbachev
Storcium






Re: [ceph-users] decreasing number of PGs

2017-10-03 Thread David Turner
Just remember that the warning appears at > 300 PGs/OSD, but the
recommendation is 100.  I would try to reduce your PGs by 1/3, or as close
to that as you can. On my learning cluster I had to migrate data between
pools multiple times, reducing the number of PGs as I went, until I got to
a more normal amount. It affected the clients a fair bit, but that cluster
is still a 3-node cluster in active use.

Note that the data movements were done with rsync, dd, etc. for RBDs and CephFS.

On Tue, Oct 3, 2017, 8:54 AM Andrei Mikhailovsky  wrote:

> Thanks for your suggestions and help
>
> Andrei
> --
>
> *From: *"David Turner" 
> *To: *"Jack" , "ceph-users" <
> ceph-users@lists.ceph.com>
> *Sent: *Monday, 2 October, 2017 22:28:33
> *Subject: *Re: [ceph-users] decreasing number of PGs
>
> Adding more OSDs or deleting/recreating pools that have too many PGs are
> your only 2 options to reduce the number of PGs per OSD.  It is on the
> Ceph roadmap, but is not a currently supported feature.  You can
> alternatively adjust the warning threshold, but it is still
> a problem you should address in your cluster.
>
> On Mon, Oct 2, 2017 at 4:02 PM Jack  wrote:
>
>> You cannot;
>>
>>
>> On 02/10/2017 21:43, Andrei Mikhailovsky wrote:
>> > Hello everyone,
>> >
>> > what is the safest way to decrease the number of PGs in the cluster.
>> Currently, I have too many per osd.
>> >
>> > Thanks
>> >
>> >
>> >
>>
>


Re: [ceph-users] BlueStore questions about workflow and performance

2017-10-03 Thread Alex Gorbachev
Hi Sam,

On Mon, Oct 2, 2017 at 6:01 PM Sam Huracan  wrote:

> Anyone can help me?
>
> On Oct 2, 2017 17:56, "Sam Huracan"  wrote:
>
>> Hi,
>>
>> I'm reading this document:
>>
>> http://storageconference.us/2017/Presentations/CephObjectStore-slides.pdf
>>
>> I have 3 questions:
>>
>> 1. Does BlueStore write data (to the raw block device) and metadata (to
>> RocksDB) simultaneously, or sequentially?
>>
>> 2. In my opinion, BlueStore performance cannot match FileStore with an
>> SSD journal, because raw-disk performance is lower than writing through
>> a buffer (that is the journal's purpose). What do you think?
>>
>> 3. Does putting the RocksDB data and WAL on SSD enhance only write
>> performance, or read performance as well?
>>
>> Hope your answer,
>>
>
I am researching the same thing, but recommend you look at
http://ceph.com/community/new-luminous-bluestore

And also search for Bluestore cache to answer some questions.  My test
Luminous cluster so far is not as performant as I would like, but I have
not yet put a serious effort into tuning it, and it does seem stable.

Hth, Alex



>>
-- 
--
Alex Gorbachev
Storcium


Re: [ceph-users] decreasing number of PGs

2017-10-03 Thread Andrei Mikhailovsky
Thanks for your suggestions and help 

Andrei 

> From: "David Turner" 
> To: "Jack" , "ceph-users" 
> Sent: Monday, 2 October, 2017 22:28:33
> Subject: Re: [ceph-users] decreasing number of PGs

> Adding more OSDs or deleting/recreating pools that have too many PGs are your
> only 2 options to reduce the number of PGs per OSD. It is on the Ceph 
> roadmap,
> but is not a currently supported feature. You can alternatively adjust the
> setting threshold for the warning, but it is still a problem you should 
> address
> in your cluster.

> On Mon, Oct 2, 2017 at 4:02 PM Jack wrote:

>> You cannot;

>> On 02/10/2017 21:43, Andrei Mikhailovsky wrote:
>> > Hello everyone,

>>> what is the safest way to decrease the number of PGs in the cluster. 
>>> Currently,
>> > I have too many per osd.

>> > Thanks








Re: [ceph-users] tunable question

2017-10-03 Thread lists

Hi,

What would make the decision easier: if we knew that we could easily 
revert the

> "ceph osd crush tunables optimal"
once it has begun rebalancing data?

Meaning: if we notice that the impact is too high, or it will take too long, 
that we could simply again say

> "ceph osd crush tunables hammer"
and the cluster would calm down again?

MJ

On 2-10-2017 9:41, Manuel Lausch wrote:

Hi,

We have similar issues.
After upgrading from hammer to jewel, the tunable "chooseleaf_stable"
was introduced. If we activate it, nearly all data will be moved. The
cluster has 2400 OSDs on 40 nodes over two datacenters and is filled
with 2.5 PB of data.

We tried to enable it, but the backfill traffic is too high to be
handled without impacting other services on the network.

Does someone know if it is necessary to enable this tunable? And could
it be a problem in the future if we want to upgrade to newer versions
without it enabled?

Regards,
Manuel Lausch




[ceph-users] How to use rados_aio_write correctly?

2017-10-03 Thread Alexander Kushnirenko
Hello,

I'm working on third-party code (the Bareos storage daemon) which gives very
low write speeds for Ceph.  The code was written to demonstrate that it is
possible, but the speed is about 3-9 MB/s, which is too slow.  I modified
the routine to use rados_aio_write instead of rados_write, and was able to
backup/restore data successfully at about 30 MB/s, which is what I would
expect on a 1GB/s network and from rados bench results.  I studied examples
in the documents and on GitHub, but I'm still afraid that my code is working
merely by accident.  Could someone comment on the following questions:

Q1. The storage daemon sends write requests of 64K size, so the current
code works like this:

rados_write(., buffer, len=64K, offset=0)
rados_write(., buffer, len=64K, offset=64K)
rados_write(., buffer, len=64K, offset=128K)
... and so on ...

What is the correct way to use AIO (one completion, or several)?
Version 1:

rados_aio_create_completion(NULL, NULL, NULL, &comp);
rados_aio_write(., comp, buffer, len=64K, offset=0)
rados_aio_write(., comp, buffer, len=64K, offset=64K)
rados_aio_write(., comp, buffer, len=64K, offset=128K)
rados_aio_wait_for_complete(comp);// wait for Async IO in memory
rados_aio_wait_for_safe(comp);// and on disk
rados_aio_release(comp);

Version 2:
rados_aio_create_completion(NULL, NULL, NULL, &comp1);
rados_aio_create_completion(NULL, NULL, NULL, &comp2);
rados_aio_create_completion(NULL, NULL, NULL, &comp3);
rados_aio_write(., comp1, buffer, len=64K, offset=0)
rados_aio_write(., comp2, buffer, len=64K, offset=64K)
rados_aio_write(., comp3, buffer, len=64K, offset=128K)
rados_aio_wait_for_complete(comp1);
rados_aio_wait_for_complete(comp2);
rados_aio_wait_for_complete(comp3);
rados_aio_write(., comp1, buffer, len=64K, offset=192K)
rados_aio_write(., comp2, buffer, len=64K, offset=256K)
rados_aio_write(., comp3, buffer, len=64K, offset=320K)
.

Q2.  The maximum object size problem.  When I use rados_write, I get an
error when I exceed the maximum object size (132MB in luminous).  But when
I use rados_aio_write, it happily goes beyond the object size limit; it
actually writes nothing, yet reports no error.  Is there a way to catch
this situation?
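For what it's worth, my understanding is that a librados completion tracks a
single outstanding operation, so the one-completion-per-write shape of
Version 2 is the safe one. Since the real calls need a live cluster, here is
a runnable sketch of just that batching pattern, with a hypothetical
aio_write_stub and a threading.Event standing in for rados_aio_write and its
completion:

```python
import threading

# Stub standing in for rados_aio_write: each "write" runs asynchronously
# and signals its own completion event -- one completion per outstanding
# operation, the shape of "Version 2" above.
def aio_write_stub(store, offset, data, done):
    def work():
        store[offset] = data  # simulate the backend write landing
        done.set()            # analogue of the completion firing
    threading.Thread(target=work).start()

def write_in_batches(store, chunks, chunk_len, depth=3):
    """Keep up to `depth` writes in flight; wait for all of them
    (rados_aio_wait_for_complete on each) before reusing the slots."""
    for batch_start in range(0, len(chunks), depth):
        batch = chunks[batch_start:batch_start + depth]
        events = [threading.Event() for _ in batch]  # one per write
        for i, (ev, data) in enumerate(zip(events, batch)):
            aio_write_stub(store, (batch_start + i) * chunk_len, data, ev)
        for ev in events:
            ev.wait()  # block until this write's completion fires

store = {}
chunks = [b"A" * 4, b"B" * 4, b"C" * 4, b"D" * 4, b"E" * 4]
write_in_batches(store, chunks, chunk_len=4, depth=3)
print(sorted(store))  # -> [0, 4, 8, 12, 16]
```

The same structure maps back to librados: allocate comp1..comp3, issue the
three writes, wait_for_complete (and wait_for_safe, if on-disk durability
matters) on each, then release or reuse them for the next batch.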

Alexander
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph monitoring

2017-10-03 Thread John Spray
On Tue, Oct 3, 2017 at 7:37 AM, Jasper Spaans
 wrote:
> Hi,
>
> On 02/10/2017 13:34, Osama Hasebou wrote:
>
> Hi Everyone,
>
> Is there a guide/tutorial on how to set up a Ceph monitoring system using
> collectd / grafana / graphite ? Other suggestions are welcome as well !
>
> I found some GitHub solutions but not much documentation on how to
> implement.
>
>
> I tried setting up Prometheus to add monitoring to my Ceph
> single-node cluster at home using the new ceph-mgr goodies, but that didn't
> really work out of the box[0]. This is because there are some issues with the
> identifier names generated by the prometheus module for ceph-mgr in
> luminous, which appear to have been solved in the master branch.
>
> Just plugging in a fresh prometheus/module.py[1] and restarting the mgr
> daemon allowed me to actually scrape the target using prometheus though.
>
> Now to find or build a pretty dashboard with all of these metrics. I wasn't
> able to find anything in the Grafana-supplied dashboards, and haven't spent
> enough time on openATTIC to extract a dashboard from there. Any pointers
> appreciated!
>
> As a side note, during his talk at the NL Ceph day, John Spray spoke about
> being more liberal with updates/backports for those modules. Would this be a
> candidate for such a policy, as the current one is dysfunctional?

Yes, absolutely.  Just waiting to get some testing/confidence in the
new code on master, before backporting to luminous and calling the
counter naming stable.

If you get a reasonably functional dashboard up and running on the
master code, that'll probably be enough confidence for me to go ahead
and backport it.

John

>
>
> Cheers,
> Jasper
>
>
> [0] I'm running the Ceph-supplied packages on Debian, currently at
> 12.2.1-1~bpo90+1.
> [1]
> https://github.com/ceph/ceph/blob/master/src/pybind/mgr/prometheus/module.py
>


Re: [ceph-users] Ceph monitoring

2017-10-03 Thread Matthew Vernon
On 02/10/17 20:26, Erik McCormick wrote:
> On Mon, Oct 2, 2017 at 11:55 AM, Matthew Vernon  wrote:
>> Making a dashboard is rather a matter of personal preference - we plot
>> client and s3 i/o, network, server load & CPU use, and have indicator
>> plots for numbers of osds up, and monitor quorum.
>>
>> [I could share our dashboard JSON, but it's obviously specific to our
>> data sources]
> 
> I for one would love to see your dashboard. host and data source names
> can be easily replaced :)

OK. A screenshot is:
https://cog.sanger.ac.uk/ceph_dashboard/screenshot.png

(which should be self-explanatory - that's rather the point :) )

The json that builds it is:
https://cog.sanger.ac.uk/ceph_dashboard/dashboard.json

(you'd want to change the data source and hostnames to suit your own
install; sto-1-1 is one of our mon nodes).

HTH,

Matthew


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


Re: [ceph-users] RGW how to delete orphans

2017-10-03 Thread Andreas Calminder
The output, to stdout, is something like "leaked: $objname". Am I supposed
to pipe it to a log, grep for "leaked:", and feed that to a rados delete? Or
am I supposed to dig around in the log pool to try and find the objects
there? The information available is quite vague. Maybe Yehuda can shed some
light on this issue?
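For what it's worth, if the output really is lines of the form
"leaked: $objname" (as described above; worth verifying against your own
run), post-processing it mechanically is straightforward. A minimal dry-run
sketch, where the pool and object names below are invented placeholders:

```python
import shlex

def leaked_to_rm_cmds(log_text, pool):
    """Extract objects from 'leaked: <object>' lines and build the
    corresponding 'rados rm' commands. Dry run: the commands are
    returned as strings, not executed."""
    cmds = []
    for line in log_text.splitlines():
        if line.startswith("leaked: "):
            obj = line[len("leaked: "):].strip()
            cmds.append("rados -p %s rm %s" % (shlex.quote(pool), shlex.quote(obj)))
    return cmds

# Hypothetical sample output; the object and pool names are invented.
sample = "leaked: default.4332.4_myobject\nsome other log line\n"
for cmd in leaked_to_rm_cmds(sample, "default.rgw.buckets.data"):
    print(cmd)  # -> rados -p default.rgw.buckets.data rm default.4332.4_myobject
```

Piping the printed commands through a shell (or calling the equivalent
librados remove) would do the actual deletion; keeping it as a dry run first
makes it easy to eyeball the list before destroying anything.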

Best regards,
/Andreas

On 3 Oct 2017 06:25, "Christian Wuerdig" 
wrote:

> yes, at least that's how I'd interpret the information given in this
> thread:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-February/016521.html
>
> On Tue, Oct 3, 2017 at 1:11 AM, Webert de Souza Lima
>  wrote:
> > Hey Christian,
> >
> >> On 29 Sep 2017 12:32 a.m., "Christian Wuerdig"
> >>  wrote:
> >>>
> >>> I'm pretty sure the orphan find command does exactly that -
> >>> finding orphans. I remember some emails on the dev list where Yehuda
> >>> said he wasn't 100% comfortable automating the delete just yet.
> >>> So the purpose is to run the orphan find tool and then delete the
> >>> orphaned objects once you're happy that they all are actually
> >>> orphaned.
> >>>
> >
> > So what you mean is that one should manually remove the objects listed
> > in the output?
> >
> >
> > Regards,
> >
> > Webert Lima
> > DevOps Engineer at MAV Tecnologia
> > Belo Horizonte - Brasil
> >
> >


Re: [ceph-users] zone, zonegroup and resharding bucket on luminous

2017-10-03 Thread Yoann Moulin
Hello,

>> I'm doing some tests with radosgw on luminous (12.2.1), and I have a few
>> questions.
>>
>> In the documentation[1], there is a reference to "radosgw-admin region get",
>> but it seems to no longer be available.
>> It should be "radosgw-admin zonegroup get", I guess.
>>
>> 1. http://docs.ceph.com/docs/luminous/install/install-ceph-gateway/
>>
>> I have installed my luminous cluster with ceph-ansible playbook.
>>
>> but when I try to manipulate zonegroup or zone, I have this
>>
>>> # radosgw-admin zonegroup get
>>> failed to init zonegroup: (2) No such file or directory
> 
> try with --rgw-zonegroup=default
> 
>>> # radosgw-admin  zone get
>>> unable to initialize zone: (2) No such file or directory
>
> try with --rgw-zone=default
> 
>> I guessed it's because I don't have a realm set, and no default zone and
>> zonegroup?
> 
> The default zone and zonegroup are part of the realm, so without a
> realm you cannot set them as defaults.
> This means you have to specify --rgw-zonegroup=default and --rgw-zone=default.
> I am guessing our documentation needs updating :(
> I think we can improve our behavior and make those commands work
> without a realm, i.e. return the default zonegroup and zone. I will
> open a tracker issue for that.

a bug seems to be already open :

http://tracker.ceph.com/issues/21583

>> Is it the default behaviour not to create a default realm on a fresh
>> radosgw? Or is it a side effect of the ceph-ansible installation?
>>
> It is the default behavior, there is no default realm.
> 
>> I have a bucket that referred to a zonegroup but without realm. Can I create 
>> a default realm ? Is that safe for the bucket that has already been
>> uploaded ?
>>
> Yes, you can create a realm and add the zonegroup to it.
> Don't forget to run "radosgw-admin period update --commit" to commit the
> changes.

I did that :

# radosgw-admin realm create --rgw-realm=default --default
{
"id": "b5cc8a8e-bd96-4b19-8cdd-e87a58ed518a",
"name": "default",
"current_period": "e7bfcb5a-829b-418f-ae26-d6573a5cc8b9",
"epoch": 2
}

# radosgw-admin zonegroup modify 
--realm-id=b5cc8a8e-bd96-4b19-8cdd-e87a58ed518a --rgw-zonegroup=default 
--default

# radosgw-admin zone modify --realm-id=b5cc8a8e-bd96-4b19-8cdd-e87a58ed518a 
--rgw-zone=default --default

# radosgw-admin period update --commit

and it works now, I can edit zone and zonegroup :)

>> On the "default" zonegroup (which is not set as default), the
>> "bucket_index_max_shards" is set to "0"; can I modify it without a realm?
>>
> I just updated this section in this pr: 
> https://github.com/ceph/ceph/pull/18063

As discussed on IRC, I did that but ran into a bug:

# radosgw-admin bucket reshard process --bucket image-net --num-shards=150

=> http://tracker.ceph.com/issues/21619

Thanks,

Best regards,

-- 
Yoann Moulin
EPFL IC-IT