Re: [ceph-users] Increasing time to save RGW objects

2016-02-10 Thread Saverio Proto
What kind of authentication do you use against the Rados Gateway?

We had a similar problem authenticating against our Keystone server. If
the Keystone server is overloaded, the time to read/write RGW objects
increases. You will not see anything wrong on the Ceph side.
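
A quick check is to time a token request against Keystone directly and
compare it with an authenticated request through RGW. A minimal sketch,
assuming a Keystone v2.0 endpoint; URL, tenant and credentials are
placeholders for whatever your deployment uses:

  # Time a token request straight at Keystone.
  time curl -s -o /dev/null -X POST http://keystone.example.com:5000/v2.0/tokens \
       -H 'Content-Type: application/json' \
       -d '{"auth": {"tenantName": "demo",
                     "passwordCredentials": {"username": "demo", "password": "secret"}}}'

  # Compare with the time an authenticated request through RGW takes.
  time s3cmd ls s3://some-bucket

If Keystone itself turns out to be slow, a larger rgw_keystone_token_cache_size
in ceph.conf reduces how often RGW has to go back to Keystone to validate
tokens (assuming you use the Keystone integration at all).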

Saverio

2016-02-08 17:49 GMT+01:00 Kris Jurka :
>
> I've been testing the performance of ceph by storing objects through RGW.
> This is on Debian with Hammer using 40 magnetic OSDs, 5 mons, and 4 RGW
> instances.  Initially the storage time was holding reasonably steady, but it
> has started to rise recently as shown in the attached chart.
>
> The test repeatedly saves 100k objects of 55 kB size using multiple threads
> (50) against multiple RGW gateways (4).  It uses a sequential identifier as
> the object key and shards the bucket name using id % 100.  The buckets have
> index sharding enabled with 64 index shards per bucket.
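
For reference, that naming scheme boils down to something like the sketch
below (single-threaded s3cmd only, with a placeholder bucket prefix and
endpoint, so it is not the actual test harness):

  # Sequential ids as object keys, bucket chosen by id % 100, ~55 kB payload.
  dd if=/dev/urandom of=/tmp/obj bs=1k count=55 2>/dev/null
  for id in $(seq 0 99999); do
    bucket="testbucket-$((id % 100))"
    s3cmd put /tmp/obj "s3://${bucket}/${id}"
  done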
>
> ceph status doesn't appear to show any issues.  Is there something I should
> be looking at here?
>
>
> # ceph status
> cluster 3fc86d01-cf9c-4bed-b130-7a53d7997964
>  health HEALTH_OK
>  monmap e2: 5 mons at
> {condor=192.168.188.90:6789/0,duck=192.168.188.140:6789/0,eagle=192.168.188.100:6789/0,falcon=192.168.188.110:6789/0,shark=192.168.188.118:6789/0}
> election epoch 18, quorum 0,1,2,3,4
> condor,eagle,falcon,shark,duck
>  osdmap e674: 40 osds: 40 up, 40 in
>   pgmap v258756: 3128 pgs, 10 pools, 1392 GB data, 27282 kobjects
> 4784 GB used, 69499 GB / 74284 GB avail
> 3128 active+clean
>   client io 268 kB/s rd, 1100 kB/s wr, 493 op/s
>
>
> Kris Jurka
>


Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Lionel Bouton
On 09/02/2016 20:07, Kris Jurka wrote:
>
>
> On 2/9/2016 10:11 AM, Lionel Bouton wrote:
>
>> Actually if I understand correctly how PG splitting works the next spike
>> should be N times smaller and spread over N times the period (where
>> N is the number of subdirectories created during each split which
>> seems to be 15 according to OSDs' directory layout).
>>
>
> I would expect that splitting one directory would take the same amount
> of time as it did this time, it's just that now there will be N times
> as many directories to split because of the previous splits.  So the
> duration of the spike would be quite a bit longer.

Oops I missed this bit, I believe you are right: the spike duration
should be ~16x longer but the slowdown roughly the same over this new
period :-(

Lionel


Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Samuel Just
There was a patch at some point to pre-split on pg creation (merged in
ad6a2be402665215a19708f55b719112096da3f4).  More generally, bluestore
is the answer to this.
-Sam
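
For anyone stuck on filestore in the meantime: the pre-split from that patch
is driven by an expected object count given at pool creation time together
with the filestore split/merge settings. A rough sketch; the pool name, PG
counts and object count are placeholders, and the exact argument order plus
the need for a negative filestore_merge_threshold should be checked against
your release:

  # In ceph.conf, some releases only honour the pre-split hint when the
  # merge threshold is negative:
  #   [osd]
  #   filestore merge threshold = -10

  # Create the data pool with an expected object count so the directory
  # hierarchy is created up front instead of splitting under load.
  ceph osd pool create .rgw.buckets 1024 1024 replicated replicated_ruleset 100000000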

On Tue, Feb 9, 2016 at 11:34 AM, Lionel Bouton wrote:
> On 09/02/2016 20:18, Lionel Bouton wrote:
>> On 09/02/2016 20:07, Kris Jurka wrote:
>>>
>>> On 2/9/2016 10:11 AM, Lionel Bouton wrote:
>>>
 Actually if I understand correctly how PG splitting works the next spike
 should be N times smaller and spread over N times the period (where
 N is the number of subdirectories created during each split which
 seems to be 15 according to OSDs' directory layout).

>>> I would expect that splitting one directory would take the same amount
>>> of time as it did this time, it's just that now there will be N times
>>> as many directories to split because of the previous splits.  So the
>>> duration of the spike would be quite a bit longer.
>> Oops I missed this bit, I believe you are right: the spike duration
>> should be ~16x longer but the slowdown roughly the same over this new
>> period :-(
>
> As I don't see any way around this, I'm thinking out of the box.
>
> As splitting is costly for you, you might want to try to avoid it (or at
> least limit it to the first occurrence, if your use case can handle such
> a slowdown).
> You can try increasing the PG number of your pool before reaching the
> point where the split starts.
> This would generate data movement, but it might (or might not) slow down
> your access less than what you see when splitting occurs (I'm not sure
> about the exact constraints, but basically Ceph forces you to increase
> the number of PGs in small increments, which should limit the
> performance impact).
>
> Another way to do this with no movement or slowdown is to add pools
> (which basically creates new placement groups without rebalancing data),
> but this means modifying your application so that new objects are stored
> in the new pool (which may or may not be possible depending on your
> actual access patterns).
>
> There are limits to these two suggestions: increasing the number of
> placement groups has costs, so you might want to check with the devs how
> high you can go and whether it fits your constraints.
>
> Lionel.


Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Kris Jurka



On 2/9/2016 10:11 AM, Lionel Bouton wrote:


Actually if I understand correctly how PG splitting works the next spike
should be N times smaller and spread over N times the period (where
N is the number of subdirectories created during each split which
seems to be 15 according to OSDs' directory layout).



I would expect that splitting one directory would take the same amount 
of time as it did this time, it's just that now there will be N times as 
many directories to split because of the previous splits.  So the 
duration of the spike would be quite a bit longer.



That said, the problem that could happen is that by the time you reach
the next split you might have reached N times the object creation
speed you have currently and get the very same spike.



This test runs as fast as possible, so in the best case scenario, object 
creation speed would stay the same, but is likely to gradually slow over 
time.


Kris Jurka


Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Lionel Bouton
On 09/02/2016 20:18, Lionel Bouton wrote:
> On 09/02/2016 20:07, Kris Jurka wrote:
>>
>> On 2/9/2016 10:11 AM, Lionel Bouton wrote:
>>
>>> Actually if I understand correctly how PG splitting works the next spike
>>> should be N times smaller and spread over N times the period (where
>>> N is the number of subdirectories created during each split which
>>> seems to be 15 according to OSDs' directory layout).
>>>
>> I would expect that splitting one directory would take the same amount
>> of time as it did this time, it's just that now there will be N times
>> as many directories to split because of the previous splits.  So the
>> duration of the spike would be quite a bit longer.
> Oops I missed this bit, I believe you are right: the spike duration
> should be ~16x longer but the slowdown roughly the same over this new
> period :-(

As I don't see any way around this, I'm thinking out of the box.

As splitting is costly for you, you might want to try to avoid it (or at
least limit it to the first occurrence, if your use case can handle such
a slowdown).
You can try increasing the PG number of your pool before reaching the
point where the split starts.
This would generate data movement, but it might (or might not) slow down
your access less than what you see when splitting occurs (I'm not sure
about the exact constraints, but basically Ceph forces you to increase
the number of PGs in small increments, which should limit the
performance impact).
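
If you go that route, it is the usual two-step increase, done in modest
steps; the pool name and target count below are only placeholders:

  ceph osd pool get .rgw.buckets pg_num
  # Raise pg_num first, then pgp_num so the data actually remaps.
  ceph osd pool set .rgw.buckets pg_num 2048
  ceph osd pool set .rgw.buckets pgp_num 2048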

Another way to do this with no movement or slowdown is to add pools
(which basically creates new placement groups without rebalancing data),
but this means modifying your application so that new objects are stored
in the new pool (which may or may not be possible depending on your
actual access patterns).

There are limits to these two suggestions: increasing the number of
placement groups has costs, so you might want to check with the devs how
high you can go and whether it fits your constraints.

Lionel.


Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Wade Holler
Hi there,

What is the best way to "look at the rgw admin socket" to see what
operations are taking a long time?

Best Regards
Wade
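
One way is to query the radosgw admin socket with the ceph CLI. A sketch,
assuming the default socket naming; check /var/run/ceph/ on the gateway
host for the actual .asok filename:

  # See which commands the gateway's admin socket supports.
  ceph daemon /var/run/ceph/ceph-client.rgw.gateway1.asok help

  # Perf counters for the gateway (request counts and latencies).
  ceph daemon /var/run/ceph/ceph-client.rgw.gateway1.asok perf dump

  # RADOS operations the gateway currently has outstanding against the OSDs.
  ceph daemon /var/run/ceph/ceph-client.rgw.gateway1.asok objecter_requests

Raising the gateway's log level (e.g. debug rgw = 10) also gets per-request
timing into the log, at the cost of a lot of output.
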
On Mon, Feb 8, 2016 at 12:16 PM Gregory Farnum  wrote:

> On Mon, Feb 8, 2016 at 8:49 AM, Kris Jurka  wrote:
> >
> > I've been testing the performance of ceph by storing objects through RGW.
> > This is on Debian with Hammer using 40 magnetic OSDs, 5 mons, and 4 RGW
> > instances.  Initially the storage time was holding reasonably steady,
> but it
> > has started to rise recently as shown in the attached chart.
> >
> > The test repeatedly saves 100k objects of 55 kB size using multiple
> threads
> > (50) against multiple RGW gateways (4).  It uses a sequential identifier
> as
> > the object key and shards the bucket name using id % 100.  The buckets
> have
> > index sharding enabled with 64 index shards per bucket.
> >
> > ceph status doesn't appear to show any issues.  Is there something I
> should
> > be looking at here?
> >
> >
> > # ceph status
> > cluster 3fc86d01-cf9c-4bed-b130-7a53d7997964
> >  health HEALTH_OK
> >  monmap e2: 5 mons at
> > {condor=192.168.188.90:6789/0,duck=192.168.188.140:6789/0,eagle=192.168.188.100:6789/0,falcon=192.168.188.110:6789/0,shark=192.168.188.118:6789/0}
> > election epoch 18, quorum 0,1,2,3,4
> > condor,eagle,falcon,shark,duck
> >  osdmap e674: 40 osds: 40 up, 40 in
> >   pgmap v258756: 3128 pgs, 10 pools, 1392 GB data, 27282 kobjects
> > 4784 GB used, 69499 GB / 74284 GB avail
> > 3128 active+clean
> >   client io 268 kB/s rd, 1100 kB/s wr, 493 op/s
>
> It's probably a combination of your bucket indices getting larger and
> your PGs getting split into subfolders on the OSDs. If you keep
> running tests and things get slower it's the first; if they speed
> partway back up again it's the latter.
> Other things to check:
> * you can look at your OSD stores and how the object files are divvied up.
> * you can look at the rgw admin socket and/or logs to see what
> operations are the ones taking time
> * you can check the dump_historic_ops on the OSDs to see if there are
> any notably slow ops
> -Greg
>
> >
> >
> > Kris Jurka
> >


Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Kris Jurka



On 2/8/2016 9:16 AM, Gregory Farnum wrote:

On Mon, Feb 8, 2016 at 8:49 AM, Kris Jurka  wrote:


I've been testing the performance of ceph by storing objects through RGW.
This is on Debian with Hammer using 40 magnetic OSDs, 5 mons, and 4 RGW
instances.  Initially the storage time was holding reasonably steady, but it
has started to rise recently as shown in the attached chart.



It's probably a combination of your bucket indices getting larger and
your PGs getting split into subfolders on the OSDs. If you keep
running tests and things get slower it's the first; if they speed
partway back up again it's the latter.


Indeed, after running for another day, performance has leveled back out, 
as attached.  So tuning something like filestore_split_multiple would 
have moved around the time of this performance spike, but is there a way 
to eliminate it?  Some way of saying, start with N levels of directory 
structure because I'm going to have a ton of objects?  If this test 
continues, it's just going to hit another, worse spike later when it 
needs to split again.
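
For what it's worth, the knobs involved are filestore_split_multiple and
filestore_merge_threshold: a directory splits once it holds roughly
filestore_split_multiple * abs(filestore_merge_threshold) * 16 files, so
raising them postpones (and enlarges) the split rather than eliminating it.
A ceph.conf sketch with purely illustrative values (defaults quoted from
memory, so verify against your release):

  [osd]
  # Split at ~ 8 * 40 * 16 = 5120 files per directory instead of the
  # default 2 * 10 * 16 = 320.
  filestore merge threshold = 40
  filestore split multiple = 8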



Other things to check:
* you can look at your OSD stores and how the object files are divvied up.


Yes, checking the directory structure and times on the OSDs does show 
that things have been split recently.
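
For anyone wanting to repeat that check, something along these lines shows
the split depth and timing on a filestore OSD; the OSD id and PG directory
are placeholders:

  cd /var/lib/ceph/osd/ceph-0/current/11.7f_head
  # Directory mtimes show when the collection was last split.
  find . -maxdepth 3 -type d -printf '%TY-%Tm-%Td %TH:%TM  %p\n' | sort | tail
  # Entry counts per directory show how close the next split is.
  for d in $(find . -type d); do
    printf '%6d %s\n' "$(ls "$d" | wc -l)" "$d"
  done | sort -n | tail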


Kris Jurka


Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Lionel Bouton
Hi,

On 09/02/2016 17:07, Kris Jurka wrote:
>
>
> On 2/8/2016 9:16 AM, Gregory Farnum wrote:
>> On Mon, Feb 8, 2016 at 8:49 AM, Kris Jurka  wrote:
>>>
>>> I've been testing the performance of ceph by storing objects through
>>> RGW.
>>> This is on Debian with Hammer using 40 magnetic OSDs, 5 mons, and 4 RGW
>>> instances.  Initially the storage time was holding reasonably
>>> steady, but it
>>> has started to rise recently as shown in the attached chart.
>>>
>>
>> It's probably a combination of your bucket indices getting larger and
>> your PGs getting split into subfolders on the OSDs. If you keep
>> running tests and things get slower it's the first; if they speed
>> partway back up again it's the latter.
>
> Indeed, after running for another day, performance has leveled back
> out, as attached.  So tuning something like filestore_split_multiple
> would have moved around the time of this performance spike, but is
> there a way to eliminate it?  Some way of saying, start with N levels
> of directory structure because I'm going to have a ton of objects?  If
> this test continues, it's just going to hit another, worse spike later
> when it needs to split again.

Actually if I understand correctly how PG splitting works the next spike
should be N times smaller and spread over N times the period (where
N is the number of subdirectories created during each split which
seems to be 15 according to OSDs' directory layout).

That said, the problem that could happen is that by the time you reach
the next split you might have reached N times the object creation
speed you have currently and get the very same spike.

Best regards,

Lionel


Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Lionel Bouton
On 09/02/2016 19:11, Lionel Bouton wrote:
> Actually if I understand correctly how PG splitting works the next spike
> should be N times smaller and spread over N times the period (where
> N is the number of subdirectories created during each split which
> seems to be 15

typo: 16
>  according to OSDs' directory layout).
>



Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Gregory Farnum
On Tue, Feb 9, 2016 at 8:07 AM, Kris Jurka  wrote:
>
>
> On 2/8/2016 9:16 AM, Gregory Farnum wrote:
>>
>> On Mon, Feb 8, 2016 at 8:49 AM, Kris Jurka  wrote:
>>>
>>>
>>> I've been testing the performance of ceph by storing objects through RGW.
>>> This is on Debian with Hammer using 40 magnetic OSDs, 5 mons, and 4 RGW
>>> instances.  Initially the storage time was holding reasonably steady, but
>>> it
>>> has started to rise recently as shown in the attached chart.
>>>
>>
>> It's probably a combination of your bucket indices getting larger and
>> your PGs getting split into subfolders on the OSDs. If you keep
>> running tests and things get slower it's the first; if they speed
>> partway back up again it's the latter.
>
>
> Indeed, after running for another day, performance has leveled back out, as
> attached.  So tuning something like filestore_split_multiple would have
> moved around the time of this performance spike, but is there a way to
> eliminate it?  Some way of saying, start with N levels of directory
> structure because I'm going to have a ton of objects?  If this test
> continues, it's just going to hit another, worse spike later when it needs
> to split again.

This has been discussed before but I'm not sure of the outcome. Sam?
-Greg

>
>> Other things to check:
>> * you can look at your OSD stores and how the object files are divvied up.
>
>
> Yes, checking the directory structure and times on the OSDs does show that
> things have been split recently.
>
> Kris Jurka


Re: [ceph-users] Increasing time to save RGW objects

2016-02-08 Thread Gregory Farnum
On Mon, Feb 8, 2016 at 8:49 AM, Kris Jurka  wrote:
>
> I've been testing the performance of ceph by storing objects through RGW.
> This is on Debian with Hammer using 40 magnetic OSDs, 5 mons, and 4 RGW
> instances.  Initially the storage time was holding reasonably steady, but it
> has started to rise recently as shown in the attached chart.
>
> The test repeatedly saves 100k objects of 55 kB size using multiple threads
> (50) against multiple RGW gateways (4).  It uses a sequential identifier as
> the object key and shards the bucket name using id % 100.  The buckets have
> index sharding enabled with 64 index shards per bucket.
>
> ceph status doesn't appear to show any issues.  Is there something I should
> be looking at here?
>
>
> # ceph status
> cluster 3fc86d01-cf9c-4bed-b130-7a53d7997964
>  health HEALTH_OK
>  monmap e2: 5 mons at
> {condor=192.168.188.90:6789/0,duck=192.168.188.140:6789/0,eagle=192.168.188.100:6789/0,falcon=192.168.188.110:6789/0,shark=192.168.188.118:6789/0}
> election epoch 18, quorum 0,1,2,3,4
> condor,eagle,falcon,shark,duck
>  osdmap e674: 40 osds: 40 up, 40 in
>   pgmap v258756: 3128 pgs, 10 pools, 1392 GB data, 27282 kobjects
> 4784 GB used, 69499 GB / 74284 GB avail
> 3128 active+clean
>   client io 268 kB/s rd, 1100 kB/s wr, 493 op/s

It's probably a combination of your bucket indices getting larger and
your PGs getting split into subfolders on the OSDs. If you keep
running tests and things get slower it's the first; if they speed
partway back up again it's the latter.
Other things to check:
* you can look at your OSD stores and how the object files are divvied up.
* you can look at the rgw admin socket and/or logs to see what
operations are the ones taking time
* you can check the dump_historic_ops on the OSDs to see if there are
any notably slow ops
-Greg
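
On the last point, dump_historic_ops (and dump_ops_in_flight) are
admin-socket commands, so they are run on the host carrying the OSD; osd.12
below is just an example id:

  # Slowest recent ops, with a per-stage time breakdown.
  ceph daemon osd.12 dump_historic_ops

  # Ops currently in flight on the same OSD.
  ceph daemon osd.12 dump_ops_in_flight

  # How many ops are kept and for how long (option names from memory):
  #   osd_op_history_size     = 20
  #   osd_op_history_duration = 600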

>
>
> Kris Jurka
>