Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-11 Thread Andrija Panic
Hi Dan,

the script provided doesn't seem to work on my ceph cluster :(
This is ceph version 0.80.3

I get empty results at both debug level 10 and the maximum level of 20...

[root@cs1 ~]# ./rbd-io-stats.pl /var/log/ceph/ceph-osd.0.log-20140811.gz
Writes per OSD:
Writes per pool:
Writes per PG:
Writes per RBD:
Writes per object:
Writes per length:
.
.
.
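
For reference, a quick check that the gzipped log even contains filestore write
lines at debug level 10 (using the same file as above) would be something like:

  # count filestore write lines; zero here would explain the empty report
  zcat /var/log/ceph/ceph-osd.0.log-20140811.gz | grep 'filestore(' | grep -c ' write '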




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-11 Thread Andrija Panic
I apologize, I clicked the Send button too fast...

Anyway, I can see there are lines like this in the log file:
2014-08-11 12:43:25.477693 7f022d257700 10 filestore(/var/lib/ceph/osd/ceph-0) write 3.48_head/14b1ca48/rbd_data.41e16619f5eb6.1bd1/head//3 3641344~4608 = 4608
Not sure if I can do anything to fix this... ?
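
For what it's worth, a rough per-image tally can also be pulled straight from
lines like the one above - assuming that line format - with something like:

  # count filestore write lines per rbd_data.<image prefix>, busiest first
  zcat /var/log/ceph/ceph-osd.0.log-20140811.gz \
    | grep 'filestore(' | grep ' write ' \
    | grep -o 'rbd_data\.[0-9a-f]*' | sort | uniq -c | sort -rn | head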

Thanks,
Andrija



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-11 Thread Dan Van Der Ster
Hi,
I changed the script to be a bit more flexible with the osd path. Give this a 
try again:
https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl
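
A typical invocation is just to pass it one or more gzipped OSD logs, e.g.
(adjust the wildcard to however your logs are rotated and named):

  # summarize a whole day's worth of rotated OSD logs in one run
  ./rbd-io-stats.pl /var/log/ceph/ceph-osd.*.log-20140811.gz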
Cheers, Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-11 Thread Andrija Panic
That's better :D

Thanks a lot, now I will be able to troubleshoot my problem :)

Thanks Dan,
Andrija


-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Wido den Hollander

On 08/08/2014 01:51 PM, Andrija Panic wrote:

Hi,

we just added some new clients and have suffered a very big degradation in
Ceph performance for some reason (we are using CloudStack).

I'm wondering if there is a way to monitor OP/s or similar usage per connected
client, so we can isolate the heavy client?



This is not very easy to do with Ceph, but CloudStack keeps track of 
this in the usage database.


With newer versions of CloudStack you can also limit the IOps of 
Instances to prevent such situations.



Also, what is the general best practice to monitor these kinds of changes
in Ceph? I'm talking about R/W or OP/s changes or similar...

Thanks,
--

Andrija Panić







--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Andrija Panic
Thanks Wido, yes I'm aware of CloudStack in that sense, but I would prefer
some precise OP/s per Ceph image at least...
Will check CloudStack then...

Thx





-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Wido den Hollander

On 08/08/2014 02:02 PM, Andrija Panic wrote:

Thanks Wido, yes I'm aware of CloudStack in that sense, but I would prefer
some precise OP/s per Ceph image at least...
Will check CloudStack then...



Ceph doesn't really know that since RBD is just a layer on top of RADOS. 
In the end the CloudStack hypervisors are doing I/O towards RADOS 
objects, so giving exact stats of how many IOps you are seeing per image 
is hard to figure out.


The hypervisor knows this best since it sees all the I/O going through.
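
For example, on a KVM host something like virsh can dump the per-disk counters
for a single instance (the domain name and disk target below are just placeholders):

  # cumulative read/write requests and bytes for one virtual disk
  virsh domblkstat i-2-123-VM vda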

Wido


--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Andrija Panic
Hm, true...
One final question, I might be a noob...
13923 B/s rd, 4744 kB/s wr, 1172 op/s
What does this op/s represent - is it classic IOPS (4k reads/writes) or
something else? How much is too much :) - I'm familiar with SATA/SSD IO/s
specs/tests, etc., but I'm not sure what Ceph means by op/s - I could not find
anything with Google...

Thanks again Wido.
Andrija






-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Dan Van Der Ster
Hi,
Here’s what we do to identify our top RBD users.

First, enable log level 10 for the filestore so you can see all the IOs coming 
from the VMs. Then use a script like this (used on a dumpling cluster):

  https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl

to summarize the osd logs and identify the top clients.
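
As for turning that filestore logging on, one way to flip it on temporarily is
injectargs (double-check the option name and your usual default for your release):

  # raise filestore logging to 10 on every OSD
  ceph tell osd.\* injectargs '--debug-filestore 10'
  # ...collect the logs, then drop it back to your usual level, e.g.:
  ceph tell osd.\* injectargs '--debug-filestore 1'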

Then it's just a matter of scripting to figure out the ops/sec per volume, but 
for us at least the main use case has been to identify who is responsible for a 
new peak in overall ops, and the daily-granular statistics from the above script 
tend to suffice.

BTW, do you throttle your clients? We found that it's absolutely necessary, 
since without a throttle just a few active VMs can eat up the entire IOPS 
capacity of the cluster.

Cheers, Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --


On 08 Aug 2014, at 13:51, Andrija Panic andrija.pa...@gmail.com wrote:

Hi,

we just added some new clients and have suffered a very big degradation in Ceph 
performance for some reason (we are using CloudStack).

I'm wondering if there is a way to monitor OP/s or similar usage per connected 
client, so we can isolate the heavy client?

Also, what is the general best practice to monitor these kinds of changes in 
Ceph? I'm talking about R/W or OP/s changes or similar...

Thanks,
--

Andrija Panić


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Andrija Panic
Hi Dan,

thank you very much for the script, will check it out... no throttling so
far, but I guess it will have to be done...

This seems to read only gzipped logs? So since it's read-only, I guess it is safe
to run it on the production cluster now...?
The script will also check multiple OSDs as far as I can understand,
not just osd.0 as given in the script comment?

Thanks a lot.
Andrija




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Dan Van Der Ster
Hi,

On 08 Aug 2014, at 15:55, Andrija Panic andrija.pa...@gmail.com wrote:

Hi Dan,

thank you very much for the script, will check it out... no throttling so far, 
but I guess it will have to be done...

This seems to read only gzipped logs?

Well it’s pretty simple, and it zcats each input file. So yes, only gz files 
in the current script. But you can change that pretty trivially ;)

So since it's read-only, I guess it is safe to run it on the production cluster now…?

I personally don’t do anything new on a Friday just before leaving ;)

But it's just grepping the log files, so start with one, then two, then...

The script will also check multiple OSDs as far as I can understand, not 
just osd.0 as given in the script comment?


Yup, what I do is gather all of the OSD logs for a single day in a single 
directory (in CephFS ;), then run that script on all of the OSDs. It takes 
a while, but it will give you the overall daily totals for the whole cluster.
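
Concretely, the gathering step can be as simple as something like this (host
names and paths below are just placeholders for your environment):

  # pull one day's rotated OSD logs from each host into one directory, then summarize
  mkdir -p /cephfs/osd-logs/2014-08-08
  for h in osd-host1 osd-host2 osd-host3; do
      scp "$h:/var/log/ceph/ceph-osd.*.log-20140808.gz" /cephfs/osd-logs/2014-08-08/
  done
  ./rbd-io-stats.pl /cephfs/osd-logs/2014-08-08/*.gz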

If you are only trying to find the top users, then it is sufficient to check a 
subset of OSDs, since by their nature the client IOs are spread across most/all 
OSDs.

Cheers, Dan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Andrija Panic
Thanks again, and btw, besides being Friday I'm also on vacation - so double
the joy of troubleshooting performance problems :)))

Thx :)


-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Wido den Hollander

On 08/08/2014 03:44 PM, Dan Van Der Ster wrote:

BTW, do you throttle your clients? We found that it's absolutely
necessary, since without a throttle just a few active VMs can eat up the
entire IOPS capacity of the cluster.


+1

I'd strongly advise setting I/O limits for Instances. I've had multiple 
occasions where a runaway script inside a VM was hammering the 
underlying storage, killing all I/O.


Not only with Ceph, but over the many years I've worked with storage: 
I/O == expensive


CloudStack supports I/O limiting, so I recommend you set a limit. Set it 
to 750 write IOps for example. That way one Instance can't kill the 
whole cluster, but it still (usually) has enough I/O to run.
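
Under KVM that kind of cap typically ends up as a libvirt block I/O tune on the
disk; roughly the same effect can be applied by hand with something like this
(domain and disk names are placeholders, and the option spelling varies a bit
between libvirt versions):

  # cap one virtual disk of one running instance at 750 write IOps
  virsh blkdeviotune i-2-123-VM vda --write-iops-sec 750 --live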


Wido







--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Andrija Panic
Will definitely do so, thanks Wido and Dan...
Cheers guys






-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com