Re: [ceph-users] Show IOps per VM/client to find heavy users...
Hi Dan,

the script provided does not seem to work on my Ceph cluster :( This is ceph version 0.80.3. I get empty results, at both debug level 10 and the maximum level of 20...

[root@cs1 ~]# ./rbd-io-stats.pl /var/log/ceph/ceph-osd.0.log-20140811.gz
Writes per OSD:
Writes per pool:
Writes per PG:
Writes per RBD:
Writes per object:
Writes per length:
. . .

--
Andrija Panić
http://admintweets.com
Re: [ceph-users] Show IOps per VM/client to find heavy users...
I apologize, I clicked the Send button too fast...

Anyway, I can see there are lines like this in the log file:

2014-08-11 12:43:25.477693 7f022d257700 10 filestore(/var/lib/ceph/osd/ceph-0) write 3.48_head/14b1ca48/rbd_data.41e16619f5eb6.1bd1/head//3 3641344~4608 = 4608

Not sure if I can do anything to fix this... ?

Thanks,
Andrija

--
Andrija Panić
http://admintweets.com
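A minimal sketch, in Python, of the kind of parsing such a log-grepping script does (this is only an assumption based on the single sample line above, not Dan's actual rbd-io-stats.pl): pull the object name and write length out of each filestore "write" line and aggregate them per rbd_data image prefix. If nothing matches, the empty output at least tells you the log lines don't have the expected shape.

#!/usr/bin/env python
# Rough illustration only: count filestore writes per RBD image prefix.
# The regex and object-path layout are assumptions taken from the single
# sample log line above; this is not the CERN rbd-io-stats.pl script.
import gzip
import re
import sys
from collections import Counter

WRITE_RE = re.compile(r'filestore\(\S+\) write (\S+) (\d+)~(\d+)')

writes_per_rbd = Counter()
bytes_per_rbd = Counter()

for path in sys.argv[1:]:
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'rt') as f:
        for line in f:
            m = WRITE_RE.search(line)
            if not m:
                continue
            obj, length = m.group(1), int(m.group(3))
            # object path looks like <pg>_head/<hash>/rbd_data.<prefix>.<block>/...
            rbd = next((p for p in obj.split('/') if p.startswith('rbd_data.')), None)
            if rbd is None:
                continue
            image = '.'.join(rbd.split('.')[:2])  # keep "rbd_data.<image prefix>"
            writes_per_rbd[image] += 1
            bytes_per_rbd[image] += length

for image, n in writes_per_rbd.most_common(20):
    print('%-40s %8d writes %14d bytes' % (image, n, bytes_per_rbd[image]))

Feed it one or more OSD logs (plain or gzipped); dividing the per-image write counts by the time span the log covers gives a rough writes/sec per volume.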
Re: [ceph-users] Show IOps per VM/client to find heavy users...
Hi,

I changed the script to be a bit more flexible with the osd path. Give this a try again:
https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl

Cheers, Dan

--
Dan van der Ster || Data Storage Services || CERN IT Department
Re: [ceph-users] Show IOps per VM/client to find heavy users...
That's better :D Thanks a lot, now I will be able to troubleshoot my problem :)

Thanks Dan,
Andrija

--
Andrija Panić
http://admintweets.com
Re: [ceph-users] Show IOps per VM/client to find heavy users...
On 08/08/2014 01:51 PM, Andrija Panic wrote:
> Hi, we just had some new clients, and have suffered very big degradation
> in Ceph performance for some reason (we are using CloudStack). I'm
> wondering if there is a way to monitor op/s or similar usage per connected
> client, so we can isolate the heavy client?

This is not very easy to do with Ceph, but CloudStack keeps track of this in the usage database. With newer versions of CloudStack you can also limit the IOps of Instances to prevent such situations.

> Also, what is the general best practice to monitor these kinds of changes
> in Ceph? I'm talking about R/W or op/s changes or similar...

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant
Phone: +31 (0)20 700 9902
Skype: contact42on
Re: [ceph-users] Show IOps per VM/client to find heavy users...
Thanks Wido, yes I'm aware of CloudStack in that sense, but I would prefer some precise op/s per Ceph image at least... Will check CloudStack then...

Thx

--
Andrija Panić
http://admintweets.com
Re: [ceph-users] Show IOps per VM/client to find heavy users...
On 08/08/2014 02:02 PM, Andrija Panic wrote:
> Thanks Wido, yes I'm aware of CloudStack in that sense, but I would prefer
> some precise op/s per Ceph image at least... Will check CloudStack then...

Ceph doesn't really know that, since RBD is just a layer on top of RADOS. In the end the CloudStack hypervisors are doing I/O towards RADOS objects, so giving exact stats of how many IOps you are seeing per image is hard to figure out.

The hypervisor knows this best, since it sees all the I/O going through.

Wido

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant
Phone: +31 (0)20 700 9902
Skype: contact42on
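Following up on "the hypervisor knows this best": one illustrative way to get per-volume numbers on a KVM/libvirt hypervisor (which is what CloudStack drives underneath) is to sample the per-device counters libvirt already exposes, e.g. via virsh domblkstat, and diff them over an interval. A rough Python sketch; the domain/device names are placeholders, and the three-column "device counter value" output format is an assumption worth verifying against your libvirt version:

#!/usr/bin/env python
# Sketch: estimate read/write IOps of one guest disk by sampling
# "virsh domblkstat <domain> <device>" twice and diffing rd_req/wr_req.
import subprocess
import time

def blkstat(domain, device):
    out = subprocess.check_output(['virsh', 'domblkstat', domain, device],
                                  universal_newlines=True)
    stats = {}
    for line in out.splitlines():
        fields = line.split()
        if len(fields) == 3:          # e.g. "vda wr_req 123456"
            stats[fields[1]] = int(fields[2])
    return stats

domain, device, interval = 'i-2-123-VM', 'vda', 10.0   # placeholders

before = blkstat(domain, device)
time.sleep(interval)
after = blkstat(domain, device)

print('%s %s: %.1f read IOps, %.1f write IOps' % (
    domain, device,
    (after['rd_req'] - before['rd_req']) / interval,
    (after['wr_req'] - before['wr_req']) / interval))

Run it across all domains and devices on a host and the heavy client usually stands out quickly.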
Re: [ceph-users] Show IOps per VM/client to find heavy users...
Hm, true... One final question, I might be a noob...

13923 B/s rd, 4744 kB/s wr, 1172 op/s

What does this op/s represent - is it classic IOps (4k reads/writes) or something else? And how much is too much? :) I'm familiar with SATA/SSD IO/s specs and tests, etc., but not sure what Ceph means by op/s - could not find anything with Google...

Thanks again Wido.
Andrija

--
Andrija Panić
http://admintweets.com
Re: [ceph-users] Show IOps per VM/client to find heavy users...
Hi,

Here’s what we do to identify our top RBD users.

First, enable log level 10 for the filestore so you can see all the IOs coming from the VMs. Then use a script like this (used on a dumpling cluster):
https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl
to summarize the osd logs and identify the top clients.

Then it's just a matter of scripting to figure out the ops/sec per volume, but for us at least the main use-case has been to identify who is responsible for a new peak in overall ops, and daily-granular statistics from the above script tend to suffice.

BTW, do you throttle your clients? We found that it's absolutely necessary, since without a throttle just a few active VMs can eat up the entire IOps capacity of the cluster.

Cheers, Dan

--
Dan van der Ster || Data Storage Services || CERN IT Department

On 08 Aug 2014, at 13:51, Andrija Panic andrija.pa...@gmail.com wrote:
> Hi, we just had some new clients, and have suffered very big degradation
> in Ceph performance for some reason (we are using CloudStack). I'm
> wondering if there is a way to monitor op/s or similar usage per connected
> client, so we can isolate the heavy client?
>
> Also, what is the general best practice to monitor these kinds of changes
> in Ceph? I'm talking about R/W or op/s changes or similar...
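Enabling that logging is an ordinary ceph.conf / injectargs change; something like the following should do it (double-check the exact syntax against the docs for your release, and remember to turn it back down afterwards, since level 10 filestore logs grow quickly):

    [osd]
    debug filestore = 10

or, at runtime, per OSD:

    ceph tell osd.0 injectargs '--debug-filestore 10'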
Re: [ceph-users] Show IOps per VM/client to find heavy users...
Hi Dan,

thank you very much for the script, will check it out... no throttling so far, but I guess it will have to be done...

This seems to read only gzipped logs?

And since it is read-only, I guess it is safe to run it on the production cluster now... ?

The script will also check multiple OSDs as far as I can understand, not just the osd.0 given in the script comment?

Thanks a lot.
Andrija

--
Andrija Panić
http://admintweets.com
Re: [ceph-users] Show IOps per VM/client to find heavy users...
Hi,

On 08 Aug 2014, at 15:55, Andrija Panic andrija.pa...@gmail.com wrote:

> Hi Dan, thank you very much for the script, will check it out... no
> throttling so far, but I guess it will have to be done...
>
> This seems to read only gzipped logs?

Well it's pretty simple, and it zcat's each input file. So yes, only gz files in the current script. But you can change that pretty trivially ;)

> And since it is read-only, I guess it is safe to run it on the production cluster now... ?

I personally don't do anything new on a Friday just before leaving ;) But it's just grepping the log files, so start with one, then two, then...

> The script will also check multiple OSDs as far as I can understand, not just the osd.0 given in the script comment?

Yup, what I do is gather all of the OSD logs for a single day in a single directory (in CephFS ;), then run that script on all of the OSDs. It takes a while, but it will give you the overall daily totals for the whole cluster. If you are only trying to find the top users, then it is sufficient to check a subset of OSDs, since by their nature the client IOs are spread across most/all OSDs.

Cheers, Dan
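In practice that looks something like a single invocation over the whole day's gathered logs (the directory path here is just an example; the script zcat's every .gz file it is given):

    ./rbd-io-stats.pl /cephfs/osd-logs/2014-08-11/ceph-osd.*.log-20140811.gz

or the same with only a handful of OSDs' logs if you just need the top users.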
Re: [ceph-users] Show IOps per VM/client to find heavy users...
Thanks again, and btw, besides it being Friday I'm also on vacation - so double the joy of troubleshooting performance problems :)))

Thx :)

--
Andrija Panić
http://admintweets.com
Re: [ceph-users] Show IOps per VM/client to find heavy users...
On 08/08/2014 03:44 PM, Dan Van Der Ster wrote:
> BTW, do you throttle your clients? We found that it's absolutely necessary,
> since without a throttle just a few active VMs can eat up the entire IOps
> capacity of the cluster.

+1

I'd strongly advise setting I/O limits for Instances. I've had multiple occasions where a runaway script inside a VM was hammering on the underlying storage, killing all I/O. Not only with Ceph, but over the many years I've worked with storage. I/O == expensive.

CloudStack supports I/O limiting, so I recommend you set a limit. Set it to 750 write IOps, for example. That way one Instance can't kill the whole cluster, but it still has enough I/O to run (usually).

Wido

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant
Phone: +31 (0)20 700 9902
Skype: contact42on
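On KVM those CloudStack limits are generally applied as libvirt/QEMU block I/O throttles, so the same effect can also be checked or applied per disk by hand with something like the following (domain and device names are placeholders):

    virsh blkdeviotune i-2-123-VM vda --write-iops-sec 750 --live

Within CloudStack itself the equivalent is setting the IOPS limits on the compute or disk offering used by the Instances.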
Re: [ceph-users] Show IOps per VM/client to find heavy users...
Will do so definitely, thanks Wido and Dan...

Cheers guys

--
Andrija Panić
http://admintweets.com