Re: [ceph-users] rbd snap rm overload my cluster (during backups)
Yep, 0.60 does all the snapshot-related things much faster, and it is noticeably faster on r/w with small blocks. Compared to 0.56.4 at the same disk commit percentage, I would say the average in-flight request time is roughly ten times shorter.

On Mon, Apr 22, 2013 at 7:37 PM, Andrey Korolyov and...@xdel.ru wrote:
The mentioned cluster is in production, but I can compare the number of slow requests on a test cluster across different versions and report back.
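For anyone wanting to reproduce a comparison like this, one possible small-block write test and a way to sample in-flight request times might look like the sketch below; the pool name, OSD id, socket path, and all parameters are placeholders, not details from this thread.

    # Illustrative small-block write test against a scratch pool
    # (pool name "bench" and all parameters are assumptions):
    rados bench -p bench 60 write -b 4096 -t 16 --no-cleanup

    # Sample the requests currently in flight on one OSD via its admin socket
    # (osd.0 and the default socket path are assumptions):
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight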
Re: [ceph-users] rbd snap rm overload my cluster (during backups)
I have observed that slow requests of up to 10-20 seconds on writes can appear immediately after the creation or deletion of a snapshot of a relatively large image, even though the image may be entirely unused at the moment.

On Sun, Apr 21, 2013 at 7:44 PM, Gregory Farnum g...@inktank.com wrote:
Which version of Ceph are you running right now and seeing this with (Sam reworked it a bit for Cuttlefish and it was in some of the dev releases)? Snapshot deletes are a little more expensive than we'd like, but I'm surprised they're doing this badly for you. :/
-Greg
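A rough way to watch for these slow requests around a snapshot removal is sketched below; the pool, image, and snapshot names are placeholders, and this assumes a standard deployment where slow request warnings reach the central cluster log.

    # Stream the cluster log and filter for slow request warnings while a
    # snapshot is being removed (names are placeholders):
    ceph -w | grep -i "slow request" &
    rbd snap rm rbd/vm-disk-1@nightly-2013-04-21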
Re: [ceph-users] rbd snap rm overload my cluster (during backups)
What version of Ceph are you running?
sage

On Mon, 22 Apr 2013, Andrey Korolyov wrote:
I have observed that slow requests of up to 10-20 seconds on writes can appear immediately after the creation or deletion of a snapshot of a relatively large image, even though the image may be entirely unused at the moment.
Re: [ceph-users] rbd snap rm overload my cluster (during backups)
On Mon, Apr 22, 2013 at 7:10 PM, Sage Weil s...@inktank.com wrote:
What version of Ceph are you running?
sage

0.56.4 with a couple of backports from bobtail, but I'm not sure the version matters - the same behaviour has been around since the early 0.5x releases.
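A quick sanity check of which version a daemon is actually running (the local binary and a given OSD can differ after a partial upgrade); the OSD id and socket path below assume the default layout:

    # The locally installed client/binary version:
    ceph --version

    # The version a running OSD reports over its admin socket
    # (osd.0 and the default socket path are assumptions):
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok version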
Re: [ceph-users] rbd snap rm overload my cluster (during backups)
On Mon, 22 Apr 2013, Andrey Korolyov wrote:
0.56.4 with a couple of backports from bobtail, but I'm not sure the version matters - the same behaviour has been around since the early 0.5x releases.

I ask because the snapshot trimming was completely rewritten in 0.58 or 0.59. Is it possible to test the latest on this cluster, or is it in production?
sage
Re: [ceph-users] rbd snap rm overload my cluster (during backups)
The mentioned cluster is in production, but I can compare the number of slow requests on a test cluster across different versions and report back.

On Mon, Apr 22, 2013 at 7:24 PM, Sage Weil s...@inktank.com wrote:
I ask because the snapshot trimming was completely rewritten in 0.58 or 0.59. Is it possible to test the latest on this cluster, or is it in production?
sage
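One crude way to compare slow-request counts between versions is to count the warnings each OSD logs during the test window; the log path below assumes the default location and should be adjusted as needed.

    # Count slow-request warnings per OSD log during the test window
    # (default log location assumed):
    grep -c "slow request" /var/log/ceph/ceph-osd.*.log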
Re: [ceph-users] rbd snap rm overload my cluster (during backups)
Which version of Ceph are you running right now and seeing this with (Sam reworked it a bit for Cuttlefish and it was in some of the dev releases)? Snapshot deletes are a little more expensive than we'd like, but I'm surprised they're doing this badly for you. :/
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Sun, Apr 21, 2013 at 2:16 AM, Olivier Bonvalet olivier.bonva...@daevel.fr wrote:
Hi,

I have a backup script which, every night:
* creates a snapshot of each RBD image
* then deletes all snapshots that are more than 15 days old

The problem is that rbd snap rm XXX will overload my cluster for hours (6 hours today...). Here I see several problems:

#1 rbd snap rm XXX is not blocking. The erase is done in the background, and I know of no way to verify whether it has completed. So I add sleeps between the rm calls, but I have to estimate how long they will take.
#2 rbd (snap) rm is sometimes very, very slow. I don't know whether it's because of XFS or not, but all my OSDs are at 100% IO usage (reported by iostat).

So:
* is there a way to reduce the priority of snap rm, to avoid overloading the cluster?
* is there a way to have a blocking snap rm which will wait until it's completed?
* is there a way to speed up snap rm?

Note that I have too low a PG count on my cluster (200 PGs for 40 active OSDs; but I'm trying to progressively migrate data to a newer pool). Can that be the source of the problem?

Thanks,
Olivier
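For context, the kind of nightly job described above might look roughly like the sketch below; the pool name, snapshot naming scheme, retention handling, and the sleep between removals are all assumptions, not the actual script from this thread.

    #!/bin/bash
    # Sketch of the nightly snapshot/prune job described above.
    # Pool name, naming scheme, retention and sleep are assumptions.
    POOL=rbd
    TODAY=$(date +%Y-%m-%d)
    CUTOFF=$(date -d "15 days ago" +%Y-%m-%d)

    for IMG in $(rbd ls "$POOL"); do
        # 1. take tonight's snapshot
        rbd snap create "$POOL/$IMG@backup-$TODAY"

        # 2. remove snapshots older than the retention window
        for SNAP in $(rbd snap ls "$POOL/$IMG" | awk '$2 ~ /^backup-/ {print $2}'); do
            SNAPDATE=${SNAP#backup-}
            if [[ "$SNAPDATE" < "$CUTOFF" ]]; then
                rbd snap rm "$POOL/$IMG@$SNAP"
                # snap rm returns before the background trimming finishes,
                # so pause between removals to spread the load
                sleep 300
            fi
        done
    done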
Re: [ceph-users] rbd snap rm overload my cluster (during backups)
I use Ceph 0.56.4; and to be fair, a lot of stuff is «doing badly» on my cluster, so maybe I have a general OSD problem.

On Sunday, April 21, 2013 at 08:44 -0700, Gregory Farnum wrote:
Which version of Ceph are you running right now and seeing this with (Sam reworked it a bit for Cuttlefish and it was in some of the dev releases)? Snapshot deletes are a little more expensive than we'd like, but I'm surprised they're doing this badly for you. :/
-Greg
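For what it's worth, the usual rule of thumb (not a number from this thread) is roughly 100 PGs per OSD divided by the replica count, which for 40 OSDs and 3 replicas suggests on the order of 2048 PGs for the newer pool. A sketch, assuming the pools are named rbd-new and rbd:

    # Rule-of-thumb sizing: (number of OSDs * ~100) / replica count,
    # rounded up to a power of two. 40 OSDs with 3 replicas -> ~1333 -> 2048.
    # When creating the newer pool, the PG count can be set up front
    # (pool name is a placeholder):
    ceph osd pool create rbd-new 2048 2048

    # pg_num of an existing pool can be raised (never lowered), though on a
    # loaded cluster the resulting PG splitting itself causes data movement:
    ceph osd pool set rbd pg_num 2048
    ceph osd pool set rbd pgp_num 2048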