Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-17 Thread Craig Lewis
I'd like to see some way to cap recovery IOPS per OSD.  For example, don't
allow backfill to do more than 50 operations per second.  It would slow
backfill down, but reserve plenty of IOPS for normal operation.  I know
that implementing this well is not a simple task.
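
As far as I know, the closest existing knobs are priority-based rather than an
absolute IOPS cap; something like this (values only for illustration) biases
the OSD op queue toward client traffic:

  osd client op priority = 63
  osd recovery op priority = 1
  osd max backfills = 1

That still doesn't give a hard per-OSD ceiling, which is the part I'd really
like to see.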


I know I did some stupid things that caused a lot of my problems.  Most of
my problems can be traced back to
  osd mkfs options xfs = -l size=1024m -n size=64k -i size=2048 -s size=4096
 and the kernel malloc problems it caused.

Reformatting all of the disks fixed a lot of my issues, but it didn't fix
them all.
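
For reference, a more conservative mkfs line that keeps the inode size but
drops the large directory block size and log overrides would look something
like this (a hypothetical example, not necessarily what I reformatted with):

  osd mkfs options xfs = -f -i size=2048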




While I was reformatting my secondary cluster, I tested the stability by
reformatting all of the disks on the last node at once.  I didn't mark them
out and wait for the rebuild; I removed the OSDs, reformatted, and added
them back to the cluster.  It was 10 disks out of 36 total, in a 4 node
cluster (I'm waiting for hardware to free up to build the 5th node).
 Everything was fine for the first hour or so.  After several hours, there
was enough latency that the HTTP load balancer was marking RadosGW nodes
down.  My load balancer has a 30s timeout.  Since the latency was
cluster-wide, all RadosGW nodes were marked down together.  When the latency spike
subsided, they'd all get marked up again.  This continued until the
backfill completed.  They were mostly up.  I don't have numbers, but I
think they were marked down about 5 times an hour, for less than a minute
each time.  That really messes with radosgw-agent.


I had recovery tuned down:
  osd max backfills = 1
  osd recovery max active = 1
  osd recovery op priority = 1
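
These live in ceph.conf, but the same values can also be pushed to running
OSDs without a restart.  A quick sketch, assuming the admin keyring is
available on the host:

  ceph tell osd.\* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'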

I have journals on SSD, and single GigE public and cluster networks.  This
cluster has 2x replication (I'm waiting for the 5th node to go to 3x).  The
cluster network was pushing 950 Mbps.  The SSDs and OSDs had plenty of
write bandwidth, but the HDDs were saturating their IOPS.  These are
consumer-class 7200 RPM SATA disks, so they don't have very many IOPS.

The average write latency on these OSDs is normally ~10ms.  While this
backfill was going on, the average write latency was 100ms, with frequent
periods at 200ms.  The average read latency increased too, but not as
badly: it averaged 50ms, with occasional spikes up to 400ms.  Since I
reformatted 27% of my cluster, I was seeing higher latency on 55% of my
OSDs (as both backfill readers and writers).

Instead, if I trickle in the disks, everything works fine.  I was able to
reformat 2 OSDs at a time without a problem.  The cluster latency increase
was barely noticeable, even though the IOPS on those two disks were
saturated.  A bit of latency here and there (5% of the time) doesn't hurt
much.  When it's 55% of the time, it hurts a lot more.
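
Roughly what I mean by trickling, as a sketch (the OSD ids are just an
example):

  ceph osd out 10
  ceph osd out 11
  # wait for backfill to finish and health to return to HEALTH_OK
  watch ceph -s
  # then reformat and re-add those two OSDs, and move on to the next pair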


When I finally get the 5th node, and increase replication from 2x to 3x, I
expect this cluster to be unusable for about a week.







On Thu, Jul 17, 2014 at 9:02 AM, Andrei Mikhailovsky and...@arhont.com
wrote:

 Comments inline


 --
 *From: *Sage Weil sw...@redhat.com
 *To: *Quenten Grasso qgra...@onq.com.au
 *Cc: *ceph-users@lists.ceph.com
 *Sent: *Thursday, 17 July, 2014 4:44:45 PM

 *Subject: *Re: [ceph-users] ceph osd crush tunables optimal AND add new
 OSD at the same time

 On Thu, 17 Jul 2014, Quenten Grasso wrote:

  Hi Sage & List
 
  I understand this is probably a hard question to answer.
 
  I mentioned previously our cluster has co-located MONs on OSD servers,
  which are R515s w/ 1 x AMD 6-core processor & 11 3TB OSDs w/ dual 10GbE.
 
  When our cluster is doing these busy operations and IO has stopped, as in
  the case I mentioned earlier (setting tunables to optimal, or heavy
  recovery operations), is there a way to ensure our IO doesn't get completely
  blocked/stopped/frozen in our VMs?
 
  Could it be as simple as putting all 3 of our mon servers on bare metal
  w/ SSDs? (I recall reading somewhere that a mon disk was doing several
  thousand IOPS during a recovery operation)
 
  I assume putting just one on bare metal won't help, because our mons will
  only ever be as fast as our slowest mon server?

 I don't think this is related to where the mons are (most likely).  The
 big question for me is whether IO is getting completely blocked, or just
 slowed enough that the VMs are all timing out.


 AM: I was looking at the cluster status while the rebalancing was taking
 place and I was seeing very little client IO reported by ceph -s output.
 The numbers were around 20-100, whereas our typical IO for the cluster is
 around 1000. Having said that, this was not enough, as _all_ of our VMs
 became unresponsive and didn't recover after the rebalancing finished.


 What slow request messages
 did you see during the rebalance?

 AM: As I was experimenting with different options while trying to gain
 some client IO back, I've noticed that when I limit the options to 1
 per OSD (osd max backfills = 1, osd recovery max active = 1, osd
 recovery threads = 1), I did not have any slow

Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-16 Thread Quenten Grasso
Hi Sage, Andrija & List

I have seen the tunables issue on our cluster when I upgraded to firefly.

I ended up going back to legacy settings after about an hour.  My cluster is 
55 3TB OSDs over 5 nodes, and it decided it needed to move around 32% of our 
data; after an hour all of our VMs were frozen, so I had to revert the 
change back to legacy settings, wait about the same time again until the 
cluster had recovered, and reboot our VMs.  (I wasn't really expecting that one 
from the patch notes.)

Our CPU usage went through the roof as well on our nodes.  Do you by 
chance have your metadata servers co-located on your OSD nodes as we do?  I've 
been thinking about trying to move these to dedicated nodes, as it may resolve 
our issues.

Regards,
Quenten

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Andrija Panic
Sent: Tuesday, 15 July 2014 8:38 PM
To: Sage Weil
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at 
the same time

Hi Sage,

since this problem is tunables-related, do we need to expect same behavior or 
not  when we do regular data rebalancing caused by adding new/removing OSD? I 
guess not, but would like your confirmation.
I'm already on optimal tunables, but I'm afraid to test this by i.e. shuting 
down 1 OSD.

Thanks,
Andrija

On 14 July 2014 18:18, Sage Weil sw...@redhat.commailto:sw...@redhat.com 
wrote:
I've added some additional notes/warnings to the upgrade and release
notes:

 https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451

If there is somewhere else where you think a warning flag would be useful,
let me know!

Generally speaking, we want to be able to cope with huge data rebalances
without interrupting service.  It's an ongoing process of improving the
recovery vs client prioritization, though, and removing sources of
overhead related to rebalancing... and it's clearly not perfect yet. :/

sage


On Sun, 13 Jul 2014, Andrija Panic wrote:

 Hi,
 after seting ceph upgrade (0.72.2 to 0.80.3) I have issued ceph osd crush
 tunables optimal and after only few minutes I have added 2 more OSDs to the
 CEPH cluster...

 So these 2 changes were more or a less done at the same time - rebalancing
 because of tunables optimal, and rebalancing because of adding new OSD...

 Result - all VMs living on CEPH storage have gone mad, no disk access
 efectively, blocked so to speak.

 Since this rebalancing took 5h-6h, I had bunch of VMs down for that long...

 Did I do wrong by causing 2 rebalancing to happen at the same time ?
 Is this behaviour normal, to cause great load on all VMs because they are
 unable to access CEPH storage efectively ?

 Thanks for any input...
 --

 Andrija Panić





--

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-16 Thread Andrei Mikhailovsky
Quenten, 

We've got two monitors sitting on the osd servers and one on a different 
server. 

Andrei 

-- 
Andrei Mikhailovsky 
Director 
Arhont Information Security 

Web: http://www.arhont.com 
http://www.wi-foo.com 
Tel: +44 (0)870 4431337 
Fax: +44 (0)208 429 3111 
PGP: Key ID - 0x2B3438DE 
PGP: Server - keyserver.pgp.com 

DISCLAIMER 

The information contained in this email is intended only for the use of the 
person(s) to whom it is addressed and may be confidential or contain legally 
privileged information. If you are not the intended recipient you are hereby 
notified that any perusal, use, distribution, copying or disclosure is strictly 
prohibited. If you have received this email in error please immediately advise 
us by return email at and...@arhont.com and delete and purge the email and any 
attachments without making a copy. 


- Original Message -

From: Quenten Grasso qgra...@onq.com.au 
To: Andrija Panic andrija.pa...@gmail.com, Sage Weil sw...@redhat.com 
Cc: ceph-users@lists.ceph.com 
Sent: Wednesday, 16 July, 2014 1:20:19 PM 
Subject: Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at 
the same time 



Hi Sage, Andrija  List 



I have seen the tuneables issue on our cluster when I upgraded to firefly. 



I ended up going back to legacy settings after about an hour as my cluster is 
of 55 3TB OSD’s over 5 nodes and it decided it needed to move around 32% of our 
data, which after an hour all of our vm’s were frozen and I had to revert the 
change back to legacy settings and wait about the same time again until our 
cluster had recovered and reboot our vms. (wasn’t really expecting that one 
from the patch notes) 



Also our CPU usage went through the roof as well on our nodes, do you per 
chance have your metadata servers co-located on your osd nodes as we do? I’ve 
been thinking about trying to move these to dedicated nodes as it may resolve 
our issues. 



Regards, 

Quenten 



From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Andrija Panic 
Sent: Tuesday, 15 July 2014 8:38 PM 
To: Sage Weil 
Cc: ceph-users@lists.ceph.com 
Subject: Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at 
the same time 




Hi Sage, 





since this problem is tunables-related, do we need to expect same behavior or 
not when we do regular data rebalancing caused by adding new/removing OSD? I 
guess not, but would like your confirmation. 


I'm already on optimal tunables, but I'm afraid to test this by i.e. shuting 
down 1 OSD. 





Thanks, 
Andrija 





On 14 July 2014 18:18, Sage Weil  sw...@redhat.com  wrote: 



I've added some additional notes/warnings to the upgrade and release 
notes: 

https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451 

If there is somewhere else where you think a warning flag would be useful, 
let me know! 

Generally speaking, we want to be able to cope with huge data rebalances 
without interrupting service. It's an ongoing process of improving the 
recovery vs client prioritization, though, and removing sources of 
overhead related to rebalancing... and it's clearly not perfect yet. :/ 

sage 




On Sun, 13 Jul 2014, Andrija Panic wrote: 

 Hi, 
 after seting ceph upgrade (0.72.2 to 0.80.3) I have issued ceph osd crush 
 tunables optimal and after only few minutes I have added 2 more OSDs to the 
 CEPH cluster... 
 
 So these 2 changes were more or a less done at the same time - rebalancing 
 because of tunables optimal, and rebalancing because of adding new OSD... 
 
 Result - all VMs living on CEPH storage have gone mad, no disk access 
 efectively, blocked so to speak. 
 
 Since this rebalancing took 5h-6h, I had bunch of VMs down for that long... 
 
 Did I do wrong by causing 2 rebalancing to happen at the same time ? 
 Is this behaviour normal, to cause great load on all VMs because they are 
 unable to access CEPH storage efectively ? 
 
 Thanks for any input... 
 -- 
 


 Andrija Panić

-- 





Andrija Panić 


-- 


http://admintweets.com 


-- 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-16 Thread Andrija Panic
For me, 3 nodes, 1 MON + 2x 2TB OSDs on each node... no MDS used...
I went through the pain of waiting for the data rebalancing and now I'm on
optimal tunables...
Cheers


On 16 July 2014 14:29, Andrei Mikhailovsky and...@arhont.com wrote:

 Quenten,

 We've got two monitors sitting on the osd servers and one on a different
 server.

 Andrei

 --
 Andrei Mikhailovsky
 Director
 Arhont Information Security

 Web: http://www.arhont.com
 http://www.wi-foo.com
 Tel: +44 (0)870 4431337
 Fax: +44 (0)208 429 3111
 PGP: Key ID - 0x2B3438DE
 PGP: Server - keyserver.pgp.com

 DISCLAIMER

 The information contained in this email is intended only for the use of
 the person(s) to whom it is addressed and may be confidential or contain
 legally privileged information. If you are not the intended recipient you
 are hereby notified that any perusal, use, distribution, copying or
 disclosure is strictly prohibited. If you have received this email in error
 please immediately advise us by return email at and...@arhont.com and
 delete and purge the email and any attachments without making a copy.


 --
 *From: *Quenten Grasso qgra...@onq.com.au
 *To: *Andrija Panic andrija.pa...@gmail.com, Sage Weil 
 sw...@redhat.com
 *Cc: *ceph-users@lists.ceph.com
 *Sent: *Wednesday, 16 July, 2014 1:20:19 PM

 *Subject: *Re: [ceph-users] ceph osd crush tunables optimal AND add new
 OSD at the same time

 Hi Sage, Andrija  List



 I have seen the tuneables issue on our cluster when I upgraded to firefly.



 I ended up going back to legacy settings after about an hour as my cluster
 is of 55 3TB OSD’s over 5 nodes and it decided it needed to move around 32%
 of our data, which after an hour all of our vm’s were frozen and I had to
 revert the change back to legacy settings and wait about the same time
 again until our cluster had recovered and reboot our vms. (wasn’t really
 expecting that one from the patch notes)



 Also our CPU usage went through the roof as well on our nodes, do you per
 chance have your metadata servers co-located on your osd nodes as we do?
  I’ve been thinking about trying to move these to dedicated nodes as it may
 resolve our issues.



 Regards,

 Quenten



 *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
 Of *Andrija Panic
 *Sent:* Tuesday, 15 July 2014 8:38 PM
 *To:* Sage Weil
 *Cc:* ceph-users@lists.ceph.com
 *Subject:* Re: [ceph-users] ceph osd crush tunables optimal AND add new
 OSD at the same time



 Hi Sage,



 since this problem is tunables-related, do we need to expect same behavior
 or not  when we do regular data rebalancing caused by adding new/removing
 OSD? I guess not, but would like your confirmation.

 I'm already on optimal tunables, but I'm afraid to test this by i.e.
 shuting down 1 OSD.



 Thanks,
 Andrija



 On 14 July 2014 18:18, Sage Weil sw...@redhat.com wrote:

 I've added some additional notes/warnings to the upgrade and release
 notes:


 https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451

 If there is somewhere else where you think a warning flag would be useful,
 let me know!

 Generally speaking, we want to be able to cope with huge data rebalances
 without interrupting service.  It's an ongoing process of improving the
 recovery vs client prioritization, though, and removing sources of
 overhead related to rebalancing... and it's clearly not perfect yet. :/

 sage



 On Sun, 13 Jul 2014, Andrija Panic wrote:

  Hi,
  after seting ceph upgrade (0.72.2 to 0.80.3) I have issued ceph osd
 crush
  tunables optimal and after only few minutes I have added 2 more OSDs to
 the
  CEPH cluster...
 
  So these 2 changes were more or a less done at the same time -
 rebalancing
  because of tunables optimal, and rebalancing because of adding new OSD...
 
  Result - all VMs living on CEPH storage have gone mad, no disk access
  efectively, blocked so to speak.
 
  Since this rebalancing took 5h-6h, I had bunch of VMs down for that
 long...
 
  Did I do wrong by causing 2 rebalancing to happen at the same time ?
  Is this behaviour normal, to cause great load on all VMs because they are
  unable to access CEPH storage efectively ?
 
  Thanks for any input...
  --
 

  Andrija Panić

 --



 Andrija Panić

 --

   http://admintweets.com

 --

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-16 Thread Danny Luhde-Thompson
With 34 x 4TB OSDs over 4 hosts, I had 30% of objects moved - the cluster is
about half full and the move took around 12 hours.  Except now I can't use the
kclient any more - wish I'd read that first.


On 16 July 2014 13:36, Andrija Panic andrija.pa...@gmail.com wrote:

 For me, 3 nodes, 1MON+ 2x2TB OSDs on each node... no mds used...
 I went through pain of waiting for data rebalancing and now I'm on
 optimal tunables...
 Cheers


 On 16 July 2014 14:29, Andrei Mikhailovsky and...@arhont.com wrote:

 Quenten,

 We've got two monitors sitting on the osd servers and one on a different
 server.

 Andrei

 --
 Andrei Mikhailovsky
 Director
 Arhont Information Security

 Web: http://www.arhont.com
 http://www.wi-foo.com
 Tel: +44 (0)870 4431337
 Fax: +44 (0)208 429 3111
 PGP: Key ID - 0x2B3438DE
 PGP: Server - keyserver.pgp.com

 DISCLAIMER

 The information contained in this email is intended only for the use of
 the person(s) to whom it is addressed and may be confidential or contain
 legally privileged information. If you are not the intended recipient you
 are hereby notified that any perusal, use, distribution, copying or
 disclosure is strictly prohibited. If you have received this email in error
 please immediately advise us by return email at and...@arhont.com and
 delete and purge the email and any attachments without making a copy.


 --
 *From: *Quenten Grasso qgra...@onq.com.au
 *To: *Andrija Panic andrija.pa...@gmail.com, Sage Weil 
 sw...@redhat.com
 *Cc: *ceph-users@lists.ceph.com
 *Sent: *Wednesday, 16 July, 2014 1:20:19 PM

 *Subject: *Re: [ceph-users] ceph osd crush tunables optimal AND add new
 OSD at the same time

 Hi Sage, Andrija  List



 I have seen the tuneables issue on our cluster when I upgraded to firefly.



 I ended up going back to legacy settings after about an hour as my
 cluster is of 55 3TB OSD’s over 5 nodes and it decided it needed to move
 around 32% of our data, which after an hour all of our vm’s were frozen and
 I had to revert the change back to legacy settings and wait about the same
 time again until our cluster had recovered and reboot our vms. (wasn’t
 really expecting that one from the patch notes)



 Also our CPU usage went through the roof as well on our nodes, do you per
 chance have your metadata servers co-located on your osd nodes as we do?
  I’ve been thinking about trying to move these to dedicated nodes as it may
 resolve our issues.



 Regards,

 Quenten



 *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
 Of *Andrija Panic
 *Sent:* Tuesday, 15 July 2014 8:38 PM
 *To:* Sage Weil
 *Cc:* ceph-users@lists.ceph.com
 *Subject:* Re: [ceph-users] ceph osd crush tunables optimal AND add new
 OSD at the same time



 Hi Sage,



 since this problem is tunables-related, do we need to expect same
 behavior or not  when we do regular data rebalancing caused by adding
 new/removing OSD? I guess not, but would like your confirmation.

 I'm already on optimal tunables, but I'm afraid to test this by i.e.
 shuting down 1 OSD.



 Thanks,
 Andrija



 On 14 July 2014 18:18, Sage Weil sw...@redhat.com wrote:

 I've added some additional notes/warnings to the upgrade and release
 notes:


 https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451

 If there is somewhere else where you think a warning flag would be useful,
 let me know!

 Generally speaking, we want to be able to cope with huge data rebalances
 without interrupting service.  It's an ongoing process of improving the
 recovery vs client prioritization, though, and removing sources of
 overhead related to rebalancing... and it's clearly not perfect yet. :/

 sage



 On Sun, 13 Jul 2014, Andrija Panic wrote:

  Hi,
  after seting ceph upgrade (0.72.2 to 0.80.3) I have issued ceph osd
 crush
  tunables optimal and after only few minutes I have added 2 more OSDs
 to the
  CEPH cluster...
 
  So these 2 changes were more or a less done at the same time -
 rebalancing
  because of tunables optimal, and rebalancing because of adding new
 OSD...
 
  Result - all VMs living on CEPH storage have gone mad, no disk access
  efectively, blocked so to speak.
 
  Since this rebalancing took 5h-6h, I had bunch of VMs down for that
 long...
 
  Did I do wrong by causing 2 rebalancing to happen at the same time ?
  Is this behaviour normal, to cause great load on all VMs because they
 are
  unable to access CEPH storage efectively ?
 
  Thanks for any input...
  --
 

  Andrija Panić

 --



 Andrija Panić

 --

   http://admintweets.com

 --

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --

 Andrija Panić
 --
   http://admintweets.com
 --

 ___
 ceph-users mailing

Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-15 Thread Andrija Panic
Hi Sage,

since this problem is tunables-related, do we need to expect the same behavior
when we do regular data rebalancing caused by adding/removing an
OSD? I guess not, but I would like your confirmation.
I'm already on optimal tunables, but I'm afraid to test this by, e.g.,
shutting down 1 OSD.

Thanks,
Andrija


On 14 July 2014 18:18, Sage Weil sw...@redhat.com wrote:

 I've added some additional notes/warnings to the upgrade and release
 notes:


 https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451

 If there is somewhere else where you think a warning flag would be useful,
 let me know!

 Generally speaking, we want to be able to cope with huge data rebalances
 without interrupting service.  It's an ongoing process of improving the
 recovery vs client prioritization, though, and removing sources of
 overhead related to rebalancing... and it's clearly not perfect yet. :/

 sage


 On Sun, 13 Jul 2014, Andrija Panic wrote:

  Hi,
  after seting ceph upgrade (0.72.2 to 0.80.3) I have issued ceph osd
 crush
  tunables optimal and after only few minutes I have added 2 more OSDs to
 the
  CEPH cluster...
 
  So these 2 changes were more or a less done at the same time -
 rebalancing
  because of tunables optimal, and rebalancing because of adding new OSD...
 
  Result - all VMs living on CEPH storage have gone mad, no disk access
  efectively, blocked so to speak.
 
  Since this rebalancing took 5h-6h, I had bunch of VMs down for that
 long...
 
  Did I do wrong by causing 2 rebalancing to happen at the same time ?
  Is this behaviour normal, to cause great load on all VMs because they are
  unable to access CEPH storage efectively ?
 
  Thanks for any input...
  --
 
  Andrija Panić

-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-15 Thread Sage Weil
On Tue, 15 Jul 2014, Andrija Panic wrote:
 Hi Sage, since this problem is tunables-related, do we need to expect 
 same behavior or not  when we do regular data rebalancing caused by 
 adding new/removing OSD? I guess not, but would like your confirmation. 
 I'm already on optimal tunables, but I'm afraid to test this by i.e. 
 shuting down 1 OSD.

When you shut down a single OSD it is a relatively small amount of data 
that needs to move to do the recovery.  The issue with the tunables is 
just that a huge fraction of the data stored needs to move, and the 
performance impact is much higher.

sage

 
 Thanks,
 Andrija
 
 
 On 14 July 2014 18:18, Sage Weil sw...@redhat.com wrote:
   I've added some additional notes/warnings to the upgrade and
   release
   notes:
 
  https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca77328324
   51
 
   If there is somewhere else where you think a warning flag would
   be useful,
   let me know!
 
   Generally speaking, we want to be able to cope with huge data
   rebalances
   without interrupting service.  It's an ongoing process of
   improving the
   recovery vs client prioritization, though, and removing sources
   of
   overhead related to rebalancing... and it's clearly not perfect
   yet. :/
 
   sage
 
 
   On Sun, 13 Jul 2014, Andrija Panic wrote:
 
Hi,
after seting ceph upgrade (0.72.2 to 0.80.3) I have issued
   ceph osd crush
tunables optimal and after only few minutes I have added 2
   more OSDs to the
CEPH cluster...
   
So these 2 changes were more or a less done at the same time -
   rebalancing
because of tunables optimal, and rebalancing because of adding
   new OSD...
   
Result - all VMs living on CEPH storage have gone mad, no disk
   access
efectively, blocked so to speak.
   
Since this rebalancing took 5h-6h, I had bunch of VMs down for
   that long...
   
Did I do wrong by causing 2 rebalancing to happen at the
   same time ?
Is this behaviour normal, to cause great load on all VMs
   because they are
unable to access CEPH storage efectively ?
   
Thanks for any input...
-- 
   
  Andrija Panić
 
 
 
 
 
 
 --
 
 Andrija Panić
 --
   http://admintweets.com
 --
 
 ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-14 Thread Andrei Mikhailovsky
Hi Andrija, 

I've got at least two more stories of similar nature. One is my friend running 
a ceph cluster and one is from me. Both of our clusters are pretty small. My 
cluster has only two osd servers with 8 osds each, 3 mons. I have an ssd 
journal per 4 osds. My friend has a cluster of 3 mons and 3 osd servers with 4 
osds each and an ssd per 4 osds as well. Both clusters are connected with 
40gbit/s IP over Infiniband links. 

We had the same issue while upgrading to firefly. However, we did not add any 
new disks, we just ran the ceph osd crush tunables optimal command after 
the upgrade. 

Both of our clusters were down as far as the virtual machines were concerned. 
All VMs crashed because of the lack of IO. It was a bit problematic, 
considering that ceph is typically so great at staying alive during 
failures and upgrades. So, there seems to be a problem with the upgrade. I wish 
the devs had added a big note in red letters saying that running this command 
will likely affect your cluster performance, that most likely all your VMs will 
die, and that you should shut down your VMs if you do not want to have data loss. 

I've changed the default values to reduce the load during recovery and also to 
tune a few things performance-wise. My settings were: 

  osd recovery max chunk = 8388608
  osd recovery op priority = 2
  osd max backfills = 1
  osd recovery max active = 1
  osd recovery threads = 1
  osd disk threads = 2
  filestore max sync interval = 10
  filestore op threads = 20
  filestore_flusher = false

However, this didn't help much, and I've noticed that shortly after running the 
tunables command my guest VMs' iowait quickly jumped to 50%, and to 99% a 
minute later. This happened on all VMs at once. During the recovery phase I 
ran the rbd -p poolname ls -l command several times and it took between 
20-40 minutes to complete. It typically takes less than 2 seconds when the 
cluster is not in recovery mode. 

My mate's cluster had the same tunables apart from the last three. He had 
exactly the same behaviour. 

One other thing I've noticed: somewhere in the docs I've read that 
running the tunables optimal command should move no more than 10% of your 
data. However, in both of our cases the status was just over 30% degraded and 
it took the better part of 9 hours to complete the data reshuffling. 
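
For reference, this is roughly how the progress of the reshuffling can be
watched (standard commands; the interval and grep pattern are just a
convenience):

  watch -n 30 'ceph -s | grep degraded'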


Any comments from the ceph team or other ceph gurus on: 

1. What have we done wrong in our upgrade process? 
2. What options should we have used to keep our VMs alive? 


Cheers 

Andrei 




- Original Message -

From: Andrija Panic andrija.pa...@gmail.com 
To: ceph-users@lists.ceph.com 
Sent: Sunday, 13 July, 2014 9:54:17 PM 
Subject: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the 
same time 

Hi, 

after the ceph upgrade (0.72.2 to 0.80.3) I issued ceph osd crush 
tunables optimal, and after only a few minutes I added 2 more OSDs to the 
CEPH cluster... 

So these 2 changes were more or less done at the same time - rebalancing 
because of tunables optimal, and rebalancing because of adding the new OSDs... 

Result - all VMs living on CEPH storage have gone mad, no disk access 
effectively; blocked, so to speak. 

Since this rebalancing took 5h-6h, I had a bunch of VMs down for that long... 

Did I do wrong by causing 2 rebalancings to happen at the same time? 
Is this behaviour normal, to cause great load on all VMs because they are 
unable to access CEPH storage effectively? 

Thanks for any input... 
-- 

Andrija Panić 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-14 Thread Andrija Panic
Hi Andrei, nice to meet you again ;)

Thanks for sharing this info with me - I thought it was my mistake to
introduce new OSD components at the same time - I figured that since it's
already rebalancing, let's add those new OSDs so it all rebalances at once,
and I don't have to cause 2 data rebalances.  During a normal OSD restart and
data rebalancing (I did not set osd noout etc.) I did see somewhat lower VM
performance, but everything was UP and fine.
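
For reference, the flag I mean is set and cleared like this around planned OSD
restarts:

  ceph osd set noout
  # restart / service the OSDs
  ceph osd unset noout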

Also, about 30% of the data moved during my upgrade/tunables change... although
the documents say 10%, as you said.

I did not lose any data, but finding all the VMs that use Ceph as storage is
somewhat of a PITA...

So, any Ceph developers' input would be greatly appreciated...

Thanks again for such detailed info,
Andrija





On 14 July 2014 10:52, Andrei Mikhailovsky and...@arhont.com wrote:

 Hi Andrija,

 I've got at least two more stories of similar nature. One is my friend
 running a ceph cluster and one is from me. Both of our clusters are pretty
 small. My cluster has only two osd servers with 8 osds each, 3 mons. I have
 an ssd journal per 4 osds. My friend has a cluster of 3 mons and 3 osd
 servers with 4 osds each and an ssd per 4 osds as well. Both clusters are
 connected with 40gbit/s IP over Infiniband links.

 We had the same issue while upgrading to firefly. However, we did not add
 any new disks, just ran the ceph osd crush tunables optimal command after
 following an upgrade.

 Both of our clusters were down as far as the virtual machines are
 concerned. All vms have crashed because of the lack of IO. It was a bit
 problematic, taking into account that ceph is typically so great at staying
 alive during failures and upgrades. So, there seems to be a problem with
 the upgrade. I wish devs would have added a big note in red letters that if
 you run this command it will likely affect your cluster performance and
 most likely all your vms will die. So, please shutdown your vms if you do
 not want to have data loss.

 I've changed the default values to reduce the load during recovery and
 also to tune a few things performance wise. My settings were:

 osd recovery max chunk = 8388608

 osd recovery op priority = 2

 osd max backfills = 1

 osd recovery max active = 1

 osd recovery threads = 1

 osd disk threads = 2

 filestore max sync interval = 10

 filestore op threads = 20

 filestore_flusher = false

 However, this didn't help much and i've noticed that shortly after running
 the tunnables command my guest vms iowait has quickly jumped to 50% and a
 to 99% a minute after. This has happened on all vms at once. During the
 recovery phase I ran the rbd -p poolname ls -l command several times
 and it took between 20-40 minutes to complete. It typically takes less than
 2 seconds when the cluster is not in recovery mode.

 My mate's cluster had the same tunables apart from the last three. He had
 exactly the same behaviour.

 One other thing that i've noticed is that somewhere in the docs I've read
 that running the tunnable optimal command should move not more than 10% of
 your data. However, in both of our cases our status was just over 30%
 degraded and it took a good part of 9 hours to complete the data
 reshuffling.


 Any comments from the ceph team or other ceph gurus on:

 1. What have we done wrong in our upgrade  process
 2. What options should we have used to keep our vms alive


 Cheers

 Andrei




 --
 *From: *Andrija Panic andrija.pa...@gmail.com
 *To: *ceph-users@lists.ceph.com
 *Sent: *Sunday, 13 July, 2014 9:54:17 PM
 *Subject: *[ceph-users] ceph osd crush tunables optimal AND add new OSD
 at the same time


 Hi,

 after seting ceph upgrade (0.72.2 to 0.80.3) I have issued ceph osd crush
 tunables optimal and after only few minutes I have added 2 more OSDs to
 the CEPH cluster...

 So these 2 changes were more or a less done at the same time - rebalancing
 because of tunables optimal, and rebalancing because of adding new OSD...

 Result - all VMs living on CEPH storage have gone mad, no disk access
 efectively, blocked so to speak.

 Since this rebalancing took 5h-6h, I had bunch of VMs down for that long...

 Did I do wrong by causing 2 rebalancing to happen at the same time ?
 Is this behaviour normal, to cause great load on all VMs because they are
 unable to access CEPH storage efectively ?

 Thanks for any input...
 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-14 Thread Sage Weil
I've added some additional notes/warnings to the upgrade and release 
notes:

 https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451

If there is somewhere else where you think a warning flag would be useful, 
let me know!

Generally speaking, we want to be able to cope with huge data rebalances 
without interrupting service.  It's an ongoing process of improving the 
recovery vs client prioritization, though, and removing sources of 
overhead related to rebalancing... and it's clearly not perfect yet. :/

sage


On Sun, 13 Jul 2014, Andrija Panic wrote:

 Hi,
 after seting ceph upgrade (0.72.2 to 0.80.3) I have issued ceph osd crush
 tunables optimal and after only few minutes I have added 2 more OSDs to the
 CEPH cluster...
 
 So these 2 changes were more or a less done at the same time - rebalancing
 because of tunables optimal, and rebalancing because of adding new OSD...
 
 Result - all VMs living on CEPH storage have gone mad, no disk access
 efectively, blocked so to speak.
 
 Since this rebalancing took 5h-6h, I had bunch of VMs down for that long...
 
 Did I do wrong by causing 2 rebalancing to happen at the same time ?
 Is this behaviour normal, to cause great load on all VMs because they are
 unable to access CEPH storage efectively ?
 
 Thanks for any input...
 -- 
 
 Andrija Panić
 
 ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-14 Thread Andrija Panic
Perhaps here: http://ceph.com/releases/v0-80-firefly-released/
Thanks


On 14 July 2014 18:18, Sage Weil sw...@redhat.com wrote:

 I've added some additional notes/warnings to the upgrade and release
 notes:


 https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451

 If there is somewhere else where you think a warning flag would be useful,
 let me know!

 Generally speaking, we want to be able to cope with huge data rebalances
 without interrupting service.  It's an ongoing process of improving the
 recovery vs client prioritization, though, and removing sources of
 overhead related to rebalancing... and it's clearly not perfect yet. :/

 sage


 On Sun, 13 Jul 2014, Andrija Panic wrote:

  Hi,
  after seting ceph upgrade (0.72.2 to 0.80.3) I have issued ceph osd
 crush
  tunables optimal and after only few minutes I have added 2 more OSDs to
 the
  CEPH cluster...
 
  So these 2 changes were more or a less done at the same time -
 rebalancing
  because of tunables optimal, and rebalancing because of adding new OSD...
 
  Result - all VMs living on CEPH storage have gone mad, no disk access
  efectively, blocked so to speak.
 
  Since this rebalancing took 5h-6h, I had bunch of VMs down for that
 long...
 
  Did I do wrong by causing 2 rebalancing to happen at the same time ?
  Is this behaviour normal, to cause great load on all VMs because they are
  unable to access CEPH storage efectively ?
 
  Thanks for any input...
  --
 
  Andrija Panić
 
 




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-14 Thread Udo Lembke
Hi,
which values are changed by ceph osd crush tunables optimal?

Is it perhaps possible to change some of the parameters on the weekends before
the upgrade, to have more time?
(That depends on whether the parameters are available in 0.72...)

The warning says it can take days... we have a cluster with 5
storage nodes and 12 4TB OSD disks each (60 OSDs), replica 2. The cluster
is 60% filled.
Network connection is 10Gb.
Would tunables optimal take one, two, or more days in such a configuration?
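
For what it's worth, the currently active values can be dumped with the
command below; comparing the output before and after a profile change should
show exactly which tunables differ (I have not tried this on our cluster yet):

  ceph osd crush show-tunables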

Udo

On 14.07.2014 18:18, Sage Weil wrote:
 I've added some additional notes/warnings to the upgrade and release 
 notes:

  https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451

 If there is somewhere else where you think a warning flag would be useful, 
 let me know!

 Generally speaking, we want to be able to cope with huge data rebalances 
 without interrupting service.  It's an ongoing process of improving the 
 recovery vs client prioritization, though, and removing sources of 
 overhead related to rebalancing... and it's clearly not perfect yet. :/

 sage




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-14 Thread Sage Weil
On Mon, 14 Jul 2014, Udo Lembke wrote:
 Hi,
 which values are all changed with ceph osd crush tunables optimal?

There are some brand new crush tunables that fix.. I don't even remember 
off hand.

In general, you probably want to stay away from 'optimal' unless this is a 
fresh cluster and all clients are librados.  Using the 'firefly' tunables 
is probably the safest bet.

Keep in mind that adjusting tunables is going to move a bunch of data and 
client performance will be heavily impacted.  If that's ok, go for it, 
otherwise just stick with bobtail tunables unless/until it becomes a 
problem.
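
For reference, the profiles mentioned above are selected with the same command
that was used for 'optimal'; a quick sketch (switching profiles moves data too,
so schedule accordingly):

  ceph osd crush tunables firefly
  # or, to stay conservative:
  ceph osd crush tunables bobtail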

sage

 
 Is it perhaps possible to change some parameter the weekends before the
 upgrade is running, to have more time?
 (depends if the parameter are available in 0.72...).
 
 The warning told, it's can take days... we have an cluster with 5
 storage node and 12 4TB-osd-disk each (60 osd), replica 2. The cluster
 is 60% filled.
 Networkconnection 10Gb.
 Takes tunables optimal in such an configuration one, two or more days?
 
 Udo
 
 On 14.07.2014 18:18, Sage Weil wrote:
  I've added some additional notes/warnings to the upgrade and release 
  notes:
 
   
  https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451
 
  If there is somewhere else where you think a warning flag would be useful, 
  let me know!
 
  Generally speaking, we want to be able to cope with huge data rebalances 
  without interrupting service.  It's an ongoing process of improving the 
  recovery vs client prioritization, though, and removing sources of 
  overhead related to rebalancing... and it's clearly not perfect yet. :/
 
  sage
 
 
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-14 Thread Andrija Panic
Udo, I had all VMs completely non-operational - so don't set optimal for
now...


On 14 July 2014 20:48, Udo Lembke ulem...@polarzone.de wrote:

 Hi,
 which values are all changed with ceph osd crush tunables optimal?

 Is it perhaps possible to change some parameter the weekends before the
 upgrade is running, to have more time?
 (depends if the parameter are available in 0.72...).

 The warning told, it's can take days... we have an cluster with 5
 storage node and 12 4TB-osd-disk each (60 osd), replica 2. The cluster
 is 60% filled.
 Networkconnection 10Gb.
 Takes tunables optimal in such an configuration one, two or more days?

 Udo

 On 14.07.2014 18:18, Sage Weil wrote:
  I've added some additional notes/warnings to the upgrade and release
  notes:
 
 
 https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451
 
  If there is somewhere else where you think a warning flag would be
 useful,
  let me know!
 
  Generally speaking, we want to be able to cope with huge data rebalances
  without interrupting service.  It's an ongoing process of improving the
  recovery vs client prioritization, though, and removing sources of
  overhead related to rebalancing... and it's clearly not perfect yet. :/
 
  sage
 
 
 

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com