Re: [Openstack] [SWIFT] Change the partition power to recreate the RING

2013-01-14 Thread Chuck Thier
Hey Alejandro,

Those were the most common issues that people run into when they are having
performance issues with Swift.  The other thing to check is the logs, to make
sure there are no major issues (like bad drives, misconfigured nodes, etc.)
that could add latency to the requests.  After that, I'm starting to run out
of the common issues people hit, and it might be worth contracting with one of
the many Swift consulting companies to help you out.  If you have time and can
hop on #openstack-swift on freenode IRC, we might be able to have a little
more interactive discussion, or someone else may come up with some ideas.

--
Chuck


On Mon, Jan 14, 2013 at 2:01 PM, Alejandro Comisario <
alejandro.comisa...@mercadolibre.com> wrote:

> Chuck et al.
>
> Let me go through the points one by one.
>
> #1 Even though the "object-auditor" always runs and never stops, we stopped
> the swift-*-auditor and didn't see any improvement. Across all the datanodes
> we average 8% I/O wait (using iostat); the only thing we see is that the
> "xfsbuf" process runs once in a while, causing 99% iowait for a second. We
> delayed the runtime for that process and didn't see any change either.
>
> Our object-auditor config for all devices is as follows:
>
> [object-auditor]
> files_per_second = 5
> zero_byte_files_per_second = 5
> bytes_per_second = 300
>
> #2 Our 12 proxies are 6 physical servers and 6 KVM instances running on
> Nova; checking iftop, we average 15 Mb/s of bandwidth usage, so I don't
> think we are saturating the network.
> #3 The overall idle CPU on all datanodes is 80%. I'm not sure how to check
> the CPU usage per worker; let me paste the config for one device for object,
> account and container.
>
> *object-server.conf*
> *--*
> [DEFAULT]
> devices = /srv/node/sda3
> mount_check = false
> bind_port = 6010
> user = swift
> log_facility = LOG_LOCAL2
> log_level = DEBUG
> workers = 48
> disable_fallocate = true
>
> [pipeline:main]
> pipeline = object-server
>
> [app:object-server]
> use = egg:swift#object
>
> [object-replicator]
> vm_test_mode = yes
> concurrency = 8
> run_pause = 600
>
> [object-updater]
> concurrency = 8
>
> [object-auditor]
> files_per_second = 5
> zero_byte_files_per_second = 5
> bytes_per_second = 300
>
> *account-server.conf*
> *---*
> [DEFAULT]
> devices = /srv/node/sda3
> mount_check = false
> bind_port = 6012
> user = swift
> log_facility = LOG_LOCAL2
> log_level = DEBUG
> workers = 48
> db_preallocation = on
> disable_fallocate = true
>
> [pipeline:main]
> pipeline = account-server
>
> [app:account-server]
> use = egg:swift#account
>
> [account-replicator]
> vm_test_mode = yes
> concurrency = 8
> run_pause = 600
>
> [account-auditor]
>
> [account-reaper]
>
> *container-server.conf*
> *-*
> [DEFAULT]
> devices = /srv/node/sda3
> mount_check = false
> bind_port = 6011
> user = swift
> workers = 48
> log_facility = LOG_LOCAL2
> allow_versions = True
> disable_fallocate = true
>
> [pipeline:main]
> pipeline = container-server
>
> [app:container-server]
> use = egg:swift#container
> allow_versions = True
>
> [container-replicator]
> vm_test_mode = yes
> concurrency = 8
> run_pause = 500
>
> [container-updater]
> concurrency = 8
>
> [container-auditor]
>
> #4 We don't use SSL for Swift, so no latency there.
>
> Hope you guys can shed some light.
>
>
> *Alejandro Comisario
> #melicloud CloudBuilders*
> Arias 3751, Piso 7 (C1430CRG)
> Ciudad de Buenos Aires - Argentina
> Cel: +549(11) 15-3770-1857
> Tel : +54(11) 4640-8443
>
>
> On Mon, Jan 14, 2013 at 1:23 PM, Chuck Thier  wrote:
>
>> Hi Alejandro,
>>
>> I really doubt that partition size is causing these issues.  It can be
>> difficult to debug these types of issues without access to the
>> cluster, but I can think of a couple of things to look at.
>>
>> 1.  Check your disk io usage and io wait on the storage nodes.  If
>> that seems abnormally high, then that could be one of the sources of
>> problems.  If this is the case, then the first things that I would
>> look at are the auditors, as they can use up a lot of disk io if not
>> properly configured.  I would try turning them off for a bit
>> (swift-*-auditor) and see if that makes any difference.
>>
>> 2.  Check your network io usage.  You haven't described what type of
>> network you have going to the proxies, but if they share a single GigE
>> interface then, if my quick calculations are correct, you could be
>> saturating the network.
>>
>> 3.  Check your CPU usage.  I listed this one last as you have said
>> that you have already worked at tuning the number of workers (though I
>> would be interested to hear how many workers you have running for each
>> service).  The main thing to look for is whether all of your workers
>> are maxed out on CPU; if so, you may need to bump the number of
>> workers.
>>
>> 4.  SSL Termination?  Where are you terminating the SSL connection?
>> If you are terminating 

Re: [Openstack] [SWIFT] Change the partition power to recreate the RING

2013-01-14 Thread Alejandro Comisario
Chuck et al.

Let me go through the points one by one.

#1 Even though the "object-auditor" always runs and never stops, we stopped
the swift-*-auditor and didn't see any improvement. Across all the datanodes
we average 8% I/O wait (using iostat); the only thing we see is that the
"xfsbuf" process runs once in a while, causing 99% iowait for a second. We
delayed the runtime for that process and didn't see any change either.

Our object-auditor config for all devices is as follows:

[object-auditor]
files_per_second = 5
zero_byte_files_per_second = 5
bytes_per_second = 300

#2 Our 12 proxies are 6 physical servers and 6 KVM instances running on
Nova; checking iftop, we average 15 Mb/s of bandwidth usage, so I don't
think we are saturating the network.
#3 The overall idle CPU on all datanodes is 80%. I'm not sure how to check
the CPU usage per worker (see the note after the configs below); let me paste
the config for one device for object, account and container.

*object-server.conf*
*--*
[DEFAULT]
devices = /srv/node/sda3
mount_check = false
bind_port = 6010
user = swift
log_facility = LOG_LOCAL2
log_level = DEBUG
workers = 48
disable_fallocate = true

[pipeline:main]
pipeline = object-server

[app:object-server]
use = egg:swift#object

[object-replicator]
vm_test_mode = yes
concurrency = 8
run_pause = 600

[object-updater]
concurrency = 8

[object-auditor]
files_per_second = 5
zero_byte_files_per_second = 5
bytes_per_second = 300

*account-server.conf*
*---*
[DEFAULT]
devices = /srv/node/sda3
mount_check = false
bind_port = 6012
user = swift
log_facility = LOG_LOCAL2
log_level = DEBUG
workers = 48
db_preallocation = on
disable_fallocate = true

[pipeline:main]
pipeline = account-server

[app:account-server]
use = egg:swift#account

[account-replicator]
vm_test_mode = yes
concurrency = 8
run_pause = 600

[account-auditor]

[account-reaper]

*container-server.conf*
*-*
[DEFAULT]
devices = /srv/node/sda3
mount_check = false
bind_port = 6011
user = swift
workers = 48
log_facility = LOG_LOCAL2
allow_versions = True
disable_fallocate = true

[pipeline:main]
pipeline = container-server

[app:container-server]
use = egg:swift#container
allow_versions = True

[container-replicator]
vm_test_mode = yes
concurrency = 8
run_pause = 500

[container-updater]
concurrency = 8

[container-auditor]
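
Regarding the per-worker CPU question in #3: a rough, illustrative way to check
it is to look at per-process CPU for the swift server processes (the standard
swift-*-server process names are assumed here), for example:

# show the busiest object-server workers, sorted by CPU
ps -eo pid,pcpu,rss,args | grep [s]wift-object-server | sort -k2 -rn | head
# or watch live with top and look for swift-*-server processes near 100% CPU
top -c

If individual workers sit near 100% CPU while the overall idle CPU stays high,
the worker count (not total CPU) is likely the bottleneck.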

#4 We don't use SSL for Swift, so no latency there.

Hope you guys can shed some light.


*Alejandro Comisario
#melicloud CloudBuilders*
Arias 3751, Piso 7 (C1430CRG)
Ciudad de Buenos Aires - Argentina
Cel: +549(11) 15-3770-1857
Tel : +54(11) 4640-8443


On Mon, Jan 14, 2013 at 1:23 PM, Chuck Thier  wrote:

> Hi Alejandro,
>
> I really doubt that partition size is causing these issues.  It can be
> difficult to debug these types of issues without access to the
> cluster, but I can think of a couple of things to look at.
>
> 1.  Check your disk io usage and io wait on the storage nodes.  If
> that seems abnormally high, then that could be one of the sources of
> problems.  If this is the case, then the first things that I would
> look at are the auditors, as they can use up a lot of disk io if not
> properly configured.  I would try turning them off for a bit
> (swift-*-auditor) and see if that makes any difference.
>
> 2.  Check your network io usage.  You haven't described what type of
> network you have going to the proxies, but if they share a single GigE
> interface then, if my quick calculations are correct, you could be
> saturating the network.
>
> 3.  Check your CPU usage.  I listed this one last as you have said
> that you have already worked at tuning the number of workers (though I
> would be interested to hear how many workers you have running for each
> service).  The main thing to look for is whether all of your workers
> are maxed out on CPU; if so, you may need to bump the number of
> workers.
>
> 4.  SSL Termination?  Where are you terminating the SSL connection?
> If you are terminating SSL in Swift directly with the swift proxy,
> then that could also be a source of issues.  This was only meant for
> dev and testing; you should use an SSL-terminating load balancer
> in front of the swift proxies.
>
> That's what I could think of right off the top of my head.
>
> --
> Chuck
>
> On Mon, Jan 14, 2013 at 5:45 AM, Alejandro Comisario
>  wrote:
> > Chuck / John.
> > We are handling 50,000 requests per minute (where 10,000+ are PUTs of
> > small objects, from 10 KB to 150 KB).
> >
> > We are using Swift 1.7.4 with Keystone token caching, so no latency
> > there.
> > We have 12 proxies and 24 datanodes divided into 4 zones (each datanode
> > has 48 GB of RAM, 2 hexacore CPUs and 4 devices of 3 TB each).
> >
> > The workers that are putting objects into Swift are seeing awful
> > performance, and so are we.
> > With peaks of 2 to 15 seconds per PUT operation coming from the
> > datanodes.
> > We tuned db_preallocation, disable_fallocate, workers and concurrency
> > but we can't reach the request 

Re: [Openstack] [SWIFT] Change the partition power to recreate the RING

2013-01-14 Thread Chuck Thier
Hi Alejandro,

I really doubt that partition size is causing these issues.  It can be
difficult to debug these types of issues without access to the
cluster, but I can think of a couple of things to look at.

1.  Check your disk io usage and io wait on the storage nodes.  If
that seems abnormally high, then that could be one of the sources of
problems.  If this is the case, then the first things that I would
look at are the auditors, as they can use up a lot of disk io if not
properly configured.  I would try turning them off for a bit
(swift-*-auditor) and see if that makes any difference.
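
As an illustration of that first check, per-device I/O wait and utilization can
be watched while toggling the auditors, e.g. (sysstat's iostat; exact column
names vary by version):

# watch per-device await / %util every 5 seconds
iostat -x -d 5
# stop the auditors for a while and compare the same view
swift-init object-auditor stop
swift-init container-auditor stop
swift-init account-auditor stop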

2.  Check your network io usage.  You haven't described what type of
network you have going to the proxies, but if they share a single GigE
interface then, if my quick calculations are correct, you could be
saturating the network.
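
(Rough back-of-the-envelope with the numbers from earlier in the thread:
50,000 requests/minute at up to ~150 KB each is about 50,000 × 150 KB / 60 s
≈ 125 MB/s ≈ 1 Gb/s, so a single shared GigE interface could indeed be
saturated before replication traffic is even counted.)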

3.  Check your CPU usage.  I listed this one last as you have said
that you have already worked at tuning the number of workers (though I
would be interested to hear how many workers you have running for each
service).  The main thing to look for is whether all of your workers
are maxed out on CPU; if so, you may need to bump the number of
workers.

4.  SSL Termination?  Where are you terminating the SSL connection?
If you are terminating SSL in Swift directly with the swift proxy,
then that could also be a source of issues.  This was only meant for
dev and testing; you should use an SSL-terminating load balancer
in front of the swift proxies.

That's what I could think of right off the top of my head.

--
Chuck

On Mon, Jan 14, 2013 at 5:45 AM, Alejandro Comisario
 wrote:
> Chuck / John.
> We are handling 50,000 requests per minute (where 10,000+ are PUTs of small
> objects, from 10 KB to 150 KB).
>
> We are using Swift 1.7.4 with Keystone token caching, so no latency
> there.
> We have 12 proxies and 24 datanodes divided into 4 zones (each datanode
> has 48 GB of RAM, 2 hexacore CPUs and 4 devices of 3 TB each).
>
> The workers that are putting objects into Swift are seeing awful
> performance, and so are we.
> With peaks of 2 to 15 seconds per PUT operation coming from the datanodes.
> We tuned db_preallocation, disable_fallocate, workers and concurrency, but we
> can't reach the request rate that we need (24,000 PUTs per minute of small
> objects), and we don't seem to find where the problem is, other than on the
> datanodes.
>
> Maybe worth pasting our config over here?
> Thanks in advance.
>
> alejandro
>
> On 12 Jan 2013 02:01, "Chuck Thier"  wrote:
>>
>> Looking at this from a different perspective.  Having 2500 partitions
>> per drive shouldn't be an absolutely horrible thing either.  Do you
>> know how many objects you have per partition?  What types of problems
>> are you seeing?
>>
>> --
>> Chuck
>>
>> On Fri, Jan 11, 2013 at 3:28 PM, John Dickinson  wrote:
> > In effect, this would be a complete replacement of your rings, and that
>> > is essentially a whole new cluster. All of the existing data would need to
>> > be rehashed into the new ring before it is available.
>> >
>> > There is no process that rehashes the data to ensure that it is still in
>> > the correct partition. Replication only ensures that the partitions are on
>> > the right drives.
>> >
>> > To change the number of partitions, you will need to GET all of the data
> > from the old ring and PUT it to the new ring. A more complicated (but
> > perhaps more efficient) solution may include something like walking each
>> > drive and rehashing+moving the data to the right partition and then letting
>> > replication settle it down.
>> >
>> > Either way, 100% of your existing data will need to at least be rehashed
>> > (and probably moved). Your CPU (hashing), disks (read+write), RAM 
>> > (directory
>> > walking), and network (replication) may all be limiting factors in how long
>> > it will take to do this. Your per-disk free space may also determine what
>> > method you choose.
>> >
>> > I would not expect any data loss while doing this, but you will probably
>> > have availability issues, depending on the data access patterns.
>> >
>> > I'd like to eventually see something in swift that allows for changing
>> > the partition power in existing rings, but that will be
>> > hard/tricky/non-trivial.
>> >
>> > Good luck.
>> >
>> > --John
>> >
>> >
>> > On Jan 11, 2013, at 1:17 PM, Alejandro Comisario
>> >  wrote:
>> >
>> >> Hi guys.
> >> We created a swift cluster several months ago; the thing is that
> >> right now we can't add hardware, and we configured lots of partitions
> >> thinking about the final picture of the cluster.
>> >>
> >> Today each datanode has 2500+ partitions per device, and even after
> >> tuning the background processes (replicator, auditor & updater) we really
> >> want to try to lower the partition power.
>> >>
> >> Since it's not possible to do that without recreating the ring, we
> >> have the luxury of recreating it with a much lower partition power, and
> >> rebalance / de

Re: [Openstack] [SWIFT] Change the partition power to recreate the RING

2013-01-14 Thread John Dickinson
Yes, I think it would be a great topic for the summit.

--John


On Jan 14, 2013, at 7:54 AM, Tong Li  wrote:

> John and swifters,
> I see this as a big problem, and I think the scenario described by Alejandro 
> is a very common one. I am wondering whether it would be possible to have two 
> rings (one with the new, extended power and one with the existing ring power): 
> when significant changes are made to the hardware or partitioning, a new ring 
> is started with a command, new data written into Swift uses the new ring, and 
> existing data on the existing ring stays available and slowly (without 
> impacting normal use) but automatically moves to the new ring; once the 
> existing ring shrinks to size zero, it can be removed. The idea is to have two 
> virtual Swift systems working side by side, with the migration from the 
> existing ring to the new ring done without interrupting the service. Can we 
> put this topic/feature up for discussion at the next summit and consider it a 
> high-priority feature for the coming releases?
> 
> Thanks.
> 
> Tong Li
> Emerging Technologies & Standards
> Building 501/B205
> liton...@us.ibm.com
> 
> 
> From: John Dickinson 
> To:   Alejandro Comisario , 
> Cc:   "openstack-operat...@lists.openstack.org" 
> , openstack 
> 
> Date: 01/11/2013 04:28 PM
> Subject:  Re: [Openstack] [SWIFT] Change the partition power to recreate 
> the  RING
> Sent by:  openstack-bounces+litong01=us.ibm@lists.launchpad.net
> 
> 
> 
> In effect, this would be a complete replacement of your rings, and that is 
> essentially a whole new cluster. All of the existing data would need to be 
> rehashed into the new ring before it is available.
> 
> There is no process that rehashes the data to ensure that it is still in the 
> correct partition. Replication only ensures that the partitions are on the 
> right drives.
> 
> To change the number of partitions, you will need to GET all of the data from 
> the old ring and PUT it to the new ring. A more complicated (but perhaps more 
> efficient) solution may include something like walking each drive and 
> rehashing+moving the data to the right partition and then letting replication 
> settle it down.
> 
> Either way, 100% of your existing data will need to at least be rehashed (and 
> probably moved). Your CPU (hashing), disks (read+write), RAM (directory 
> walking), and network (replication) may all be limiting factors in how long 
> it will take to do this. Your per-disk free space may also determine what 
> method you choose.
> 
> I would not expect any data loss while doing this, but you will probably have 
> availability issues, depending on the data access patterns.
> 
> I'd like to eventually see something in swift that allows for changing the 
> partition power in existing rings, but that will be hard/tricky/non-trivial.
> 
> Good luck.
> 
> --John
> 
> 
> On Jan 11, 2013, at 1:17 PM, Alejandro Comisario 
>  wrote:
> 
> > Hi guys.
> > We created a swift cluster several months ago; the thing is that right 
> > now we can't add hardware, and we configured lots of partitions thinking 
> > about the final picture of the cluster.
> > 
> > Today each datanode has 2500+ partitions per device, and even after tuning 
> > the background processes (replicator, auditor & updater) we really want 
> > to try to lower the partition power.
> > 
> > Since it's not possible to do that without recreating the ring, we have 
> > the luxury of recreating it with a much lower partition power, and rebalance 
> > / deploy the new ring.
> > 
> > The question is: with a working cluster holding *existing data*, is it 
> > possible to do this and wait for the data to move around *without data 
> > loss*?
> > If so, can we expect an improvement in the overall cluster 
> > performance?
> > 
> > We have no problem having a non-working cluster (while moving the data), 
> > even for an entire weekend.
> > 
> > Cheers.
> > 
> > 
> 
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] [SWIFT] Change the partition power to recreate the RING

2013-01-14 Thread Tong Li

John and swifters,
I see this as a big problem, and I think the scenario described by Alejandro
is a very common one. I am wondering whether it would be possible to have two
rings (one with the new, extended power and one with the existing ring power):
when significant changes are made to the hardware or partitioning, a new ring
is started with a command, new data written into Swift uses the new ring, and
existing data on the existing ring stays available and slowly (without
impacting normal use) but automatically moves to the new ring; once the
existing ring shrinks to size zero, it can be removed. The idea is to have two
virtual Swift systems working side by side, with the migration from the
existing ring to the new ring done without interrupting the service. Can we
put this topic/feature up for discussion at the next summit and consider it a
high-priority feature for the coming releases?

Thanks.

Tong Li
Emerging Technologies & Standards
Building 501/B205
liton...@us.ibm.com



From:   John Dickinson 
To: Alejandro Comisario ,
Cc: "openstack-operat...@lists.openstack.org"
, openstack

Date:   01/11/2013 04:28 PM
Subject:Re: [Openstack] [SWIFT] Change the partition power to recreate
    the RING
Sent by:openstack-bounces+litong01=us.ibm@lists.launchpad.net



In effect, this would be a complete replacement of your rings, and that is
essentially a whole new cluster. All of the existing data would need to be
rehashed into the new ring before it is available.

There is no process that rehashes the data to ensure that it is still in
the correct partition. Replication only ensures that the partitions are on
the right drives.

To change the number of partitions, you will need to GET all of the data
from the old ring and PUT it to the new ring. A more complicated (but
perhaps more efficient) solution may include something like walking each
drive and rehashing+moving the data to the right partition and then letting
replication settle it down.

Either way, 100% of your existing data will need to at least be rehashed
(and probably moved). Your CPU (hashing), disks (read+write), RAM
(directory walking), and network (replication) may all be limiting factors
in how long it will take to do this. Your per-disk free space may also
determine what method you choose.

I would not expect any data loss while doing this, but you will probably
have availability issues, depending on the data access patterns.

I'd like to eventually see something in swift that allows for changing the
partition power in existing rings, but that will be
hard/tricky/non-trivial.

Good luck.

--John


On Jan 11, 2013, at 1:17 PM, Alejandro Comisario
 wrote:

> Hi guys.
> We created a swift cluster several months ago; the thing is that right
> now we can't add hardware, and we configured lots of partitions thinking
> about the final picture of the cluster.
>
> Today each datanode has 2500+ partitions per device, and even after
> tuning the background processes (replicator, auditor & updater) we really
> want to try to lower the partition power.
>
> Since it's not possible to do that without recreating the ring, we
> have the luxury of recreating it with a much lower partition power, and
> rebalance / deploy the new ring.
>
> The question is: with a working cluster holding *existing data*, is it
> possible to do this and wait for the data to move around *without data
> loss*?
> If so, can we expect an improvement in the overall cluster
> performance?
>
> We have no problem having a non-working cluster (while moving the data),
> even for an entire weekend.
>
> Cheers.
>
>

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] [SWIFT] Change the partition power to recreate the RING

2013-01-14 Thread Alejandro Comisario
Chuck / John.
We are handling 50,000 requests per minute (where 10,000+ are PUTs of small
objects, from 10 KB to 150 KB).

We are using Swift 1.7.4 with Keystone token caching, so no latency
there.
We have 12 proxies and 24 datanodes divided into 4 zones (each
datanode has 48 GB of RAM, 2 hexacore CPUs and 4 devices of 3 TB each).

The workers that are putting objects into Swift are seeing awful
performance, and so are we.
With peaks of 2 to 15 seconds per PUT operation coming from the datanodes.
We tuned db_preallocation, disable_fallocate, workers and concurrency, but
we can't reach the request rate that we need (24,000 PUTs per minute of
small objects), and we don't seem to find where the problem is, other than
on the datanodes.
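
(As a rough sanity check on those targets, assuming 3 replicas and an even
spread: 24,000 PUTs/minute is 400 PUTs/s at the proxies, or about 33/s per
proxy across 12 proxies; with 3 replicas that is roughly 1,200 object-server
PUTs/s cluster-wide, i.e. about 50/s per datanode across 24 nodes, or about
12-13/s per disk with 4 disks each.)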

Maybe worth pasting our config over here?
Thanks in advance.

alejandro
On 12 Jan 2013 02:01, "Chuck Thier"  wrote:

> Looking at this from a different perspective.  Having 2500 partitions
> per drive shouldn't be an absolutely horrible thing either.  Do you
> know how many objects you have per partition?  What types of problems
> are you seeing?
>
> --
> Chuck
>
> On Fri, Jan 11, 2013 at 3:28 PM, John Dickinson  wrote:
> > In effect, this would be a complete replacement of your rings, and that
> is essentially a whole new cluster. All of the existing data would need to
> be rehashed into the new ring before it is available.
> >
> > There is no process that rehashes the data to ensure that it is still in
> the correct partition. Replication only ensures that the partitions are on
> the right drives.
> >
> > To change the number of partitions, you will need to GET all of the data
> from the old ring and PUT it to the new ring. A more complicated (but
> perhaps more efficient) solution may include something like walking each
> drive and rehashing+moving the data to the right partition and then letting
> replication settle it down.
> >
> > Either way, 100% of your existing data will need to at least be rehashed
> (and probably moved). Your CPU (hashing), disks (read+write), RAM
> (directory walking), and network (replication) may all be limiting factors
> in how long it will take to do this. Your per-disk free space may also
> determine what method you choose.
> >
> > I would not expect any data loss while doing this, but you will probably
> have availability issues, depending on the data access patterns.
> >
> > I'd like to eventually see something in swift that allows for changing
> the partition power in existing rings, but that will be
> hard/tricky/non-trivial.
> >
> > Good luck.
> >
> > --John
> >
> >
> > On Jan 11, 2013, at 1:17 PM, Alejandro Comisario <
> alejandro.comisa...@mercadolibre.com> wrote:
> >
> >> Hi guys.
> >> We created a swift cluster several months ago; the thing is that
> >> right now we can't add hardware, and we configured lots of partitions
> >> thinking about the final picture of the cluster.
> >>
> >> Today each datanode has 2500+ partitions per device, and even after
> >> tuning the background processes (replicator, auditor & updater) we really
> >> want to try to lower the partition power.
> >>
> >> Since it's not possible to do that without recreating the ring, we
> >> have the luxury of recreating it with a much lower partition power, and
> >> rebalance / deploy the new ring.
> >>
> >> The question is: with a working cluster holding *existing data*, is it
> >> possible to do this and wait for the data to move around *without data
> >> loss*?
> >> If so, can we expect an improvement in the overall
> >> cluster performance?
> >>
> >> We have no problem having a non-working cluster (while moving the
> >> data), even for an entire weekend.
> >>
> >> Cheers.
> >>
> >>
> >
> >
> >
>
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] [SWIFT] Change the partition power to recreate the RING

2013-01-11 Thread Chuck Thier
Looking at this from a different perspective.  Having 2500 partitions
per drive shouldn't be an absolutely horrible thing either.  Do you
know how many objects you have per partition?  What types of problems
are you seeing?
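
One rough way to eyeball that on a storage node, assuming the standard
/srv/node layout and the device names from this thread:

# number of object partitions currently on one device
ls /srv/node/sda3/objects | wc -l
# total object .data files on the device; divide by the partition count above
find /srv/node/sda3/objects -name '*.data' | wc -l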

--
Chuck

On Fri, Jan 11, 2013 at 3:28 PM, John Dickinson  wrote:
> In effect, this would be a complete replacement of your rings, and that is 
> essentially a whole new cluster. All of the existing data would need to be 
> rehashed into the new ring before it is available.
>
> There is no process that rehashes the data to ensure that it is still in the 
> correct partition. Replication only ensures that the partitions are on the 
> right drives.
>
> To change the number of partitions, you will need to GET all of the data from 
> the old ring and PUT it to the new ring. A more complicated (but perhaps more 
> efficient) solution may include something like walking each drive and 
> rehashing+moving the data to the right partition and then letting replication 
> settle it down.
>
> Either way, 100% of your existing data will need to at least be rehashed (and 
> probably moved). Your CPU (hashing), disks (read+write), RAM (directory 
> walking), and network (replication) may all be limiting factors in how long 
> it will take to do this. Your per-disk free space may also determine what 
> method you choose.
>
> I would not expect any data loss while doing this, but you will probably have 
> availability issues, depending on the data access patterns.
>
> I'd like to eventually see something in swift that allows for changing the 
> partition power in existing rings, but that will be hard/tricky/non-trivial.
>
> Good luck.
>
> --John
>
>
> On Jan 11, 2013, at 1:17 PM, Alejandro Comisario 
>  wrote:
>
>> Hi guys.
>> We created a swift cluster several months ago; the thing is that right 
>> now we can't add hardware, and we configured lots of partitions thinking about 
>> the final picture of the cluster.
>>
>> Today each datanode has 2500+ partitions per device, and even after tuning 
>> the background processes (replicator, auditor & updater) we really want to 
>> try to lower the partition power.
>>
>> Since it's not possible to do that without recreating the ring, we have 
>> the luxury of recreating it with a much lower partition power, and rebalance / 
>> deploy the new ring.
>>
>> The question is: with a working cluster holding *existing data*, is it 
>> possible to do this and wait for the data to move around *without data 
>> loss*?
>> If so, can we expect an improvement in the overall cluster 
>> performance?
>>
>> We have no problem having a non-working cluster (while moving the data), 
>> even for an entire weekend.
>>
>> Cheers.
>>
>>
>
>
>

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] [SWIFT] Change the partition power to recreate the RING

2013-01-11 Thread John Dickinson
In effect, this would be a complete replacement of your rings, and that is 
essentially a whole new cluster. All of the existing data would need to be 
rehashed into the new ring before it is available.

There is no process that rehashes the data to ensure that it is still in the 
correct partition. Replication only ensures that the partitions are on the 
right drives.

To change the number of partitions, you will need to GET all of the data from 
the old ring and PUT it to the new ring. A more complicated (but perhaps more 
efficient) solution may include something like walking each drive and 
rehashing+moving the data to the right partition and then letting replication 
settle it down.
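
As a very rough sketch of the GET/PUT route, assuming the python-swiftclient
CLI, OS_* auth variables pointing at the cluster, and enough local scratch
space (a real migration would stream, parallelize, and be careful with
object-name prefixes and very large containers):

mkdir -p /backup && cd /backup
for c in $(swift list); do
    mkdir -p "$c" && (cd "$c" && swift download "$c")   # GET everything under the old ring
done
# ...build and push the new, lower-partition-power rings to every node...
for c in */; do
    (cd "$c" && swift upload "${c%/}" *)                # PUT everything back under the new ring
done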

Either way, 100% of your existing data will need to at least be rehashed (and 
probably moved). Your CPU (hashing), disks (read+write), RAM (directory 
walking), and network (replication) may all be limiting factors in how long it 
will take to do this. Your per-disk free space may also determine what method 
you choose.

I would not expect any data loss while doing this, but you will probably have 
availability issues, depending on the data access patterns.

I'd like to eventually see something in swift that allows for changing the 
partition power in existing rings, but that will be hard/tricky/non-trivial.

Good luck.

--John


On Jan 11, 2013, at 1:17 PM, Alejandro Comisario 
 wrote:

> Hi guys.
> We created a swift cluster several months ago; the thing is that right now 
> we can't add hardware, and we configured lots of partitions thinking about the 
> final picture of the cluster.
> 
> Today each datanode has 2500+ partitions per device, and even after tuning 
> the background processes (replicator, auditor & updater) we really want to 
> try to lower the partition power.
> 
> Since it's not possible to do that without recreating the ring, we have 
> the luxury of recreating it with a much lower partition power, and rebalance / 
> deploy the new ring.
> 
> The question is: with a working cluster holding *existing data*, is it possible 
> to do this and wait for the data to move around *without data loss*?
> If so, can we expect an improvement in the overall cluster 
> performance?
> 
> We have no problem having a non-working cluster (while moving the data), even 
> for an entire weekend.
> 
> Cheers.
> 
> 



___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] [SWIFT] Change the partition power to recreate the RING

2013-01-11 Thread Alejandro Comisario
Hi guys.
We created a swift cluster several months ago; the thing is that right
now we can't add hardware, and we configured lots of partitions thinking
about the final picture of the cluster.

Today each datanode has 2500+ partitions per device, and even after tuning
the background processes (replicator, auditor & updater) we really want
to try to lower the partition power.

Since it's not possible to do that without recreating the ring, we have
the luxury of recreating it with a much lower partition power, and rebalance
/ deploy the new ring.
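
For reference, recreating the rings with a lower partition power would look
roughly like the following (17 is just an example power; replicas,
min_part_hours, ports, zones and weights must match the actual deployment):

swift-ring-builder object.builder create 17 3 1
swift-ring-builder object.builder add z1-10.0.0.1:6010/sda3 100
# ...repeat the add for every device in every zone, then:
swift-ring-builder object.builder rebalance
# do the same for container.builder and account.builder,
# then push the resulting .ring.gz files to all proxy and storage nodes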

The question is: with a working cluster holding *existing data*, is it
possible to do this and wait for the data to move around *without data
loss*?
If so, can we expect an improvement in the overall cluster
performance?

We have no problem having a non-working cluster (while moving the data)
even for an entire weekend.

Cheers.
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp