Re: [Openstack] [SWIFT] Change the partition power to recreate the RING

2013-01-14 Thread Alejandro Comisario
Chuck / John.
We are handling 50,000 requests per minute ( 10,000+ of which are PUTs of small
objects, from 10KB to 150KB ).

We are using swift 1.7.4 with keystone token caching, so no latency over
there.
We have 12 proxies and 24 datanodes divided into 4 zones ( each
datanode has 48GB of RAM, 2 hexacore CPUs and 4 devices of 3TB each ).

The workers that are putting objects into swift are seeing awful
performance, and so are we,
with peaks of 2 to 15 seconds per PUT operation coming from the datanodes.
We tuned db_preallocation, disable_fallocate, workers and concurrency, but
we can't reach the request rate that we need ( 24,000 PUTs per minute of
small objects ) and we can't seem to find where the problem is, other than
in the datanodes.

Maybe it's worth pasting our config over here?
Thanks in advance.

alejandro
On 12 Jan 2013 02:01, Chuck Thier cth...@gmail.com wrote:

 Looking at this from a different perspective.  Having 2500 partitions
 per drive shouldn't be an absolutely horrible thing either.  Do you
 know how many objects you have per partition?  What types of problems
 are you seeing?

 --
 Chuck
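
For rough context on that question: a Swift ring has 2**part_power partitions and
each partition is stored replicas times, so partitions per drive works out to
roughly replicas * 2**part_power / drive_count. A quick back-of-the-envelope
sketch in Python, using the figures quoted in this thread ( the 3-replica
assumption and the drive count are inferred, not confirmed values ):

import math

# Back-of-the-envelope only; replica count and drive count are assumptions
# based on the hardware described later in this thread (24 datanodes x 4 devices).
replicas = 3
drives = 24 * 4                                        # 96 drives

partitions_per_drive = 2500                            # figure reported above
partitions = partitions_per_drive * drives / replicas  # ~80,000 partitions
implied_part_power = math.log2(partitions)             # ~16.3 -> part_power 16 or 17

# The usual sizing guideline is on the order of 100 partitions per drive at
# full build-out, i.e. 2**part_power ~= 100 * drives / replicas.
suggested_part_power = math.ceil(math.log2(100 * drives / replicas))   # 12 here

print(round(implied_part_power, 1), suggested_part_power)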



Re: [Openstack] [SWIFT] Change the partition power to recreate the RING

2013-01-14 Thread Tong Li

John and swifters,
I see this as a big problem, and I think the scenario
described by Alejandro is a very common one. I am wondering whether it is
possible to have two rings (one with the new, extended power, and one
with the existing ring power): when significant changes are made to the
hardware or partitioning, a new ring gets started with a command, new data
written to Swift uses the new ring, and existing data on the existing ring
stays available and slowly (without impacting normal use) but automatically
moves to the new ring; once the existing ring shrinks to size zero,
that ring can be removed. The idea is to have two virtual
Swift systems working side by side, with the migration from the existing ring
to the new ring done without interrupting the service. Can we put this
topic/feature on the agenda for the next summit and have it
considered as a high-priority feature to work on for coming releases?
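
A rough illustration of the two-ring idea above ( purely illustrative: this is
not an existing Swift interface, and only the get_nodes() calls mirror the real
swift.common.ring.Ring API ). Reads try the new ring first and fall back to the
old one, writes land only on the new ring, and a background mover slowly drains
the old ring until it is empty:

class DualRingResolver(object):
    """Illustrative only: route requests across an old and a new ring."""

    def __init__(self, new_ring, old_ring):
        # Both arguments are expected to behave like swift.common.ring.Ring.
        self.new_ring = new_ring
        self.old_ring = old_ring

    def read_candidates(self, account, container, obj):
        # Try the new ring's nodes first; if the object has not been
        # migrated yet (404 there), the caller retries on the old ring.
        yield self.new_ring.get_nodes(account, container, obj)
        yield self.old_ring.get_nodes(account, container, obj)

    def write_nodes(self, account, container, obj):
        # New data only ever lands on the new ring, so the old ring can
        # only shrink; once it is empty it can be dropped.
        return self.new_ring.get_nodes(account, container, obj)

The background mover would then walk the old ring's partitions, copy each object
to its new-ring location and delete the old copy, which is what lets the old
ring shrink to zero without interrupting the service.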

Thanks.

Tong Li
Emerging Technologies & Standards
Building 501/B205
liton...@us.ibm.com





Re: [Openstack] [SWIFT] Change the partition power to recreate the RING

2013-01-14 Thread John Dickinson
Yes, I think it would be a great topic for the summit.

--John




Re: [Openstack] [SWIFT] Change the partition power to recreate the RING

2013-01-14 Thread Chuck Thier
Hi Alejandro,

I really doubt that partition size is causing these issues.  It can be
difficult to debug these types of issues without access to the
cluster, but I can think of a couple of things to look at.

1.  Check your disk io usage and io wait on the storage nodes.  If
that seems abnormally high, then that could be one of the sources of
problems.  If this is the case, then the first things that I would
look at are the auditors, as they can use up a lot of disk io if not
properly configured.  I would try turning them off for a bit
(swift-*-auditor) and see if that makes any difference.

2.  Check your network io usage.  You haven't described what type of
network you have going to the proxies, but if they share a single GigE
interface then, if my quick calculations are correct, you could be
saturating the network ( see the rough numbers after this list ).

3.  Check your CPU usage.  I listed this one last as you have said
that you have already worked at tuning the number of workers (though I
would be interested to hear how many workers you have running for each
service).  The main thing to look for is whether all of your
workers are maxed out on CPU; if so, then you may need to bump the
worker count.

4.  SSL termination?  Where are you terminating the SSL connection?
If you are terminating SSL directly in the swift proxy,
then that could also be a source of issues.  That setup was only meant for
dev and testing, and you should use an SSL-terminating load balancer
in front of the swift proxies.
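
On point 2, a rough version of that calculation, using the PUT rates quoted
earlier in the thread. The ~80KB average object size and the 3-replica factor
are assumptions, and GET traffic plus replication chatter would come on top:

avg_object_bytes = 80 * 1024   # assumed average; objects were said to be 10KB-150KB
replica_count = 3              # the usual default, not confirmed for this cluster

def inbound_mbps(puts_per_minute):
    # Client-to-proxy bandwidth for PUTs alone, in megabits per second.
    bytes_per_second = puts_per_minute / 60.0 * avg_object_bytes
    return bytes_per_second * 8 / 1e6

current = inbound_mbps(10000)   # ~109 Mbps at today's PUT rate
target = inbound_mbps(24000)    # ~262 Mbps at the desired PUT rate

# Each PUT is also streamed to replica_count storage nodes, so the
# proxy-to-storage side carries roughly replica_count times as much.
print(current, current * replica_count)   # ~109 / ~328 Mbps
print(target, target * replica_count)     # ~262 / ~786 Mbps

At the target rate a single shared GigE interface would be getting close to
saturation once the other ~40,000 requests per minute are added on top.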

That's what I could think of right off the top of my head.

--
Chuck


Re: [Openstack] [SWIFT] Change the partition power to recreate the RING

2013-01-14 Thread Alejandro Comisario
Chuck et al.

Let me go through the points one by one.

#1 Even though the object-auditor always runs and never stops, we
stopped the swift-*-auditors and didn't see any improvement. Across all the
datanodes we have an average of 8% IO-wait (using iostat); the only thing
we see is that the xfsbuf pid runs once in a while, causing 99% iowait for
a second. We delayed the runtime of that process and didn't see any changes
either.

Our object-auditor config for all devices is as follows:

[object-auditor]
files_per_second = 5
zero_byte_files_per_second = 5
bytes_per_second = 300

#2 Our 12 proxies are 6 physical machines and 6 kvm instances running on nova;
checking iftop we are at an average of 15Mb/s of bandwidth usage, so I don't
think we are saturating the network.
#3 The overall idle CPU on all datanodes is 80%. I'm not sure how to check
the CPU usage per worker ( there is a sketch for that below the configs ); let
me paste the config for a device for object, account and container.

object-server.conf
------------------
[DEFAULT]
devices = /srv/node/sda3
mount_check = false
bind_port = 6010
user = swift
log_facility = LOG_LOCAL2
log_level = DEBUG
workers = 48
disable_fallocate = true

[pipeline:main]
pipeline = object-server

[app:object-server]
use = egg:swift#object

[object-replicator]
vm_test_mode = yes
concurrency = 8
run_pause = 600

[object-updater]
concurrency = 8

[object-auditor]
files_per_second = 5
zero_byte_files_per_second = 5
bytes_per_second = 300

account-server.conf
-------------------
[DEFAULT]
devices = /srv/node/sda3
mount_check = false
bind_port = 6012
user = swift
log_facility = LOG_LOCAL2
log_level = DEBUG
workers = 48
db_preallocation = on
disable_fallocate = true

[pipeline:main]
pipeline = account-server

[app:account-server]
use = egg:swift#account

[account-replicator]
vm_test_mode = yes
concurrency = 8
run_pause = 600

[account-auditor]

[account-reaper]

container-server.conf
---------------------
[DEFAULT]
devices = /srv/node/sda3
mount_check = false
bind_port = 6011
user = swift
workers = 48
log_facility = LOG_LOCAL2
allow_versions = True
disable_fallocate = true

[pipeline:main]
pipeline = container-server

[app:container-server]
use = egg:swift#container
allow_versions = True

[container-replicator]
vm_test_mode = yes
concurrency = 8
run_pause = 500

[container-updater]
concurrency = 8

[container-auditor]

#4 We don't use SSL for swift, so no latency there.

Hope you guys can shed some light.
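
On #3, one quick way to eyeball per-worker CPU on a storage node: a small sketch
that parses ps output and sums CPU% per swift service ( it assumes the standard
swift-* process names and is purely illustrative ):

import subprocess
from collections import defaultdict

def swift_cpu_by_service():
    # Sum CPU% per swift service (e.g. swift-object-server) from `ps` output.
    out = subprocess.check_output(["ps", "-eo", "pcpu,args"],
                                  universal_newlines=True)
    usage = defaultdict(list)
    for line in out.splitlines()[1:]:          # skip the header row
        fields = line.split(None, 1)
        if len(fields) < 2:
            continue
        pcpu, args = fields
        token = next((p for p in args.split() if "swift-" in p), None)
        if token:
            usage[token.split("/")[-1]].append(float(pcpu))
    return dict((name, (len(vals), sum(vals))) for name, vals in usage.items())

if __name__ == "__main__":
    for name, (procs, total) in sorted(swift_cpu_by_service().items()):
        print("%-28s %3d procs %7.1f%% total CPU" % (name, procs, total))

Running it during a load peak shows whether, for example, the 48 object-server
workers are actually CPU-bound or mostly idle.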


Alejandro Comisario
#melicloud CloudBuilders
Arias 3751, Piso 7 (C1430CRG)
Ciudad de Buenos Aires - Argentina
Cel: +549(11) 15-3770-1857
Tel : +54(11) 4640-8443



Re: [Openstack] [SWIFT] Change the partition power to recreate the RING

2013-01-14 Thread Chuck Thier
Hey Alejandro,

Those were the most common issues that people run into when they are having
performance issues with swift.  The other thing to check is to look at the
logs to make sure there are no major issues (like bad drives, misconfigured
nodes, etc.), which could add latency to the requests.  After that, I'm
starting to run out of the common issues that people run into, and it might
be worth contracting with one of the many swift consulting companies to
help you out.  If you have time and can hop on #openstack-swift on
freenode IRC, we might be able to have a more interactive discussion,
or someone else may come up with some ideas.

--
Chuck



Re: [Openstack] [SWIFT] Change the partition power to recreate the RING

2013-01-11 Thread John Dickinson
In effect, this would be a complete replacement of your rings, and that is
essentially a whole new cluster. All of the existing data would need to be 
rehashed into the new ring before it is available.

There is no process that rehashes the data to ensure that it is still in the 
correct partition. Replication only ensures that the partitions are on the 
right drives.

To change the number of partitions, you will need to GET all of the data from
the old ring and PUT it to the new ring. A more complicated (but perhaps more
efficient) solution may include something like walking each drive and
rehashing+moving the data to the right partition and then letting replication
settle it down.

Either way, 100% of your existing data will need to at least be rehashed (and 
probably moved). Your CPU (hashing), disks (read+write), RAM (directory 
walking), and network (replication) may all be limiting factors in how long it 
will take to do this. Your per-disk free space may also determine what method 
you choose.

I would not expect any data loss while doing this, but you will probably have 
availability issues, depending on the data access patterns.

I'd like to eventually see something in swift that allows for changing the 
partition power in existing rings, but that will be hard/tricky/non-trivial.

Good luck.

--John
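
A minimal sketch of the GET-and-PUT approach described above, assuming
python-swiftclient and two proxy endpoints, one still serving the old ring and
one serving the new ring. The endpoints and credentials are illustrative, and
error handling, large objects and per-object metadata are left out:

from swiftclient import client

OLD_PROXY = "http://old-proxy:8080/auth/v1.0"   # proxy configured with the old ring
NEW_PROXY = "http://new-proxy:8080/auth/v1.0"   # proxy configured with the new ring

def copy_account(user, key):
    # Read every object through the old-ring proxy and write it back
    # through the new-ring proxy.
    src = client.Connection(authurl=OLD_PROXY, user=user, key=key)
    dst = client.Connection(authurl=NEW_PROXY, user=user, key=key)

    _, containers = src.get_account(full_listing=True)
    for c in containers:
        name = c["name"]
        dst.put_container(name)
        _, objects = src.get_container(name, full_listing=True)
        for o in objects:
            headers, body = src.get_object(name, o["name"])
            dst.put_object(name, o["name"], contents=body,
                           content_type=headers.get("content-type"))

if __name__ == "__main__":
    copy_account("system:root", "testpass")     # illustrative credentials

Whether this or the walk-each-drive approach is faster will come down to the
limiting factors listed above (CPU, disks, RAM and network).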


On Jan 11, 2013, at 1:17 PM, Alejandro Comisario 
alejandro.comisa...@mercadolibre.com wrote:

 Hi guys.
 We created a swift cluster several months ago; the thing is that right now
 we can't add hardware, and we configured lots of partitions thinking about the
 final picture of the cluster.
 
 Today each datanode has 2500+ partitions per device, and even tuning
 the background processes ( replicator, auditor & updater ) we really want to
 try to lower the partition power.
 
 Since it's not possible to do that without recreating the ring, we have
 the luxury of recreating it with a much lower partition power, and
 rebalancing / deploying the new ring.
 
 The question is: having a working cluster with *existing data*, is it possible
 to do this and wait for the data to move around *without data loss* ???
 If so, would it be reasonable to expect an improvement in the overall cluster
 performance ?
 
 We have no problem having a non-working cluster ( while moving the data ) even
 for an entire weekend.
 
 Cheers.
 
 


