Re: [Openstack] [SWIFT] raising network traffic on the storage node

2013-07-09 Thread Klaus Schürmann
I measured the network traffic with darkstat:

server         In           Out          Total

storagenode1   699,118,562  679,077,971  1,378,196,533
storagenode2   168,636,360  165,050,575  333,686,935
storagenode3   166,583,442  164,405,402  330,988,844
storagenode4   164,282,250  163,051,416  327,333,666
storagenode5   164,000,162  162,840,370  326,840,532
proxynode1     7,339,629    31,253,205   38,592,834
proxynode2     8,236,128    12,517,594   20,753,722

This is part of the traffic to server storagenode3, broken down by port:

Port   In          Out          Total        Syns
6000   21,055,732  347,350,916  368,406,648  47,388
6001   19,717,608  18,090,656   37,808,264   31,549
6002   494,124     316,830      810,954      883
36905  39,660      2,263        41,923       0
44687  33,056      1,944        35,000       0
47388  31,691      2,467        34,158       0
41999  30,626      1,788        32,414       0
34228  26,552      3,345        29,897       0

Is this configured correctly?
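
For reference, 6000, 6001 and 6002 are the default bind ports of the Swift
object, container and account servers, and rsync-based replication defaults to
port 873, which does not appear in the list above. A minimal sketch for grouping
per-port figures like these by service; it assumes the darkstat numbers are byte
counts and that the cluster uses the default Swift ports, and the sample values
are simply copied from the storagenode3 table:

# Group darkstat-style per-port totals into Swift services.
# Assumes default Swift bind ports (object 6000, container 6001,
# account 6002) and rsync replication on 873; adjust to your configs.

SERVICE_PORTS = {
    6000: "object-server",
    6001: "container-server",
    6002: "account-server",
    873:  "rsync replication",
}

# (port, in_bytes, out_bytes) copied from the storagenode3 table above.
samples = [
    (6000, 21055732, 347350916),
    (6001, 19717608, 18090656),
    (6002, 494124, 316830),
]

for port, rx, tx in samples:
    service = SERVICE_PORTS.get(port, "other/ephemeral")
    print(f"{port:>5}  {service:<18}  in={rx:>12,}  out={tx:>12,}")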


-Original Message-
From: Openstack 
[mailto:openstack-bounces+klaus.schuermann=mediabeam@lists.launchpad.net] 
On behalf of Robert van Leeuwen
Sent: Tuesday, 9 July 2013 09:09
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] [SWIFT] raising network traffic on the storage node

 If the replication traffic is responsible for this rising network traffic 
 for only 1.200.000 objects, how much traffic can I expect if I have 
 100.000.000 objects stored?
 The average size of my mail objects is 120 kB. 
 It is planned to use all 12 hard drive slots of my DELL R720xd with 4 TB 
 drives. I have 5 storage nodes and 2 balanced proxy nodes. Will the 
 replication traffic kill my system?

We are running with 400.000.000 objects across 11 object storage nodes.
Total network traffic on any of those nodes is less than 10 MByte/second.

However, we have seen slowdowns with lots of small files and really big disks.
The issue is not related to the network but to the local filesystem/disk.
When the inode cache becomes insufficient you can see terrible slowdowns.
There have been a few threads about that on this list; having a lot of memory 
usually helps a bit.
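
A rough back-of-the-envelope sketch of what those numbers imply. The object
count, average size, node count and RAM figures come from the thread; the 3x
replica count and the roughly 1 kB of cached inode/dentry metadata per on-disk
file are assumptions of the sketch, not measurements:

# Back-of-the-envelope sizing sketch. Object count, average size, node and
# RAM figures come from the thread; the 3x replica count and ~1 kB of cached
# inode/dentry metadata per file are assumptions, not measurements.

objects       = 100_000_000        # planned object count
avg_size      = 120 * 1024         # ~120 kB average mail object, in bytes
replicas      = 3                  # assumed Swift replica count
storage_nodes = 5
ram_per_node  = 16 * 1024**3       # 16 GB per R720xd node (from the thread)

total_data     = objects * avg_size * replicas
files_per_node = objects * replicas // storage_nodes
inode_cache    = files_per_node * 1024   # ~1 kB per cached inode (rough)

print(f"total stored data : {total_data / 1024**4:6.1f} TiB")
print(f"files per node    : {files_per_node:,}")
print(f"inode cache needed: {inode_cache / 1024**3:6.1f} GiB"
      f" vs {ram_per_node / 1024**3:.0f} GiB RAM per node")

With these assumed numbers each node ends up holding roughly 60 million files,
so the cached metadata alone would far exceed 16 GB of RAM per node, which is
exactly the inode-cache slowdown scenario described above.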

Cheers,
Robert van Leeuwen
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp



Re: [Openstack] [SWIFT] raising network traffic on the storage node

2013-07-08 Thread Klaus Schürmann
On Monday some more mailboxes started storing their mail in the object storage.
But that only increased the rise.
 
Traffic Storagenode: http://www.schuermann.net/temp/storagenode2.png
Traffic Proxyserver: http://www.schuermann.net/temp/proxyserver2.png


-Original Message-
From: Peter Portante [mailto:peter.a.porta...@gmail.com] 
Sent: Monday, 8 July 2013 16:04
To: Klaus Schürmann
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] [SWIFT] raising network traffic on the storage node

Can you zoom in past the spike, to the most recent two or three weeks, and see
how it looks?

My guess is that the proxy traffic is also rising.

On Mon, Jul 8, 2013 at 9:50 AM, Klaus Schürmann
klaus.schuerm...@mediabeam.com wrote:
 Hi,

 I use a Swift storage cluster as a mail store. Now I have about 1.000.000 
 objects stored in the cluster.



 I'm wondering about the rising network traffic on my storage nodes. The 
 traffic from the proxy server shows a normal pattern.



 Traffic Storagenode: http://www.schuermann.net/temp/storagenode.png

 Traffic Proxyserver: http://www.schuermann.net/temp/proxyserver.png



 Can someone explain such behavior?



 Thanks

 Klaus


 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp




Re: [Openstack] [SWIFT] raising network traffic on the storage node

2013-07-08 Thread Klaus Schürmann
If the replication traffic is responsible for this rising network traffic for 
only 1.200.000 objects, how much traffic can I expect if I have 100.000.000 
objects stored? 
The average size of my mail objects is 120 kB. It is planned to use all 12 hard 
drive slots of my DELL R720xd with 4 TB drives. I have 5 storage nodes and 2 
balanced proxy nodes. Will the replication traffic kill my system?

Here is a small part of my object-server-replicator log:
Jul  9 06:48:16 storage-node1 object-replicator Starting object replication 
pass.
Jul  9 06:49:13 storage-node1 object-replicator 9830/9830 (100.00%) partitions 
replicated in 57.56s (170.78/sec, 0s remaining)
Jul  9 06:49:13 storage-node1 object-replicator 1234597 suffixes checked - 
0.00% hashed, 0.00% synced
Jul  9 06:49:13 storage-node1 object-replicator Partition times: max 0.0279s, 
min 0.0068s, med 0.0104s
Jul  9 06:49:13 storage-node1 object-replicator Object replication complete. 
(0.96 minutes)
Jul  9 06:49:43 storage-node1 object-replicator Starting object replication 
pass.
Jul  9 06:50:41 storage-node1 object-replicator 9830/9830 (100.00%) partitions 
replicated in 57.69s (170.39/sec, 0s remaining)
Jul  9 06:50:41 storage-node1 object-replicator 1234643 suffixes checked - 
0.00% hashed, 0.00% synced
Jul  9 06:50:41 storage-node1 object-replicator Partition times: max 0.0365s, 
min 0.0068s, med 0.0104s
Jul  9 06:50:41 storage-node1 object-replicator Object replication complete. 
(0.96 minutes)
Jul  9 06:51:11 storage-node1 object-replicator Starting object replication 
pass.
Jul  9 06:52:09 storage-node1 object-replicator 9830/9830 (100.00%) partitions 
replicated in 58.31s (168.58/sec, 0s remaining)
Jul  9 06:52:09 storage-node1 object-replicator 1234688 suffixes checked - 
0.00% hashed, 0.00% synced
Jul  9 06:52:09 storage-node1 object-replicator Partition times: max 0.0348s, 
min 0.0069s, med 0.0106s
Jul  9 06:52:09 storage-node1 object-replicator Object replication complete. 
(0.97 minutes)
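
To watch how those passes grow as the object count rises, a small parser over
log lines in the format shown above can help. A minimal sketch; the regular
expressions only match this exact wording, and the syslog path in the usage
comment is an assumption:

import re
import sys

# Pull the pass duration, partition rate and suffix count out of
# object-replicator log lines in the format shown above, e.g.:
#   ... 9830/9830 (100.00%) partitions replicated in 57.56s (170.78/sec, ...
#   ... 1234597 suffixes checked - 0.00% hashed, 0.00% synced
# Usage (log path is an assumption):
#   grep object-replicator /var/log/syslog | python3 repl_stats.py

pass_re   = re.compile(r"replicated in ([\d.]+)s \(([\d.]+)/sec")
suffix_re = re.compile(r"(\d+) suffixes checked")

for line in sys.stdin:
    m = pass_re.search(line)
    if m:
        print(f"pass: {m.group(1)}s at {m.group(2)} partitions/sec")
    m = suffix_re.search(line)
    if m:
        print(f"suffixes checked: {int(m.group(1)):,}")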

-Original Message-
From: Pete Zaitcev [mailto:zait...@redhat.com] 
Sent: Monday, 8 July 2013 19:22
To: Klaus Schürmann
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] [SWIFT] raising network traffic on the storage node

On Mon, 8 Jul 2013 13:50:38 +
Klaus Schürmann klaus.schuerm...@mediabeam.com wrote:

 I use a Swift storage cluster as a mail store. Now I have about 1.000.000 
 objects stored in the cluster.

 Traffic Storagenode: http://www.schuermann.net/temp/storagenode.png
 Traffic Proxyserver: http://www.schuermann.net/temp/proxyserver.png
 
 Can someone explain such behavior?

At a guess, the rising number of objects makes the number of partitions
increase, and that increases the replication traffic, specifically the
number of MD5 hashes sent for partitions.

It would be interesting to correlate the number of objects and the
number of non-empty and empty partitions with the amounts of traffic.
If the increasing transfer of hashes is the reason, you could also
verify this by graphing the traffic to port 873 separately. Swift never
replicates object bodies through its own HTTP interface, so this
splits control traffic from data traffic for you. Data traffic
should be driven by customers and node failures, not by consistency
checking.
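
One low-effort way to get those per-port figures is a pair of counting-only
iptables rules. A rough sketch in Python (not an existing Swift tool); the
port numbers assume the rsync and Swift object-server defaults, and the rules
deliberately have no target so they only count and never change filtering:

import subprocess

# Add counting-only iptables rules so rsync replication traffic (port 873)
# and object-server HTTP traffic (port 6000, Swift default) can be graphed
# separately. Rules without a -j target only count packets/bytes and do not
# alter filtering. Needs root; port numbers are assumptions based on defaults.

PORTS = {"rsync-replication": 873, "object-server-http": 6000}

def add_counters():
    for name, port in PORTS.items():
        for chain in ("INPUT", "OUTPUT"):
            for flag in ("--dport", "--sport"):
                subprocess.check_call([
                    "iptables", "-I", chain, "-p", "tcp", flag, str(port),
                    "-m", "comment", "--comment", name,
                ])

def show_counters():
    for chain in ("INPUT", "OUTPUT"):
        # -v -n -x prints exact per-rule packet and byte counters.
        out = subprocess.check_output(
            ["iptables", "-L", chain, "-v", "-n", "-x"])
        print(out.decode())

if __name__ == "__main__":
    add_counters()
    show_counters()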

Be prepared to split up storage nodes, however. Even if there is no
bug with replication, its aggregate traffic increases as the object
count grows.

-- Pete
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Swift performance issues with requests

2013-06-04 Thread Klaus Schürmann
Hi Rick,

I found the problem. I had placed a hardware load balancer in front of the proxy 
server, and the balancer was losing packets because of a faulty network interface.
Your tip was excellent.

Thanks
Klaus

-Original Message-
From: Rick Jones [mailto:rick.jon...@hp.com] 
Sent: Friday, 31 May 2013 19:17
To: Klaus Schürmann
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Swift performance issues with requests

On 05/31/2013 04:55 AM, Klaus Schürmann wrote:
 May 31 10:33:08 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 
 31/May/2013/08/33/08 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 
 - Wget/1.12%20%28linux-gnu%29 
 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 - 283354 - 
 txd4a3a4bf3f384936a0bc14dbffddd275 - 0.1020 -
 May 31 10:33:26 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 
 31/May/2013/08/33/26 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 
 - Wget/1.12%20%28linux-gnu%29 
 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 - 283354 - 
 txd8c6b34b8e41460bb2c5f3f4b6def0ef - 17.7330 -   

Something I forgot to mention, which was the basis for my TCP 
retransmissions guess: depending on your kernel revision, the initial 
TCP retransmission timeout is 3 seconds, and it will double each time, 
e.g. 3, 6, 12.  As it happens, the cumulative time for that is 17 
seconds...  So, the 17 seconds and change would be consistent with a 
transient problem in establishing a TCP connection.  Of course, it could 
just be a coincidence.

Later kernels (I forget where in the 3.x stream exactly) have an initial 
retransmission timeout of 1 second.  In that case the timeouts 
would go 1, 2, 4, 8, etc.
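
The doubling is easy to tabulate. A tiny illustrative sketch of the cumulative
times at which successive retransmissions fire, for both of the initial RTO
values mentioned above (nothing Swift-specific, just the arithmetic):

# Cumulative elapsed time at which each TCP retransmission fires, for the
# two initial retransmission timeouts mentioned above (3 s on older kernels,
# 1 s on newer ones). Purely illustrative arithmetic.

def retransmit_schedule(initial_rto, retries=5):
    elapsed, rto, schedule = 0, initial_rto, []
    for _ in range(retries):
        elapsed += rto
        schedule.append(elapsed)
        rto *= 2
    return schedule

for rto in (3, 1):
    print(f"initial RTO {rto}s -> retransmissions at {retransmit_schedule(rto)} seconds")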

rick

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] Swift performance issues with requests

2013-05-31 Thread Klaus Schürmann
Hi,

When I test my new Swift cluster I see strange behavior with GET and PUT 
requests.
Most of the time it is really fast, but sometimes it takes a long time to get 
the data. Here is an example with the same request, which once took 17 seconds:

May 31 10:33:08 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 
31/May/2013/08/33/08 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - 
Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 
- 283354 - tx2804381fef91455dabf6c9fd0edf4206 - 0.0546 -
May 31 10:33:08 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 
31/May/2013/08/33/08 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - 
Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 
- 283354 - tx90025e3259d74b9faa8f17efaf85b104 - 0.0516 -
May 31 10:33:08 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 
31/May/2013/08/33/08 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - 
Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 
- 283354 - tx942d79f78ee345138df6cd87bac0f860 - 0.0942 -
May 31 10:33:08 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 
31/May/2013/08/33/08 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - 
Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 
- 283354 - tx73f053e15ed345caad38a6191fe7f196 - 0.0584 -
May 31 10:33:08 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 
31/May/2013/08/33/08 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - 
Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 
- 283354 - txd4a3a4bf3f384936a0bc14dbffddd275 - 0.1020 -
May 31 10:33:26 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 
31/May/2013/08/33/26 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - 
Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 
- 283354 - txd8c6b34b8e41460bb2c5f3f4b6def0ef - 17.7330 -   
May 31 10:33:26 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 
31/May/2013/08/33/26 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - 
Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 
- 283354 - tx21aaa822f8294d9592fe04b3de27c98e - 0.0226 -
May 31 10:33:26 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 
31/May/2013/08/33/26 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - 
Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 
- 283354 - txcabe6adf73f740efb2b82d479a1e6b20 - 0.0385 -
May 31 10:33:26 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 
31/May/2013/08/33/26 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - 
Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 
- 283354 - txc1247a1bb6c04bd3b496b3b986373170 - 0.0247 -
May 31 10:33:26 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 
31/May/2013/08/33/26 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - 
Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 
- 283354 - txdf295a88e513443393992f37785f8aed - 0.0144 -
May 31 10:33:26 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 
31/May/2013/08/33/26 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - 
Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 
- 283354 - tx62bb33e8c20d43b7a4c3512232de6fe4 - 0.0125 -

All requests on the storage nodes are below 0.01 sec.
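
For anyone chasing similar outliers, a small sketch that pulls the request
duration out of proxy-logging access lines like the ones above and flags the
slow ones. It assumes, as in those lines, that the duration is the
second-to-last whitespace-separated field, and the syslog path in the usage
comment is an assumption:

import sys

# Flag slow requests in Swift proxy-logging access lines like the ones above.
# Assumes the request duration is the second-to-last whitespace-separated
# field (as in the sample lines); adjust the index if your format differs.
# Usage (log path is an assumption):
#   grep proxy-logging /var/log/syslog | python3 slow_requests.py

SLOW = 1.0  # seconds; anything above this gets printed

for line in sys.stdin:
    fields = line.split()
    if len(fields) < 2:
        continue
    try:
        duration = float(fields[-2])
    except ValueError:
        continue
    if duration > SLOW:
        print(f"{duration:8.3f}s  {line.rstrip()}")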

The tested cluster contains one proxy (DELL R420, 16 GB RAM, 2 CPUs) and 5 
storage nodes (DELL R720xd, 16 GB RAM, 2 CPUs, 2 HDDs). The proxy-server 
configuration:

[DEFAULT]
log_name = proxy-server
log_facility = LOG_LOCAL1
log_level = INFO
log_address = /dev/log
bind_port = 80
user = swift
workers = 32
log_statsd_host = 10.4.100.10
log_statsd_port = 8125
log_statsd_default_sample_rate = 1
log_statsd_metric_prefix = Proxy01
#set log_level = DEBUG

[pipeline:main]
pipeline = healthcheck cache proxy-logging tempauth proxy-server

[app:proxy-server]
use = egg:swift#proxy
allow_account_management = true
account_autocreate = true

[filter:tempauth]
use = egg:swift#tempauth
user_provider1_ =  .xxx http://10.4.100.1/v1/AUTH_provider1
log_name = tempauth
log_facility = LOG_LOCAL2
log_level = INFO
log_address = /dev/log

[filter:cache]
use = egg:swift#memcache
memcache_servers = 10.12.0.2:11211,10.12.0.3:11211
set log_name = cache

[filter:catch_errors]
use = egg:swift#catch_errors

[filter:healthcheck]
use = egg:swift#healthcheck

[filter:proxy-logging]
use = egg:swift#proxy_logging
access_log_name = proxy-logging
access_log_facility = LOG_LOCAL3
access_log_level = DEBUG
access_log_address = /dev/log


Can someone explain such behavior?

Thanks
Klaus

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp