Re: [Openstack] [SWIFT] raising network traffic on the storage node
I measured the network traffic with darkstat:

server          In           Out          Total
storagenode1    699,118,562  679,077,971  1,378,196,533
storagenode2    168,636,360  165,050,575  333,686,935
storagenode3    166,583,442  164,405,402  330,988,844
storagenode4    164,282,250  163,051,416  327,333,666
storagenode5    164,000,162  162,840,370  326,840,532
proxynode1      7,339,629    31,253,205   38,592,834
proxynode2      8,236,128    12,517,594   20,753,722

This is part of the traffic to server storagenode3:

Port    In          Out          Total        Syns
6000    21,055,732  347,350,916  368,406,648  47,388
6001    19,717,608  18,090,656   37,808,264   31,549
6002    494,124     316,830      810,954      883
36905   39,660      2,263        41,923       0
44687   33,056      1,944        35,000       0
47388   31,691      2,467        34,158       0
41999   30,626      1,788        32,414       0
34228   26,552      3,345        29,897       0

Is this configured correctly?

-----Original Message-----
From: Openstack [mailto:openstack-bounces+klaus.schuermann=mediabeam@lists.launchpad.net] On behalf of Robert van Leeuwen
Sent: Tuesday, 9 July 2013 09:09
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] [SWIFT] raising network traffic on the storage node

> If the replication traffic is responsible for this rising network traffic with only 1,200,000 objects, how much traffic can I expect once I have 100,000,000 objects stored?
> The average size of my mail objects is 120 kB. I plan to use all 12 hard drive slots of my DELL R720xd servers with 4 TB drives. I have 5 storage nodes and 2 balanced proxy nodes.
> Will the replication traffic kill my system?

We are running with 400,000,000 objects across 11 object storage nodes. Total network traffic on any of those nodes is less than 10 MByte/second.
However, we have seen slowdowns with lots of small files and really big disks. The issue is not related to the network but to the local filesystem/disk: when the inode cache becomes insufficient you can see terrible slow-downs. There have been a few threads about that on this list; having a lot of memory usually helps a bit.

Cheers,
Robert van Leeuwen
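For reference, the per-port breakdown above can also be reproduced from a packet capture if darkstat is not available, which makes it easy to see how much of the traffic goes to the standard Swift service ports (6000/6001/6002 for the object/container/account servers, 873 for rsync replication). A minimal sketch using scapy; the capture file name is a placeholder:

#!/usr/bin/env python
# Tally bytes and SYNs per destination TCP port from a capture file,
# to reproduce a darkstat-style per-port breakdown.
# The pcap could be taken with e.g. "tcpdump -i eth0 -w storagenode3.pcap".
from collections import Counter
from scapy.all import rdpcap, TCP   # third-party: pip install scapy

bytes_per_port = Counter()
syns_per_port = Counter()

for pkt in rdpcap("storagenode3.pcap"):        # placeholder file name
    if TCP not in pkt:
        continue
    dport = pkt[TCP].dport
    bytes_per_port[dport] += len(pkt)
    if int(pkt[TCP].flags) & 0x02:             # SYN bit set
        syns_per_port[dport] += 1

# In a stock Swift install 6000/6001/6002 are the object/container/account
# servers and 873 is rsync (replication data).
for port, nbytes in bytes_per_port.most_common(10):
    print("port %5d  bytes in: %12d  syns: %d"
          % (port, nbytes, syns_per_port[port]))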
Re: [Openstack] [SWIFT] raising network traffic on the storage node
On Monday some more mailboxes started storing their mails in the object storage, but that only increased the rise.

Traffic Storagenode: http://www.schuermann.net/temp/storagenode2.png
Traffic Proxyserver: http://www.schuermann.net/temp/proxyserver2.png

-----Original Message-----
From: Peter Portante [mailto:peter.a.porta...@gmail.com]
Sent: Monday, 8 July 2013 16:04
To: Klaus Schürmann
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] [SWIFT] raising network traffic on the storage node

Can you zoom in past the spike, to the most recent two or three weeks, and see how it looks? My guess is that the proxy traffic is also rising.

On Mon, Jul 8, 2013 at 9:50 AM, Klaus Schürmann <klaus.schuerm...@mediabeam.com> wrote:
> Hi,
> I use a Swift cluster as a mail store. Now I have about 1,000,000 objects stored in the cluster. I'm wondering about the rising network traffic on my storage nodes. The traffic from the proxy server looks normal.
> Traffic Storagenode: http://www.schuermann.net/temp/storagenode.png
> Traffic Proxyserver: http://www.schuermann.net/temp/proxyserver.png
> Can someone explain such behavior?
> Thanks
> Klaus
Re: [Openstack] [SWIFT] raising network traffic on the storage node
If the replication traffic is responsible for this rising network traffic with only 1,200,000 objects, how much traffic can I expect once I have 100,000,000 objects stored? The average size of my mail objects is 120 kB. I plan to use all 12 hard drive slots of my DELL R720xd servers with 4 TB drives. I have 5 storage nodes and 2 balanced proxy nodes. Will the replication traffic kill my system?

Here is a small part of my object-replicator log:

Jul 9 06:48:16 storage-node1 object-replicator Starting object replication pass.
Jul 9 06:49:13 storage-node1 object-replicator 9830/9830 (100.00%) partitions replicated in 57.56s (170.78/sec, 0s remaining)
Jul 9 06:49:13 storage-node1 object-replicator 1234597 suffixes checked - 0.00% hashed, 0.00% synced
Jul 9 06:49:13 storage-node1 object-replicator Partition times: max 0.0279s, min 0.0068s, med 0.0104s
Jul 9 06:49:13 storage-node1 object-replicator Object replication complete. (0.96 minutes)
Jul 9 06:49:43 storage-node1 object-replicator Starting object replication pass.
Jul 9 06:50:41 storage-node1 object-replicator 9830/9830 (100.00%) partitions replicated in 57.69s (170.39/sec, 0s remaining)
Jul 9 06:50:41 storage-node1 object-replicator 1234643 suffixes checked - 0.00% hashed, 0.00% synced
Jul 9 06:50:41 storage-node1 object-replicator Partition times: max 0.0365s, min 0.0068s, med 0.0104s
Jul 9 06:50:41 storage-node1 object-replicator Object replication complete. (0.96 minutes)
Jul 9 06:51:11 storage-node1 object-replicator Starting object replication pass.
Jul 9 06:52:09 storage-node1 object-replicator 9830/9830 (100.00%) partitions replicated in 58.31s (168.58/sec, 0s remaining)
Jul 9 06:52:09 storage-node1 object-replicator 1234688 suffixes checked - 0.00% hashed, 0.00% synced
Jul 9 06:52:09 storage-node1 object-replicator Partition times: max 0.0348s, min 0.0069s, med 0.0106s
Jul 9 06:52:09 storage-node1 object-replicator Object replication complete. (0.97 minutes)

-----Original Message-----
From: Pete Zaitcev [mailto:zait...@redhat.com]
Sent: Monday, 8 July 2013 19:22
To: Klaus Schürmann
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] [SWIFT] raising network traffic on the storage node

On Mon, 8 Jul 2013 13:50:38 +0000, Klaus Schürmann <klaus.schuerm...@mediabeam.com> wrote:

> I use a Swift cluster as a mail store. Now I have about 1,000,000 objects stored in the cluster.
> Traffic Storagenode: http://www.schuermann.net/temp/storagenode.png
> Traffic Proxyserver: http://www.schuermann.net/temp/proxyserver.png
> Can someone explain such behavior?

At a guess, the rising number of objects makes the number of partitions increase, and that increases the replication traffic, specifically the number of MD5 hashes sent for partitions. It would be interesting to correlate the number of objects, and the number of non-empty and empty partitions, with the amounts of traffic.

If the increasing transfer of hashes is the reason, you could also verify it by graphing the traffic to port 873 separately. Swift never replicates object bodies through its own HTTP interface, so this splits control traffic from data traffic for you. Data traffic should be driven by customers and node failures, not by consistency checking.

Be prepared to split up storage nodes, however. Even if there's no bug in replication, its aggregate traffic increases as the object count increases.

-- Pete
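The 9830 partitions per pass in that log fit the usual ring arithmetic, and they explain why there is steady background traffic even when 0.00% of suffixes end up synced: each pass the replicator compares suffix hashes with the other replica holders of every local partition. A rough back-of-the-envelope sketch; the part_power, replica count and device count below are guesses for illustration, not values read from the actual ring:

#!/usr/bin/env python
# Back-of-the-envelope: partitions walked per replication pass and the
# resulting REPLICATE request rate. All inputs are illustrative guesses,
# NOT values taken from Klaus's ring files.

part_power = 14                 # guess: 2**14 = 16384 ring partitions
replicas = 3
devices = 5                     # guess: one ring device per storage node

partitions_per_device = replicas * 2 ** part_power / float(devices)
print("partitions per device: %.1f" % partitions_per_device)    # ~9830

# Every pass, the replicator asks the other replica holders of each local
# partition for their suffix hashes (an HTTP REPLICATE call), even when
# nothing needs syncing, so this traffic is always present.
replicate_calls = partitions_per_device * (replicas - 1)
pass_length_s = 90              # ~60 s pass + ~30 s pause, from the log
print("REPLICATE calls per second: %.0f" % (replicate_calls / pass_length_s))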
Re: [Openstack] Swift performance issues with requests
Hi Rick,

I found the problem. I had placed a hardware load balancer in front of the proxy server, and the balancer was losing packets because of a faulty network interface. Your tip was excellent.

Thanks
Klaus

-----Original Message-----
From: Rick Jones [mailto:rick.jon...@hp.com]
Sent: Friday, 31 May 2013 19:17
To: Klaus Schürmann
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Swift performance issues with requests

On 05/31/2013 04:55 AM, Klaus Schürmann wrote:
> May 31 10:33:08 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 31/May/2013/08/33/08 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 - 283354 - txd4a3a4bf3f384936a0bc14dbffddd275 - 0.1020 -
> May 31 10:33:26 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 31/May/2013/08/33/26 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 - 283354 - txd8c6b34b8e41460bb2c5f3f4b6def0ef - 17.7330 -

Something I forgot to mention, which was the basis for my TCP retransmissions guess: depending on your kernel revision, the initial TCP retransmission timeout is 3 seconds, and it will double each time - e.g. 3, 6, 12. As it happens, the cumulative time for that is 17 seconds... So the 17 seconds and change would be consistent with a transient problem establishing a TCP connection. Of course, it could just be a coincidence.

Later kernels - I forget where in the 3.X stream exactly - have an initial retransmission timeout of 1 second. In that case the timeouts would go 1, 2, 4, 8, etc...

rick
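A quick way to check for the kind of transient connection-setup problem Rick describes, independent of Swift itself, is to time plain TCP connects to the proxy (or to whatever sits in front of it) in a loop: connects that take on the order of 1 or 3 seconds instead of milliseconds point at lost SYNs being retransmitted. A minimal sketch; the host and port are placeholders:

#!/usr/bin/env python
# Time repeated TCP connection setups to spot SYN retransmissions.
# HOST/PORT are placeholders for the proxy or load-balancer address.
import socket
import time

HOST, PORT = "10.4.100.1", 80    # placeholder: proxy / balancer VIP
ATTEMPTS = 1000

for i in range(ATTEMPTS):
    start = time.time()
    try:
        sock = socket.create_connection((HOST, PORT), timeout=30)
        sock.close()
    except socket.error as exc:
        print("%4d: connect failed after %.3fs: %s" % (i, time.time() - start, exc))
        continue
    elapsed = time.time() - start
    # A LAN connect normally completes in well under 10 ms; values near
    # 1 s or 3 s (or multiples) suggest the initial SYN was lost.
    if elapsed > 0.5:
        print("%4d: slow connect: %.3fs" % (i, elapsed))
    time.sleep(0.1)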
[Openstack] Swift performance issues with requests
Hi,

when I test my new Swift cluster I see strange behavior with GET and PUT requests. Most of the time it is really fast, but sometimes it takes a long time to get the data. Here is an example with the same request, which one time took 17 seconds:

May 31 10:33:08 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 31/May/2013/08/33/08 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 - 283354 - tx2804381fef91455dabf6c9fd0edf4206 - 0.0546 -
May 31 10:33:08 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 31/May/2013/08/33/08 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 - 283354 - tx90025e3259d74b9faa8f17efaf85b104 - 0.0516 -
May 31 10:33:08 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 31/May/2013/08/33/08 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 - 283354 - tx942d79f78ee345138df6cd87bac0f860 - 0.0942 -
May 31 10:33:08 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 31/May/2013/08/33/08 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 - 283354 - tx73f053e15ed345caad38a6191fe7f196 - 0.0584 -
May 31 10:33:08 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 31/May/2013/08/33/08 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 - 283354 - txd4a3a4bf3f384936a0bc14dbffddd275 - 0.1020 -
May 31 10:33:26 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 31/May/2013/08/33/26 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 - 283354 - txd8c6b34b8e41460bb2c5f3f4b6def0ef - 17.7330 -
May 31 10:33:26 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 31/May/2013/08/33/26 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 - 283354 - tx21aaa822f8294d9592fe04b3de27c98e - 0.0226 -
May 31 10:33:26 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 31/May/2013/08/33/26 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 - 283354 - txcabe6adf73f740efb2b82d479a1e6b20 - 0.0385 -
May 31 10:33:26 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 31/May/2013/08/33/26 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 - 283354 - txc1247a1bb6c04bd3b496b3b986373170 - 0.0247 -
May 31 10:33:26 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 31/May/2013/08/33/26 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 - 283354 - txdf295a88e513443393992f37785f8aed - 0.0144 -
May 31 10:33:26 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 31/May/2013/08/33/26 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 - 283354 - tx62bb33e8c20d43b7a4c3512232de6fe4 - 0.0125 -

All requests on the storage nodes are below 0.01 sec. The tested cluster contains one proxy (DELL R420, 16 GB RAM, 2 CPUs) and 5 storage nodes (DELL R720xd, 16 GB RAM, 2 CPUs, 2 HDDs).
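The same sort of repeated-GET test can also be scripted instead of run with wget, which makes it easy to log only the slow outliers over a longer period. A minimal sketch; the URL host is assumed from the cluster above and the token value is a placeholder (with tempauth, a real token comes from a GET to /auth/v1.0 with X-Auth-User and X-Auth-Key headers):

#!/usr/bin/env python3
# Repeatedly fetch one object through the proxy and report slow requests.
# URL host and TOKEN are placeholders for illustration.
import time
import urllib.request

URL = "http://10.4.100.1/v1/AUTH_provider1/129450/829188397.31"
TOKEN = "AUTH_tk..."   # placeholder; get a real token from tempauth first

for i in range(100):
    req = urllib.request.Request(URL, headers={"X-Auth-Token": TOKEN})
    start = time.time()
    with urllib.request.urlopen(req) as resp:
        status = resp.status
        nbytes = len(resp.read())
    elapsed = time.time() - start
    # Only print the outliers; normal requests above complete in ~0.05 s.
    if elapsed > 1.0:
        print("request %3d: %6.3fs  status %d  %d bytes" % (i, elapsed, status, nbytes))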
The proxy-server configuration:

[DEFAULT]
log_name = proxy-server
log_facility = LOG_LOCAL1
log_level = INFO
log_address = /dev/log
bind_port = 80
user = swift
workers = 32
log_statsd_host = 10.4.100.10
log_statsd_port = 8125
log_statsd_default_sample_rate = 1
log_statsd_metric_prefix = Proxy01
#set log_level = DEBUG

[pipeline:main]
pipeline = healthcheck cache proxy-logging tempauth proxy-server

[app:proxy-server]
use = egg:swift#proxy
allow_account_management = true
account_autocreate = true

[filter:tempauth]
use = egg:swift#tempauth
user_provider1_ = .xxx http://10.4.100.1/v1/AUTH_provider1
log_name = tempauth
log_facility = LOG_LOCAL2
log_level = INFO
log_address = /dev/log

[filter:cache]
use = egg:swift#memcache
memcache_servers = 10.12.0.2:11211,10.12.0.3:11211
set log_name = cache

[filter:catch_errors]
use = egg:swift#catch_errors

[filter:healthcheck]
use = egg:swift#healthcheck

[filter:proxy-logging]
use = egg:swift#proxy_logging
access_log_name = proxy-logging
access_log_facility = LOG_LOCAL3
access_log_level = DEBUG
access_log_address = /dev/log

Can someone explain such behavior?

Thanks
Klaus
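As a side note on the config above: since log_statsd_host points at 10.4.100.10:8125, one quick way to confirm the proxy is actually emitting StatsD metrics (and to watch per-request timing values arrive) is a tiny UDP listener on that port. A minimal sketch, assuming no real StatsD daemon is already bound to 8125 on that host:

#!/usr/bin/env python3
# Print raw StatsD datagrams sent by the Swift proxy; metric names will
# carry the "Proxy01" prefix configured via log_statsd_metric_prefix.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 8125))

while True:
    data, addr = sock.recvfrom(4096)
    print("%s: %s" % (addr[0], data.decode("utf-8", "replace")))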