Also check the number of inodes used: `df -i` --John
On Oct 29, 2012, at 8:31 AM, Nathan Trueblood <nat...@truebloodllc.com> wrote: > Yeah, I read about the 507 error. However, when the error occurs on my I > can see with 'df' that the drive is only 1% full and is definitely not > unmounted. I can write files to the mounted filesystem directly before, > during, and after the Swift error occurs. So the problem must be some kind > of timeout that is causing the object server to think that something is wrong > with the disk. > > I'll keep digging... > > On Fri, Oct 26, 2012 at 11:21 PM, John Dickinson <m...@not.mn> wrote: > A 507 is returned by the object servers in 2 situations: 1) the drives are > full or 2) the drives have been unmounted because of disk error. > > It's highly likely that you simply have full drives. Remember that the usable > space in your cluster is 1/N where N = replica count. As an example, with 3 > replicas and 5 nodes with a single 1TB drive each, you only have about 1.6TB > available for data. > > As Pete suggested in his response, how big are your drives, and what does > `df` tell you? > > --John > > > On Oct 26, 2012, at 5:26 PM, Nathan Trueblood <nat...@truebloodllc.com> wrote: > > > Hey folks- > > > > I'm trying to figure out what's going wrong with my Swift deployment on a > > small cluster of "mini" servers. I have a small test cluster (5 storage > > nodes, 1 proxy) of mini-servers that are ARM-based. The proxy is a > > regular, Intel-based server with plenty of RAM. The > > object/account/container servers are relatively small, with 2GB of RAM per > > node. > > > > Everything starts up fine, but now I'm trying to troubleshoot a strange > > problem. After I successfully upload a few test files, it seems like the > > storage system stops responding and the proxy gives me a 503 error. > > > > Here's the test sequence I run on my proxy: > > > > lab@proxy01:~/bin$ ./swiftcl.sh stat > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat > > Account: AUTH_system > > Containers: 5 > > Objects: 4 > > Bytes: 47804968 > > Accept-Ranges: bytes > > X-Timestamp: 1351294912.72119 > > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1 > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > > myfiles1 /home/lab/bigfile1 > > home/lab/bigfile1 > > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles2 /home/lab/bigfile1 > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > > myfiles2 /home/lab/bigfile1 > > home/lab/bigfile1 > > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles3 /home/lab/bigfile1 > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > > myfiles3 /home/lab/bigfile1 > > home/lab/bigfile1 > > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles4 /home/lab/bigfile1 > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > > myfiles4 /home/lab/bigfile1 > > home/lab/bigfile1 > > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles5 /home/lab/bigfile1 > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > > myfiles5 /home/lab/bigfile1 > > Object PUT failed: > > http://172.16.1.111:8080/v1/AUTH_system/myfiles5/home/lab/bigfile1 503 > > Service Unavailable [first 60 chars of response] 503 Service Unavailable > > > > The server is currently unavailable > > lab@proxy01:~/bin$ ./swiftcl.sh stat > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat > > Account: AUTH_system > > Containers: 6 > > Objects: 5 > > Bytes: 59756210 > > Accept-Ranges: bytes > > X-Timestamp: 1351294912.72119 > > > > Here's the corresponding log on the Proxy: > > > > Oct 26 17:06:52 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/06/52 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 > > Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0017 > > Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 > > Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 > > Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 > > Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 > > Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 > > Oct 26 17:07:27 proxy01 proxy-server Handoff requested (1) (txn: > > tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111) > > Oct 26 17:07:27 proxy01 proxy-server Handoff requested (2) (txn: > > tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111) > > Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 > > Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 > > Oct 26 17:07:33 proxy01 proxy-server Handoff requested (1) (txn: > > tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111) > > Oct 26 17:07:33 proxy01 proxy-server Handoff requested (2) (txn: > > tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111) > > Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0009 > > Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0009 > > Oct 26 17:07:39 proxy01 proxy-server Handoff requested (1) (txn: > > tx8dc917a4a8c84c40a4429b7bab0323c6) (client_ip: 172.16.1.111) > > Oct 26 17:07:39 proxy01 proxy-server Handoff requested (2) (txn: > > tx8dc917a4a8c84c40a4429b7bab0323c6) (client_ip: 172.16.1.111) > > Oct 26 17:07:40 proxy01 proxy-server Object PUT returning 503, 1/2 required > > connections (txn: tx8dc917a4a8c84c40a4429b7bab0323c6) (client_ip: > > 172.16.1.111) > > Oct 26 17:07:41 proxy01 proxy-server Object PUT returning 503, 1/2 required > > connections (txn: tx07a1f5dfaa23445a88eaa4a2ade68466) (client_ip: > > 172.16.1.111) > > Oct 26 17:07:43 proxy01 proxy-server Object PUT returning 503, 1/2 required > > connections (txn: tx938d08b706844db3886695b798bd9fad) (client_ip: > > 172.16.1.111) > > Oct 26 17:07:47 proxy01 proxy-server Object PUT returning 503, 1/2 required > > connections (txn: txa35e9f8a54924f139e13d6f3a5dc457f) (client_ip: > > 172.16.1.111) > > Oct 26 17:07:55 proxy01 proxy-server Object PUT returning 503, 1/2 required > > connections (txn: txc2f5c35b54a14b32a2d0178423bc7771) (client_ip: > > 172.16.1.111) > > Oct 26 17:08:11 proxy01 proxy-server Object PUT returning 503, 1/2 required > > connections (txn: txd8df89e76edd40f3b4c04091b3cbc857) (client_ip: > > 172.16.1.111) > > Oct 26 17:08:17 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/08/17 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0011 > > > > And here's the log from one of the Storage Nodes: > > > > Oct 26 17:07:20 data05 container-server 192.168.1.111 - - > > [27/Oct/2012:00:07:20 +0000] "PUT /sda6/200589/AUTH_system/myfiles1" 201 - > > "tx063cd0643ef64d169b3bc5a86a808df4" "-" "-" 0.0510 > > Oct 26 17:07:20 data05 account-server 192.168.1.204 - - > > [27/Oct/2012:00:07:20 +0000] "PUT /sda6/116021/AUTH_system/myfiles1" 201 - > > "tx063cd0643ef64d169b3bc5a86a808df4" "-" "-" 0.0154 "" > > Oct 26 17:07:20 data05 object-server 192.168.1.111 - - > > [27/Oct/2012:00:07:20 +0000] "HEAD > > /sda6/36256/AUTH_system/myfiles1/home/lab/bigfile1" 404 - "-" > > "tx6f1018e5da4d4974a85d544a0874a581" "-" 0.0015 > > Oct 26 17:07:21 data05 object-server 192.168.1.111 - - > > [27/Oct/2012:00:07:21 +0000] "PUT > > /sda6/36256/AUTH_system/myfiles1/home/lab/bigfile1" 201 - "-" > > "tx59f931d3281d41159eba0e0c88c18107" "-" 1.0255 > > Oct 26 17:07:21 data05 container-server 192.168.1.202 - - > > [27/Oct/2012:00:07:21 +0000] "PUT > > /sda6/200589/AUTH_system/myfiles1/home/lab/bigfile1" 201 - > > "tx59f931d3281d41159eba0e0c88c18107" "-" "-" 0.0218 > > Oct 26 17:07:24 data05 account-replicator Beginning replication run > > Oct 26 17:07:25 data05 account-replicator Replication run OVER > > Oct 26 17:07:25 data05 account-replicator Attempted to replicate 1 dbs in > > 0.21688 seconds (4.61073/s) > > Oct 26 17:07:25 data05 account-replicator Removed 0 dbs > > Oct 26 17:07:25 data05 account-replicator 2 successes, 0 failures > > Oct 26 17:07:25 data05 account-replicator no_change:0 ts_repl:0 diff:2 > > rsync:0 diff_capped:0 hashmatch:0 empty:0 > > Oct 26 17:07:28 data05 account-server 192.168.1.204 - - > > [27/Oct/2012:00:07:28 +0000] "PUT /sda6/116021/AUTH_system/myfiles1" 201 - > > "-" "-" "-" 0.0021 "" > > Oct 26 17:07:29 data05 account-server 192.168.1.205 - - > > [27/Oct/2012:00:07:29 +0000] "PUT /sda6/116021/AUTH_system/myfiles2" 201 - > > "txbace9f92a41a4645beb9159a802fc12d" "-" "-" 0.0178 "" > > Oct 26 17:07:30 data05 object-server 192.168.1.111 - - > > [27/Oct/2012:00:07:30 +0000] "PUT > > /sda6/63086/AUTH_system/myfiles2/home/lab/bigfile1" 201 - "-" > > "txa4fac7ceab2948ae9010b4ce4cb56b97" "-" 0.7135 > > Oct 26 17:07:33 data05 object-replicator Starting object replication pass. > > Oct 26 17:07:33 data05 object-replicator 4/4 (100.00%) partitions > > replicated in 0.17s (23.71/sec, 0s remaining) > > Oct 26 17:07:33 data05 object-replicator 4 suffixes checked - 50.00% > > hashed, 0.00% synced > > Oct 26 17:07:33 data05 object-replicator Partition times: max 0.1012s, min > > 0.0164s, med 0.0273s > > Oct 26 17:07:33 data05 object-replicator Object replication complete. (0.00 > > minutes) > > Oct 26 17:07:34 data05 account-server 192.168.1.206 - - > > [27/Oct/2012:00:07:34 +0000] "PUT /sda6/116021/AUTH_system/myfiles3" 201 - > > "txa547450c156d4df9ad7e42c04dcabd8f" "-" "-" 0.0168 "" > > Oct 26 17:07:34 data05 container-server 192.168.1.111 - - > > [27/Oct/2012:00:07:34 +0000] "PUT /sda6/118594/AUTH_system/myfiles3" 202 - > > "txa547450c156d4df9ad7e42c04dcabd8f" "-" "-" 0.0399 > > Oct 26 17:07:35 data05 container-server 192.168.1.206 - - > > [27/Oct/2012:00:07:35 +0000] "PUT > > /sda6/118594/AUTH_system/myfiles3/home/lab/bigfile1" 201 - > > "tx6946419daba54efe9c2878f8a2a78f88" "-" "-" 0.0022 > > Oct 26 17:07:35 data05 object-server 192.168.1.111 - - > > [27/Oct/2012:00:07:35 +0000] "PUT > > /sda6/35436/AUTH_system/myfiles3/home/lab/bigfile1" 201 - "-" > > "tx6946419daba54efe9c2878f8a2a78f88" "-" 0.6806 > > Oct 26 17:07:36 data05 account-server 192.168.1.205 - - > > [27/Oct/2012:00:07:36 +0000] "PUT /sda6/116021/AUTH_system/myfiles2" 201 - > > "-" "-" "-" 0.0021 "" > > Oct 26 17:07:37 data05 object-auditor Begin object audit "forever" mode > > (ZBF) > > Oct 26 17:07:37 data05 object-auditor Object audit (ZBF) "forever" mode > > completed: 0.04s. Total quarantined: 0, Total errors: 0, Total files/sec: > > 89.46 , Total bytes/sec: 0.00, Auditing time: 0.04, Rate: 0.95 > > Oct 26 17:07:40 data05 account-server 192.168.1.204 - - > > [27/Oct/2012:00:07:40 +0000] "PUT /sda6/116021/AUTH_system/myfiles4" 201 - > > "txe095c94f8e5c4c4681a8d6a9c330bb7b" "-" "-" 0.0200 "" > > Oct 26 17:07:40 data05 container-server 192.168.1.111 - - > > [27/Oct/2012:00:07:40 +0000] "PUT /sda6/217542/AUTH_system/myfiles4" 202 - > > "txe095c94f8e5c4c4681a8d6a9c330bb7b" "-" "-" 0.0671 > > Oct 26 17:07:40 data05 container-server 192.168.1.206 - - > > [27/Oct/2012:00:07:40 +0000] "PUT > > /sda6/217542/AUTH_system/myfiles4/home/lab/bigfile1" 201 - > > "tx5f9659f74cb2491f9a63cbb84f680c5c" "-" "-" 0.0023 > > Oct 26 17:07:40 data05 object-server 192.168.1.111 - - > > [27/Oct/2012:00:07:40 +0000] "PUT > > /sda6/49697/AUTH_system/myfiles4/home/lab/bigfile1" 201 - "-" > > "tx5f9659f74cb2491f9a63cbb84f680c5c" "-" 0.6038 > > Oct 26 17:07:44 data05 container-updater Begin container update sweep > > Oct 26 17:07:44 data05 container-replicator Beginning replication run > > Oct 26 17:07:44 data05 account-server 192.168.1.206 - - > > [27/Oct/2012:00:07:44 +0000] "PUT /sda6/116021/AUTH_system/myfiles1" 201 - > > "-" "-" "-" 0.0021 "" > > Oct 26 17:07:44 data05 account-server 192.168.1.206 - - > > [27/Oct/2012:00:07:44 +0000] "PUT /sda6/116021/AUTH_system/myfiles3" 201 - > > "-" "-" "-" 0.0021 "" > > Oct 26 17:07:44 data05 account-server 192.168.1.206 - - > > [27/Oct/2012:00:07:44 +0000] "PUT /sda6/116021/AUTH_system/myfiles4" 201 - > > "-" "-" "-" 0.0021 "" > > Oct 26 17:07:45 data05 account-server 192.168.1.203 - - > > [27/Oct/2012:00:07:45 +0000] "PUT /sda6/116021/AUTH_system/myfiles2" 201 - > > "-" "-" "-" 0.0023 "" > > Oct 26 17:07:45 data05 account-server 192.168.1.203 - - > > [27/Oct/2012:00:07:45 +0000] "PUT /sda6/116021/AUTH_system/myfiles1" 201 - > > "-" "-" "-" 0.0020 "" > > Oct 26 17:07:45 data05 container-replicator Replication run OVER > > Oct 26 17:07:45 data05 account-server 192.168.1.203 - - > > [27/Oct/2012:00:07:45 +0000] "PUT /sda6/116021/AUTH_system/myfiles3" 201 - > > "-" "-" "-" 0.0223 "" > > Oct 26 17:07:45 data05 container-replicator Attempted to replicate 5 dbs in > > 0.83001 seconds (6.02399/s) > > Oct 26 17:07:45 data05 container-replicator Removed 0 dbs > > Oct 26 17:07:45 data05 container-replicator 10 successes, 0 failures > > Oct 26 17:07:45 data05 container-replicator no_change:8 ts_repl:0 diff:2 > > rsync:0 diff_capped:0 hashmatch:0 empty:0 > > Oct 26 17:07:45 data05 container-updater Container update sweep completed: > > 0.98s > > Oct 26 17:07:46 data05 account-server 192.168.1.205 - - > > [27/Oct/2012:00:07:46 +0000] "PUT /sda6/116021/AUTH_system/myfiles5" 201 - > > "tx7ba0264da03c485ca3c3b4048f6a3d8c" "-" "-" 0.0098 "" > > Oct 26 17:07:46 data05 container-server 192.168.1.111 - - > > [27/Oct/2012:00:07:46 +0000] "PUT /sda6/122255/AUTH_system/myfiles5" 202 - > > "tx7ba0264da03c485ca3c3b4048f6a3d8c" "-" "-" 0.0465 > > Oct 26 17:07:46 data05 object-server 192.168.1.111 - - > > [27/Oct/2012:00:07:46 +0000] "HEAD > > /sda6/150861/AUTH_system/myfiles5/home/lab/bigfile1" 404 - "-" > > "txf08e8b91b2894a9da099f26b5300327b" "-" 0.0011 > > Oct 26 17:07:46 data05 object-server 192.168.1.111 - - > > [27/Oct/2012:00:07:46 +0000] "PUT > > /sda6/150861/AUTH_system/myfiles5/home/lab/bigfile1" 507 - "-" > > "tx8dc917a4a8c84c40a4429b7bab0323c6" "-" 0.0031 > > > > Right now, there is nothing special in the object-server.conf: > > > > [DEFAULT] > > bind_ip = 192.168.1.206 > > workers = 1 > > > > [pipeline:main] > > pipeline = object-server > > > > [app:object-server] > > use = egg:swift#object > > > > [object-replicator] > > > > [object-updater] > > > > [object-auditor] > > > > [object-server] > > set log_level = DEBUG > > > > My general question is: how do I troubleshoot this? I've tried turning on > > debug logging on the Object Server, but I don't get any additional clues. > > > > The Object-servers do give a 507 error, which might indicate a disk > > problem, but there is nothing wrong with the storage drive. And also if > > there was a fundamental drive problem then I wouldn't be able to upload > > objects in the first place. > > > > My best guess at the moment is that I should adjust the configuration of > > the object servers to reflect their lower performance. What would I > > adjust, the cache size? I'm not seeing any log messages that would > > indicate a time-out. > > > > Thanks in advance for your advice. > > > > -N > > _______________________________________________ > > Mailing list: https://launchpad.net/~openstack > > Post to : openstack@lists.launchpad.net > > Unsubscribe : https://launchpad.net/~openstack > > More help : https://help.launchpad.net/ListHelp > >
smime.p7s
Description: S/MIME cryptographic signature
_______________________________________________ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp