Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers
Ok, if I was giving out t-shirts for finding this issue then the prize would go to Pete. Thank you Disabling fallocate did the trick. I was slowly working my way through all the object-server config options and hadn't gotten to that one yet. Turning features on and off by brute force is admittedly lame, but sometimes that's all you have. I also turned off all the other things I was doing to try to slow down the mini-servers, but disabling fallocate was all that was necessary. Here is my config: [DEFAULT] bind_ip = 192.168.1.202 workers = 1 disable_fallocate = true [pipeline:main] pipeline = object-server [app:object-server] use = egg:swift#object [object-replicator] [object-updater] [object-auditor] A few more details... My servers are running Ubuntu 12.04 LTS. A straight-up apt-get of all the pre-requisites did NOT produce a working Swift deployment on Arm. Although all the dependencies would deploy fine and the Swift services would start up, the proxy-server could not communicate with the storage nodes. So I also had to get older, Armel versions of the python-greenlet and python-eventlet. https://launchpad.net/ubuntu/precise/armel/python-greenlet/0.3.1-1ubuntu5.1 https://launchpad.net/ubuntu/precise/armel/python-eventlet/0.9.16-1ubuntu4.1 Once I deployed those older libraries for Armel, then my Swift cluster worked (except for the fallocate issue). Thanks for everyone's help. -N On Tue, Oct 30, 2012 at 11:07 AM, Nathan Trueblood wrote: > The filesystem is XFS, and I used the recommended mkfs and mount options > for Swift. > > The file size seems to have no bearing on the issue, although I haven't > tried really tiny files. Bigfile3 is only 200K. > > I'll try disabling fallocate... > > > On Mon, Oct 29, 2012 at 7:37 PM, Pete Zaitcev wrote: > >> On Mon, 29 Oct 2012 18:16:52 -0700 >> Nathan Trueblood wrote: >> >> > Definitely NOT a problem with the filesystem, but something is causing >> the >> > object-server to think there is a problem with the filesystem. >> >> If you are willing to go all-out, you can probably catch the >> error with strace, if it works on ARM. Failing that, find all places >> where 507 is generated and see if any exceptions are caught, by >> modifying the source, I'm afraid to say. >> >> > I suspect a bug in one of the underlying libraries. >> >> That's a possibility. Or, it could be a kernel bug. You are using XFS, >> right? If it were something other than XFS or ext4, I would suspect >> ARM blowing over the 2GB barrier somewhere, since your object is >> called "bigfile3". As it is, you have little option than to divide >> the layers until you identify the one that's broken. >> >> BTW, make sure to disable the fallocate, since we're at it. >> >> -- Pete >> > > ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers
On Tue, 30 Oct 2012 11:07:55 -0700 Nathan Trueblood wrote: > The file size seems to have no bearing on the issue, although I haven't > tried really tiny files. Bigfile3 is only 200K. Okay. BTW, do not forget to use curl and issue the same PUT that proxy does, see if it throws 507 repeateably. That could shortcut some of the testing. -- Pete ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers
The filesystem is XFS, and I used the recommended mkfs and mount options for Swift. The file size seems to have no bearing on the issue, although I haven't tried really tiny files. Bigfile3 is only 200K. I'll try disabling fallocate... On Mon, Oct 29, 2012 at 7:37 PM, Pete Zaitcev wrote: > On Mon, 29 Oct 2012 18:16:52 -0700 > Nathan Trueblood wrote: > > > Definitely NOT a problem with the filesystem, but something is causing > the > > object-server to think there is a problem with the filesystem. > > If you are willing to go all-out, you can probably catch the > error with strace, if it works on ARM. Failing that, find all places > where 507 is generated and see if any exceptions are caught, by > modifying the source, I'm afraid to say. > > > I suspect a bug in one of the underlying libraries. > > That's a possibility. Or, it could be a kernel bug. You are using XFS, > right? If it were something other than XFS or ext4, I would suspect > ARM blowing over the 2GB barrier somewhere, since your object is > called "bigfile3". As it is, you have little option than to divide > the layers until you identify the one that's broken. > > BTW, make sure to disable the fallocate, since we're at it. > > -- Pete > ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers
No disk errors in the kern.log. The filesystem is fine. I really think this will turn out to be a bug or a timing (slowness) issue. I will try some of the other recent suggestions, and failing those try to track this down with strace. Thx. On Mon, Oct 29, 2012 at 7:02 PM, Alex Yang wrote: > There are any error about disk in the kern.log? > > > 2012/10/30 Nathan Trueblood > >> Still no further clues. I re-created all the volumes I'm using for >> Swift. Plenty of Inodes free: >> >> lab@data02:~$ df -i >> FilesystemInodes IUsed IFree IUse% Mounted on >> /dev/sda2 12214272 39290 121749821% / >> none 107979 4821074971% /dev >> none 107979 2681077111% /run >> none 107979 21079771% /run/lock >> none 107979 11079781% /run/shm >> /dev/sda1 4915223 491291% /boot >> /dev/sda4 13404640037 1340463631% /srv/node/sda4 >> >> I successfully upload a small object to container cont1, then cont2. >> When I upload to cont3, I see the following in the object-server log >> (data02) >> >> This seems to be the problematic sequence: >> >> Data02 has ip 192.168.1.202 >> Data03 has ip 192.168.1.203 >> >> 1. First the account server reports an HTTP 201 on the container from a >> different object server in a different zone. >> 2. Then the object server reports a 404 trying to HEAD the new object. >> 3. Then the object server reports a 507 trying to PUT the new object. >> >> From this point the operation eventually fails and the proxy reports a >> 503. >> >> Oct 29 17:58:20 data02 account-server 192.168.1.203 - - >> [30/Oct/2012:00:58:20 +] "PUT /sda4/116021/AUTH_system/cont3" 201 - >> "tx5a3ca6c845af41928e0ba6b7bc58d2da" "-" "-" 0.0082 "" >> Oct 29 17:58:20 data02 object-server 192.168.1.111 - - >> [30/Oct/2012:00:58:20 +] "HEAD >> /sda4/257613/AUTH_system/cont3/home/lab/bigfile3" 404 - "-" >> "tx5f21503ff12e45e39a80eb52f6757261" "-" 0.0011 >> Oct 29 17:58:20 data02 object-server 192.168.1.111 - - >> [30/Oct/2012:00:58:20 +] "PUT >> /sda4/257613/AUTH_system/cont3/home/lab/bigfile3" 507 - "-" >> "tx425494dc372740e28d043a07d3a08b9a" "-" 0.0031 >> >> In an earlier, successful transaction I noticed that between Steps 1 and >> 2 above, there is a response from the container-server: >> >> Oct 29 17:57:59 data02 account-server 192.168.1.204 - - >> [30/Oct/2012:00:57:59 +] "PUT /sda4/116021/AUTH_system/cont2" 201 - >> "txb10d75886bf14e4eba14fcc52d81c5d9" "-" "-" 0.0182 "" >> Oct 29 17:57:59 data02 container-server 192.168.1.111 - - >> [30/Oct/2012:00:57:59 +] "PUT /sda4/122355/AUTH_system/cont2" 201 - >> "txb10d75886bf14e4eba14fcc52d81c5d9" "-" "-" 0.1554 >> Oct 29 17:57:59 data02 object-server 192.168.1.111 - - >> [30/Oct/2012:00:57:59 +] "HEAD >> /sda4/226151/AUTH_system/cont2/home/lab/bigfile3" 404 - "-" >> "tx1c514850530849d1bfbfa716d9039b87" "-" 0.0012 >> Oct 29 17:57:59 data02 container-server 192.168.1.204 - - >> [30/Oct/2012:00:57:59 +] "PUT >> /sda4/122355/AUTH_system/cont2/home/lab/bigfile3" 201 - >> "tx8130af5cae484e5f9c5a25541d1c87aa" "-" "-" 0.0041 >> Oct 29 17:57:59 data02 object-server 192.168.1.111 - - >> [30/Oct/2012:00:57:59 +] "PUT >> /sda4/226151/AUTH_system/cont2/home/lab/bigfile3" 201 - "-" >> "tx8130af5cae484e5f9c5a25541d1c87aa" "-" 0.1716 >> >> >> So maybe the container server is failing to create the new container? >> Maybe a bug in auto-create of containers? >> >> Definitely NOT a problem with the filesystem, but something is causing >> the object-server to think there is a problem with the filesystem. >> >> I suspect a bug in one of the underlying libraries. >> >> Any further suggestions on how to troubleshoot? >> >> Thanks. When I finally find the solution, I'll post my results. >> >> -N >> >> On Fri, Oct 26, 2012 at 11:21 PM, John Dickinson wrote: >> >>> A 507 is returned by the object servers in 2 situations: 1) the drives >>> are full or 2) the drives have been unmounted because of disk error. >>> >>> It's highly likely that you simply have full drives. Remember that the >>> usable space in your cluster is 1/N where N = replica count. As an example, >>> with 3 replicas and 5 nodes with a single 1TB drive each, you only have >>> about 1.6TB available for data. >>> >>> As Pete suggested in his response, how big are your drives, and what >>> does `df` tell you? >>> >>> --John >>> >>> >>> On Oct 26, 2012, at 5:26 PM, Nathan Trueblood >>> wrote: >>> >>> > Hey folks- >>> > >>> > I'm trying to figure out what's going wrong with my Swift deployment >>> on a small cluster of "mini" servers. I have a small test cluster (5 >>> storage nodes, 1 proxy) of mini-servers that are ARM-based. The proxy is >>> a regular, Intel-based server with plenty of RAM. The >>> object/account/container servers are relatively small, with 2GB of RAM per >>> node. >>> > >>> > Everything starts up fine, but now I'm trying to troubleshoot a >
Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers
On 10/29/2012 07:37 PM, Pete Zaitcev wrote: On Mon, 29 Oct 2012 18:16:52 -0700 Nathan Trueblood wrote: Definitely NOT a problem with the filesystem, but something is causing the object-server to think there is a problem with the filesystem. If you are willing to go all-out, you can probably catch the error with strace, if it works on ARM. Strace is your friend even if he is sometimes a bit on the chatty side. It looks as though there is at least some support for ARM if http://packages.debian.org/search?keywords=strace is any indication. rick jones ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers
On Mon, 29 Oct 2012 18:16:52 -0700 Nathan Trueblood wrote: > Definitely NOT a problem with the filesystem, but something is causing the > object-server to think there is a problem with the filesystem. If you are willing to go all-out, you can probably catch the error with strace, if it works on ARM. Failing that, find all places where 507 is generated and see if any exceptions are caught, by modifying the source, I'm afraid to say. > I suspect a bug in one of the underlying libraries. That's a possibility. Or, it could be a kernel bug. You are using XFS, right? If it were something other than XFS or ext4, I would suspect ARM blowing over the 2GB barrier somewhere, since your object is called "bigfile3". As it is, you have little option than to divide the layers until you identify the one that's broken. BTW, make sure to disable the fallocate, since we're at it. -- Pete ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers
There are any error about disk in the kern.log? 2012/10/30 Nathan Trueblood > Still no further clues. I re-created all the volumes I'm using for > Swift. Plenty of Inodes free: > > lab@data02:~$ df -i > FilesystemInodes IUsed IFree IUse% Mounted on > /dev/sda2 12214272 39290 121749821% / > none 107979 4821074971% /dev > none 107979 2681077111% /run > none 107979 21079771% /run/lock > none 107979 11079781% /run/shm > /dev/sda1 4915223 491291% /boot > /dev/sda4 13404640037 1340463631% /srv/node/sda4 > > I successfully upload a small object to container cont1, then cont2. > When I upload to cont3, I see the following in the object-server log > (data02) > > This seems to be the problematic sequence: > > Data02 has ip 192.168.1.202 > Data03 has ip 192.168.1.203 > > 1. First the account server reports an HTTP 201 on the container from a > different object server in a different zone. > 2. Then the object server reports a 404 trying to HEAD the new object. > 3. Then the object server reports a 507 trying to PUT the new object. > > From this point the operation eventually fails and the proxy reports a 503. > > Oct 29 17:58:20 data02 account-server 192.168.1.203 - - > [30/Oct/2012:00:58:20 +] "PUT /sda4/116021/AUTH_system/cont3" 201 - > "tx5a3ca6c845af41928e0ba6b7bc58d2da" "-" "-" 0.0082 "" > Oct 29 17:58:20 data02 object-server 192.168.1.111 - - > [30/Oct/2012:00:58:20 +] "HEAD > /sda4/257613/AUTH_system/cont3/home/lab/bigfile3" 404 - "-" > "tx5f21503ff12e45e39a80eb52f6757261" "-" 0.0011 > Oct 29 17:58:20 data02 object-server 192.168.1.111 - - > [30/Oct/2012:00:58:20 +] "PUT > /sda4/257613/AUTH_system/cont3/home/lab/bigfile3" 507 - "-" > "tx425494dc372740e28d043a07d3a08b9a" "-" 0.0031 > > In an earlier, successful transaction I noticed that between Steps 1 and 2 > above, there is a response from the container-server: > > Oct 29 17:57:59 data02 account-server 192.168.1.204 - - > [30/Oct/2012:00:57:59 +] "PUT /sda4/116021/AUTH_system/cont2" 201 - > "txb10d75886bf14e4eba14fcc52d81c5d9" "-" "-" 0.0182 "" > Oct 29 17:57:59 data02 container-server 192.168.1.111 - - > [30/Oct/2012:00:57:59 +] "PUT /sda4/122355/AUTH_system/cont2" 201 - > "txb10d75886bf14e4eba14fcc52d81c5d9" "-" "-" 0.1554 > Oct 29 17:57:59 data02 object-server 192.168.1.111 - - > [30/Oct/2012:00:57:59 +] "HEAD > /sda4/226151/AUTH_system/cont2/home/lab/bigfile3" 404 - "-" > "tx1c514850530849d1bfbfa716d9039b87" "-" 0.0012 > Oct 29 17:57:59 data02 container-server 192.168.1.204 - - > [30/Oct/2012:00:57:59 +] "PUT > /sda4/122355/AUTH_system/cont2/home/lab/bigfile3" 201 - > "tx8130af5cae484e5f9c5a25541d1c87aa" "-" "-" 0.0041 > Oct 29 17:57:59 data02 object-server 192.168.1.111 - - > [30/Oct/2012:00:57:59 +] "PUT > /sda4/226151/AUTH_system/cont2/home/lab/bigfile3" 201 - "-" > "tx8130af5cae484e5f9c5a25541d1c87aa" "-" 0.1716 > > > So maybe the container server is failing to create the new container? > Maybe a bug in auto-create of containers? > > Definitely NOT a problem with the filesystem, but something is causing the > object-server to think there is a problem with the filesystem. > > I suspect a bug in one of the underlying libraries. > > Any further suggestions on how to troubleshoot? > > Thanks. When I finally find the solution, I'll post my results. > > -N > > On Fri, Oct 26, 2012 at 11:21 PM, John Dickinson wrote: > >> A 507 is returned by the object servers in 2 situations: 1) the drives >> are full or 2) the drives have been unmounted because of disk error. >> >> It's highly likely that you simply have full drives. Remember that the >> usable space in your cluster is 1/N where N = replica count. As an example, >> with 3 replicas and 5 nodes with a single 1TB drive each, you only have >> about 1.6TB available for data. >> >> As Pete suggested in his response, how big are your drives, and what does >> `df` tell you? >> >> --John >> >> >> On Oct 26, 2012, at 5:26 PM, Nathan Trueblood >> wrote: >> >> > Hey folks- >> > >> > I'm trying to figure out what's going wrong with my Swift deployment on >> a small cluster of "mini" servers. I have a small test cluster (5 storage >> nodes, 1 proxy) of mini-servers that are ARM-based. The proxy is a >> regular, Intel-based server with plenty of RAM. The >> object/account/container servers are relatively small, with 2GB of RAM per >> node. >> > >> > Everything starts up fine, but now I'm trying to troubleshoot a strange >> problem. After I successfully upload a few test files, it seems like the >> storage system stops responding and the proxy gives me a 503 error. >> > >> > Here's the test sequence I run on my proxy: >> > >> > lab@proxy01:~/bin$ ./swiftcl.sh stat >> > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat >> >Account: AUTH_system >> > Containers: 5 >> >Objects: 4 >> > Bytes: 4
Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers
Still no further clues. I re-created all the volumes I'm using for Swift. Plenty of Inodes free: lab@data02:~$ df -i FilesystemInodes IUsed IFree IUse% Mounted on /dev/sda2 12214272 39290 121749821% / none 107979 4821074971% /dev none 107979 2681077111% /run none 107979 21079771% /run/lock none 107979 11079781% /run/shm /dev/sda1 4915223 491291% /boot /dev/sda4 13404640037 1340463631% /srv/node/sda4 I successfully upload a small object to container cont1, then cont2. When I upload to cont3, I see the following in the object-server log (data02) This seems to be the problematic sequence: Data02 has ip 192.168.1.202 Data03 has ip 192.168.1.203 1. First the account server reports an HTTP 201 on the container from a different object server in a different zone. 2. Then the object server reports a 404 trying to HEAD the new object. 3. Then the object server reports a 507 trying to PUT the new object. >From this point the operation eventually fails and the proxy reports a 503. Oct 29 17:58:20 data02 account-server 192.168.1.203 - - [30/Oct/2012:00:58:20 +] "PUT /sda4/116021/AUTH_system/cont3" 201 - "tx5a3ca6c845af41928e0ba6b7bc58d2da" "-" "-" 0.0082 "" Oct 29 17:58:20 data02 object-server 192.168.1.111 - - [30/Oct/2012:00:58:20 +] "HEAD /sda4/257613/AUTH_system/cont3/home/lab/bigfile3" 404 - "-" "tx5f21503ff12e45e39a80eb52f6757261" "-" 0.0011 Oct 29 17:58:20 data02 object-server 192.168.1.111 - - [30/Oct/2012:00:58:20 +] "PUT /sda4/257613/AUTH_system/cont3/home/lab/bigfile3" 507 - "-" "tx425494dc372740e28d043a07d3a08b9a" "-" 0.0031 In an earlier, successful transaction I noticed that between Steps 1 and 2 above, there is a response from the container-server: Oct 29 17:57:59 data02 account-server 192.168.1.204 - - [30/Oct/2012:00:57:59 +] "PUT /sda4/116021/AUTH_system/cont2" 201 - "txb10d75886bf14e4eba14fcc52d81c5d9" "-" "-" 0.0182 "" Oct 29 17:57:59 data02 container-server 192.168.1.111 - - [30/Oct/2012:00:57:59 +] "PUT /sda4/122355/AUTH_system/cont2" 201 - "txb10d75886bf14e4eba14fcc52d81c5d9" "-" "-" 0.1554 Oct 29 17:57:59 data02 object-server 192.168.1.111 - - [30/Oct/2012:00:57:59 +] "HEAD /sda4/226151/AUTH_system/cont2/home/lab/bigfile3" 404 - "-" "tx1c514850530849d1bfbfa716d9039b87" "-" 0.0012 Oct 29 17:57:59 data02 container-server 192.168.1.204 - - [30/Oct/2012:00:57:59 +] "PUT /sda4/122355/AUTH_system/cont2/home/lab/bigfile3" 201 - "tx8130af5cae484e5f9c5a25541d1c87aa" "-" "-" 0.0041 Oct 29 17:57:59 data02 object-server 192.168.1.111 - - [30/Oct/2012:00:57:59 +] "PUT /sda4/226151/AUTH_system/cont2/home/lab/bigfile3" 201 - "-" "tx8130af5cae484e5f9c5a25541d1c87aa" "-" 0.1716 So maybe the container server is failing to create the new container? Maybe a bug in auto-create of containers? Definitely NOT a problem with the filesystem, but something is causing the object-server to think there is a problem with the filesystem. I suspect a bug in one of the underlying libraries. Any further suggestions on how to troubleshoot? Thanks. When I finally find the solution, I'll post my results. -N On Fri, Oct 26, 2012 at 11:21 PM, John Dickinson wrote: > A 507 is returned by the object servers in 2 situations: 1) the drives are > full or 2) the drives have been unmounted because of disk error. > > It's highly likely that you simply have full drives. Remember that the > usable space in your cluster is 1/N where N = replica count. As an example, > with 3 replicas and 5 nodes with a single 1TB drive each, you only have > about 1.6TB available for data. > > As Pete suggested in his response, how big are your drives, and what does > `df` tell you? > > --John > > > On Oct 26, 2012, at 5:26 PM, Nathan Trueblood > wrote: > > > Hey folks- > > > > I'm trying to figure out what's going wrong with my Swift deployment on > a small cluster of "mini" servers. I have a small test cluster (5 storage > nodes, 1 proxy) of mini-servers that are ARM-based. The proxy is a > regular, Intel-based server with plenty of RAM. The > object/account/container servers are relatively small, with 2GB of RAM per > node. > > > > Everything starts up fine, but now I'm trying to troubleshoot a strange > problem. After I successfully upload a few test files, it seems like the > storage system stops responding and the proxy gives me a 503 error. > > > > Here's the test sequence I run on my proxy: > > > > lab@proxy01:~/bin$ ./swiftcl.sh stat > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat > >Account: AUTH_system > > Containers: 5 > >Objects: 4 > > Bytes: 47804968 > > Accept-Ranges: bytes > > X-Timestamp: 1351294912.72119 > > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1 > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass > upload myfiles1 /home/lab/bigfile1 > > home/la
Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers
Also check the number of inodes used: `df -i` --John On Oct 29, 2012, at 8:31 AM, Nathan Trueblood wrote: > Yeah, I read about the 507 error.However, when the error occurs on my I > can see with 'df' that the drive is only 1% full and is definitely not > unmounted. I can write files to the mounted filesystem directly before, > during, and after the Swift error occurs. So the problem must be some kind > of timeout that is causing the object server to think that something is wrong > with the disk. > > I'll keep digging... > > On Fri, Oct 26, 2012 at 11:21 PM, John Dickinson wrote: > A 507 is returned by the object servers in 2 situations: 1) the drives are > full or 2) the drives have been unmounted because of disk error. > > It's highly likely that you simply have full drives. Remember that the usable > space in your cluster is 1/N where N = replica count. As an example, with 3 > replicas and 5 nodes with a single 1TB drive each, you only have about 1.6TB > available for data. > > As Pete suggested in his response, how big are your drives, and what does > `df` tell you? > > --John > > > On Oct 26, 2012, at 5:26 PM, Nathan Trueblood wrote: > > > Hey folks- > > > > I'm trying to figure out what's going wrong with my Swift deployment on a > > small cluster of "mini" servers. I have a small test cluster (5 storage > > nodes, 1 proxy) of mini-servers that are ARM-based. The proxy is a > > regular, Intel-based server with plenty of RAM. The > > object/account/container servers are relatively small, with 2GB of RAM per > > node. > > > > Everything starts up fine, but now I'm trying to troubleshoot a strange > > problem. After I successfully upload a few test files, it seems like the > > storage system stops responding and the proxy gives me a 503 error. > > > > Here's the test sequence I run on my proxy: > > > > lab@proxy01:~/bin$ ./swiftcl.sh stat > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat > >Account: AUTH_system > > Containers: 5 > >Objects: 4 > > Bytes: 47804968 > > Accept-Ranges: bytes > > X-Timestamp: 1351294912.72119 > > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1 > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > > myfiles1 /home/lab/bigfile1 > > home/lab/bigfile1 > > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles2 /home/lab/bigfile1 > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > > myfiles2 /home/lab/bigfile1 > > home/lab/bigfile1 > > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles3 /home/lab/bigfile1 > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > > myfiles3 /home/lab/bigfile1 > > home/lab/bigfile1 > > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles4 /home/lab/bigfile1 > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > > myfiles4 /home/lab/bigfile1 > > home/lab/bigfile1 > > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles5 /home/lab/bigfile1 > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > > myfiles5 /home/lab/bigfile1 > > Object PUT failed: > > http://172.16.1.111:8080/v1/AUTH_system/myfiles5/home/lab/bigfile1 503 > > Service Unavailable [first 60 chars of response] 503 Service Unavailable > > > > The server is currently unavailable > > lab@proxy01:~/bin$ ./swiftcl.sh stat > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat > >Account: AUTH_system > > Containers: 6 > >Objects: 5 > > Bytes: 59756210 > > Accept-Ranges: bytes > > X-Timestamp: 1351294912.72119 > > > > Here's the corresponding log on the Proxy: > > > > Oct 26 17:06:52 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/06/52 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 > > Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0017 > > Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 > > Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 > > Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 > > Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 > > Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 > > Oct 26 17:07:27 proxy01 proxy-server Handoff requested (1) (txn: > > tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111) > > Oct 26 17:07:27 proxy01 proxy-server Handoff requested (2) (txn: > > tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111) > > Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET > > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers
Yeah, I read about the 507 error.However, when the error occurs on my I can see with 'df' that the drive is only 1% full and is definitely not unmounted. I can write files to the mounted filesystem directly before, during, and after the Swift error occurs. So the problem must be some kind of timeout that is causing the object server to think that something is wrong with the disk. I'll keep digging... On Fri, Oct 26, 2012 at 11:21 PM, John Dickinson wrote: > A 507 is returned by the object servers in 2 situations: 1) the drives are > full or 2) the drives have been unmounted because of disk error. > > It's highly likely that you simply have full drives. Remember that the > usable space in your cluster is 1/N where N = replica count. As an example, > with 3 replicas and 5 nodes with a single 1TB drive each, you only have > about 1.6TB available for data. > > As Pete suggested in his response, how big are your drives, and what does > `df` tell you? > > --John > > > On Oct 26, 2012, at 5:26 PM, Nathan Trueblood > wrote: > > > Hey folks- > > > > I'm trying to figure out what's going wrong with my Swift deployment on > a small cluster of "mini" servers. I have a small test cluster (5 storage > nodes, 1 proxy) of mini-servers that are ARM-based. The proxy is a > regular, Intel-based server with plenty of RAM. The > object/account/container servers are relatively small, with 2GB of RAM per > node. > > > > Everything starts up fine, but now I'm trying to troubleshoot a strange > problem. After I successfully upload a few test files, it seems like the > storage system stops responding and the proxy gives me a 503 error. > > > > Here's the test sequence I run on my proxy: > > > > lab@proxy01:~/bin$ ./swiftcl.sh stat > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat > >Account: AUTH_system > > Containers: 5 > >Objects: 4 > > Bytes: 47804968 > > Accept-Ranges: bytes > > X-Timestamp: 1351294912.72119 > > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1 > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass > upload myfiles1 /home/lab/bigfile1 > > home/lab/bigfile1 > > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles2 /home/lab/bigfile1 > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass > upload myfiles2 /home/lab/bigfile1 > > home/lab/bigfile1 > > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles3 /home/lab/bigfile1 > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass > upload myfiles3 /home/lab/bigfile1 > > home/lab/bigfile1 > > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles4 /home/lab/bigfile1 > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass > upload myfiles4 /home/lab/bigfile1 > > home/lab/bigfile1 > > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles5 /home/lab/bigfile1 > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass > upload myfiles5 /home/lab/bigfile1 > > Object PUT failed: > http://172.16.1.111:8080/v1/AUTH_system/myfiles5/home/lab/bigfile1 503 > Service Unavailable [first 60 chars of response] 503 Service Unavailable > > > > The server is currently unavailable > > lab@proxy01:~/bin$ ./swiftcl.sh stat > > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat > >Account: AUTH_system > > Containers: 6 > >Objects: 5 > > Bytes: 59756210 > > Accept-Ranges: bytes > > X-Timestamp: 1351294912.72119 > > > > Here's the corresponding log on the Proxy: > > > > Oct 26 17:06:52 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/06/52 > GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 > > Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 > GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0017 > > Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 > GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 > > Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 > GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 > > Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 > GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 > > Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 > GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 > > Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 > GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 > > Oct 26 17:07:27 proxy01 proxy-server Handoff requested (1) (txn: > tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111) > > Oct 26 17:07:27 proxy01 proxy-server Handoff requested (2) (txn: > tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111) > > Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 > GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 > > Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 > GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 > > Oct 26 17:07:33 proxy01 proxy-server Handoff requested (1) (txn: > tx5f9659f74cb2491f
Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers
Sorry for my off topic question, it's the first time I heard of swift storage servers running an ARM processor. I think ARM architectures can be an interesting alternative to replace classic computing servers, but swift storage nodes did not catch my attention as a valid alternative, until now. From my perspective, the savings in power consumption and real state in your datacenter are minimized if you load your servers with two dozen SATA disks (a typical swift node configuration) and a dual 10GbE connection, for example. May be there are benefits I'm not aware of, but I would really love to hear about them :-) Cheers Diego Enviado desde mi iPhone, perdona la brevedad El 27/10/2012, a las 02:26, Nathan Trueblood escribió: > Hey folks- > > I'm trying to figure out what's going wrong with my Swift deployment on a > small cluster of "mini" servers. I have a small test cluster (5 storage > nodes, 1 proxy) of mini-servers that are ARM-based. The proxy is a regular, > Intel-based server with plenty of RAM. The object/account/container servers > are relatively small, with 2GB of RAM per node. > > Everything starts up fine, but now I'm trying to troubleshoot a strange > problem. After I successfully upload a few test files, it seems like the > storage system stops responding and the proxy gives me a 503 error. > > Here's the test sequence I run on my proxy: > > lab@proxy01:~/bin$ ./swiftcl.sh stat > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat >Account: AUTH_system > Containers: 5 >Objects: 4 > Bytes: 47804968 > Accept-Ranges: bytes > X-Timestamp: 1351294912.72119 > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1 > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > myfiles1 /home/lab/bigfile1 > home/lab/bigfile1 > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles2 /home/lab/bigfile1 > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > myfiles2 /home/lab/bigfile1 > home/lab/bigfile1 > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles3 /home/lab/bigfile1 > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > myfiles3 /home/lab/bigfile1 > home/lab/bigfile1 > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles4 /home/lab/bigfile1 > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > myfiles4 /home/lab/bigfile1 > home/lab/bigfile1 > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles5 /home/lab/bigfile1 > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > myfiles5 /home/lab/bigfile1 > Object PUT failed: > http://172.16.1.111:8080/v1/AUTH_system/myfiles5/home/lab/bigfile1 503 > Service Unavailable [first 60 chars of response] 503 Service Unavailable > > The server is currently unavailable > lab@proxy01:~/bin$ ./swiftcl.sh stat > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat >Account: AUTH_system > Containers: 6 >Objects: 5 > Bytes: 59756210 > Accept-Ranges: bytes > X-Timestamp: 1351294912.72119 > > Here's the corresponding log on the Proxy: > > Oct 26 17:06:52 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/06/52 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 > Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0017 > Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 > Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 > Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 > Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 > Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 > Oct 26 17:07:27 proxy01 proxy-server Handoff requested (1) (txn: > tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111) > Oct 26 17:07:27 proxy01 proxy-server Handoff requested (2) (txn: > tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111) > Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 > Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 > Oct 26 17:07:33 proxy01 proxy-server Handoff requested (1) (txn: > tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111) > Oct 26 17:07:33 proxy01 proxy-server Handoff requested (2) (txn: > tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111) > Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0009 > Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GE
Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers
A 507 is returned by the object servers in 2 situations: 1) the drives are full or 2) the drives have been unmounted because of disk error. It's highly likely that you simply have full drives. Remember that the usable space in your cluster is 1/N where N = replica count. As an example, with 3 replicas and 5 nodes with a single 1TB drive each, you only have about 1.6TB available for data. As Pete suggested in his response, how big are your drives, and what does `df` tell you? --John On Oct 26, 2012, at 5:26 PM, Nathan Trueblood wrote: > Hey folks- > > I'm trying to figure out what's going wrong with my Swift deployment on a > small cluster of "mini" servers. I have a small test cluster (5 storage > nodes, 1 proxy) of mini-servers that are ARM-based. The proxy is a regular, > Intel-based server with plenty of RAM. The object/account/container servers > are relatively small, with 2GB of RAM per node. > > Everything starts up fine, but now I'm trying to troubleshoot a strange > problem. After I successfully upload a few test files, it seems like the > storage system stops responding and the proxy gives me a 503 error. > > Here's the test sequence I run on my proxy: > > lab@proxy01:~/bin$ ./swiftcl.sh stat > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat >Account: AUTH_system > Containers: 5 >Objects: 4 > Bytes: 47804968 > Accept-Ranges: bytes > X-Timestamp: 1351294912.72119 > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1 > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > myfiles1 /home/lab/bigfile1 > home/lab/bigfile1 > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles2 /home/lab/bigfile1 > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > myfiles2 /home/lab/bigfile1 > home/lab/bigfile1 > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles3 /home/lab/bigfile1 > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > myfiles3 /home/lab/bigfile1 > home/lab/bigfile1 > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles4 /home/lab/bigfile1 > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > myfiles4 /home/lab/bigfile1 > home/lab/bigfile1 > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles5 /home/lab/bigfile1 > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload > myfiles5 /home/lab/bigfile1 > Object PUT failed: > http://172.16.1.111:8080/v1/AUTH_system/myfiles5/home/lab/bigfile1 503 > Service Unavailable [first 60 chars of response] 503 Service Unavailable > > The server is currently unavailable > lab@proxy01:~/bin$ ./swiftcl.sh stat > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat >Account: AUTH_system > Containers: 6 >Objects: 5 > Bytes: 59756210 > Accept-Ranges: bytes > X-Timestamp: 1351294912.72119 > > Here's the corresponding log on the Proxy: > > Oct 26 17:06:52 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/06/52 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 > Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0017 > Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 > Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 > Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 > Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 > Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 > Oct 26 17:07:27 proxy01 proxy-server Handoff requested (1) (txn: > tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111) > Oct 26 17:07:27 proxy01 proxy-server Handoff requested (2) (txn: > tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111) > Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 > Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 > Oct 26 17:07:33 proxy01 proxy-server Handoff requested (1) (txn: > tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111) > Oct 26 17:07:33 proxy01 proxy-server Handoff requested (2) (txn: > tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111) > Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0009 > Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GET > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0009 > Oct 26 17:07:39 proxy01 proxy-server Handoff requested (1) (txn: > tx8dc917a4a8c84c40a4429b7bab0323c6) (client_ip: 172.16.1.111) > Oct 26 1
Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers
On Fri, 26 Oct 2012 17:26:07 -0700 Nathan Trueblood wrote: > I'm trying to figure out what's going wrong with my Swift deployment on a > small cluster of "mini" servers. I have a small test cluster (5 storage > nodes, 1 proxy) of mini-servers that are ARM-based. The proxy is a > regular, Intel-based server with plenty of RAM. The > object/account/container servers are relatively small, with 2GB of RAM per > node. And the disk is how big? > Oct 26 17:07:46 data05 object-server 192.168.1.111 - - > [27/Oct/2012:00:07:46 +] "PUT > /sda6/150861/AUTH_system/myfiles5/home/lab/bigfile1" 507 - "-" > "tx8dc917a4a8c84c40a4429b7bab0323c6" "-" 0.0031 Well, what does df say? > The Object-servers do give a 507 error, which might indicate a disk > problem, but there is nothing wrong with the storage drive. And also if > there was a fundamental drive problem then I wouldn't be able to upload > objects in the first place. You could upload them to a reduced number of nodes, and then the replication would inflate the space used by the replication ratio. Finally, it's possible that tombstones are not properly expired for some reason. -- Pete ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
[Openstack] Troubleshooting Swift 1.7.4 on mini servers
Hey folks- I'm trying to figure out what's going wrong with my Swift deployment on a small cluster of "mini" servers. I have a small test cluster (5 storage nodes, 1 proxy) of mini-servers that are ARM-based. The proxy is a regular, Intel-based server with plenty of RAM. The object/account/container servers are relatively small, with 2GB of RAM per node. Everything starts up fine, but now I'm trying to troubleshoot a strange problem. After I successfully upload a few test files, it seems like the storage system stops responding and the proxy gives me a 503 error. Here's the test sequence I run on my proxy: lab@proxy01:~/bin$ ./swiftcl.sh stat swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat Account: AUTH_system Containers: 5 Objects: 4 Bytes: 47804968 Accept-Ranges: bytes X-Timestamp: 1351294912.72119 lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1 swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload myfiles1 /home/lab/bigfile1 home/lab/bigfile1 lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles2 /home/lab/bigfile1 swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload myfiles2 /home/lab/bigfile1 home/lab/bigfile1 lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles3 /home/lab/bigfile1 swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload myfiles3 /home/lab/bigfile1 home/lab/bigfile1 lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles4 /home/lab/bigfile1 swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload myfiles4 /home/lab/bigfile1 home/lab/bigfile1 lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles5 /home/lab/bigfile1 swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload myfiles5 /home/lab/bigfile1 Object PUT failed: http://172.16.1.111:8080/v1/AUTH_system/myfiles5/home/lab/bigfile1 503 Service Unavailable [first 60 chars of response] 503 Service Unavailable The server is currently unavailable lab@proxy01:~/bin$ ./swiftcl.sh stat swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat Account: AUTH_system Containers: 6 Objects: 5 Bytes: 59756210 Accept-Ranges: bytes X-Timestamp: 1351294912.72119 Here's the corresponding log on the Proxy: Oct 26 17:06:52 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/06/52 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0017 Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 Oct 26 17:07:27 proxy01 proxy-server Handoff requested (1) (txn: tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111) Oct 26 17:07:27 proxy01 proxy-server Handoff requested (2) (txn: tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111) Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010 Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016 Oct 26 17:07:33 proxy01 proxy-server Handoff requested (1) (txn: tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111) Oct 26 17:07:33 proxy01 proxy-server Handoff requested (2) (txn: tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111) Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0009 Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0009 Oct 26 17:07:39 proxy01 proxy-server Handoff requested (1) (txn: tx8dc917a4a8c84c40a4429b7bab0323c6) (client_ip: 172.16.1.111) Oct 26 17:07:39 proxy01 proxy-server Handoff requested (2) (txn: tx8dc917a4a8c84c40a4429b7bab0323c6) (client_ip: 172.16.1.111) Oct 26 17:07:40 proxy01 proxy-server Object PUT returning 503, 1/2 required connections (txn: tx8dc917a4a8c84c40a4429b7bab0323c6) (client_ip: 172.16.1.111) Oct 26 17:07:41 proxy01 proxy-server Object PUT returning 503, 1/2 required connections (txn: tx07a1f5dfaa23445a88eaa4a2ade68466) (client_ip: 172.16.1.111) Oct 26 17:07:43 proxy01 proxy-server Object PUT returning 503, 1/2 required connections (txn: tx938d08b706844db3886695b798bd9fad) (client_ip: 172.16.1.111) Oct 26 17:07:47 proxy01 proxy-server Object PUT returning 503, 1/2 required connections (txn: txa35e9f8a54924f139e13d6f3a5dc457f