Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-30 Thread Rick Jones

On 10/29/2012 07:37 PM, Pete Zaitcev wrote:

On Mon, 29 Oct 2012 18:16:52 -0700
Nathan Trueblood nat...@truebloodllc.com wrote:


Definitely NOT a problem with the filesystem, but something is causing the
object-server to think there is a problem with the filesystem.


If you are willing to go all-out, you can probably catch the
error with strace, if it works on ARM.


Strace is your friend even if he is sometimes a bit on the chatty side. 
 It looks as though there is at least some support for ARM if 
http://packages.debian.org/search?keywords=strace is any indication.
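
(For reference, a rough sketch of how one might point strace at the object
server on a storage node; it assumes strace is installed there and that the
swift-object-server process can be found with pgrep:

  sudo strace -f -tt -e trace=file,desc -o /tmp/objsrv.strace \
       -p $(pgrep -f swift-object-server | head -1)
  # reproduce the failing PUT, then look for syscalls that returned an error:
  grep ' = -1 ' /tmp/objsrv.strace | tail -20

Restricting the trace to the file and desc classes keeps the output focused
on filesystem and file-descriptor calls, which is where a spurious 507 is
most likely to show up.)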


rick jones




Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-30 Thread Nathan Trueblood
No disk errors in the kern.log.   The filesystem is fine.   I really think
this will turn out to be a bug or a timing (slowness) issue.

I will try some of the other recent suggestions, and failing those try to
track this down with strace.

Thx.

On Mon, Oct 29, 2012 at 7:02 PM, Alex Yang alex890...@gmail.com wrote:

 Are there any errors about the disk in kern.log?


 2012/10/30 Nathan Trueblood nat...@truebloodllc.com

 Still no further clues.   I re-created all the volumes I'm using for
 Swift.  Plenty of Inodes free:

  lab@data02:~$ df -i
 Filesystem         Inodes  IUsed     IFree IUse% Mounted on
 /dev/sda2        12214272  39290  12174982    1% /
 none               107979    482    107497    1% /dev
 none               107979    268    107711    1% /run
 none               107979      2    107977    1% /run/lock
 none               107979      1    107978    1% /run/shm
 /dev/sda1           49152     23     49129    1% /boot
 /dev/sda4       134046400     37 134046363    1% /srv/node/sda4

 I successfully upload a small object to container cont1, then cont2.
 When I upload to cont3, I see the following in the object-server log
 (data02)

 This seems to be the problematic sequence:

 Data02 has ip 192.168.1.202
 Data03 has ip 192.168.1.203

 1. First the account server reports an HTTP 201 on the container from a
 different object server in a different zone.
 2. Then the object server reports a 404 trying to HEAD the new object.
 3. Then the object server reports a 507 trying to PUT the new object.

 From this point the operation eventually fails and the proxy reports a
 503.

 Oct 29 17:58:20 data02 account-server 192.168.1.203 - -
 [30/Oct/2012:00:58:20 +] PUT /sda4/116021/AUTH_system/cont3 201 -
 tx5a3ca6c845af41928e0ba6b7bc58d2da - - 0.0082 
 Oct 29 17:58:20 data02 object-server 192.168.1.111 - -
 [30/Oct/2012:00:58:20 +] HEAD
 /sda4/257613/AUTH_system/cont3/home/lab/bigfile3 404 - -
 tx5f21503ff12e45e39a80eb52f6757261 - 0.0011
 Oct 29 17:58:20 data02 object-server 192.168.1.111 - -
 [30/Oct/2012:00:58:20 +] PUT
 /sda4/257613/AUTH_system/cont3/home/lab/bigfile3 507 - -
 tx425494dc372740e28d043a07d3a08b9a - 0.0031

 In an earlier, successful transaction I noticed that between Steps 1 and
 2 above, there is a response from the container-server:

 Oct 29 17:57:59 data02 account-server 192.168.1.204 - -
 [30/Oct/2012:00:57:59 +] PUT /sda4/116021/AUTH_system/cont2 201 -
 txb10d75886bf14e4eba14fcc52d81c5d9 - - 0.0182 
 Oct 29 17:57:59 data02 container-server 192.168.1.111 - -
 [30/Oct/2012:00:57:59 +] PUT /sda4/122355/AUTH_system/cont2 201 -
 txb10d75886bf14e4eba14fcc52d81c5d9 - - 0.1554
 Oct 29 17:57:59 data02 object-server 192.168.1.111 - -
 [30/Oct/2012:00:57:59 +] HEAD
 /sda4/226151/AUTH_system/cont2/home/lab/bigfile3 404 - -
 tx1c514850530849d1bfbfa716d9039b87 - 0.0012
 Oct 29 17:57:59 data02 container-server 192.168.1.204 - -
 [30/Oct/2012:00:57:59 +] PUT
 /sda4/122355/AUTH_system/cont2/home/lab/bigfile3 201 -
 tx8130af5cae484e5f9c5a25541d1c87aa - - 0.0041
 Oct 29 17:57:59 data02 object-server 192.168.1.111 - -
 [30/Oct/2012:00:57:59 +] PUT
 /sda4/226151/AUTH_system/cont2/home/lab/bigfile3 201 - -
 tx8130af5cae484e5f9c5a25541d1c87aa - 0.1716


 So maybe the container server is failing to create the new container?
 Maybe a bug in auto-create of containers?

 Definitely NOT a problem with the filesystem, but something is causing
 the object-server to think there is a problem with the filesystem.

 I suspect a bug in one of the underlying libraries.

 Any further suggestions on how to troubleshoot?

 Thanks.   When I finally find the solution, I'll post my results.

 -N

 On Fri, Oct 26, 2012 at 11:21 PM, John Dickinson m...@not.mn wrote:

 A 507 is returned by the object servers in 2 situations: 1) the drives
 are full or 2) the drives have been unmounted because of disk error.

 It's highly likely that you simply have full drives. Remember that the
 usable space in your cluster is 1/N where N = replica count. As an example,
 with 3 replicas and 5 nodes with a single 1TB drive each, you only have
 about 1.6TB available for data.

 As Pete suggested in his response, how big are your drives, and what
 does `df` tell you?

 --John


 On Oct 26, 2012, at 5:26 PM, Nathan Trueblood nat...@truebloodllc.com
 wrote:

  Hey folks-
 
  I'm trying to figure out what's going wrong with my Swift deployment
 on a small cluster of mini servers.   I have a small test cluster (5
 storage nodes, 1 proxy) of mini-servers that are ARM-based.   The proxy is
 a regular, Intel-based server with plenty of RAM.   The
 object/account/container servers are relatively small, with 2GB of RAM per
 node.
 
  Everything starts up fine, but now I'm trying to troubleshoot a
 strange problem.   After I successfully upload a few test files, it seems
 like the storage system stops responding and the proxy gives me a 503 error.
 
  Here's the test sequence I run on my proxy:
 
  lab@proxy01:~/bin$ ./swiftcl.sh stat
  

Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-30 Thread Nathan Trueblood
The filesystem is XFS, and I used the recommended mkfs and mount options
for Swift.

The file size seems to have no bearing on the issue, although I haven't
tried really tiny files.   Bigfile3 is only 200K.

I'll try disabling fallocate...

On Mon, Oct 29, 2012 at 7:37 PM, Pete Zaitcev zait...@redhat.com wrote:

 On Mon, 29 Oct 2012 18:16:52 -0700
 Nathan Trueblood nat...@truebloodllc.com wrote:

  Definitely NOT a problem with the filesystem, but something is causing
 the
  object-server to think there is a problem with the filesystem.

 If you are willing to go all-out, you can probably catch the
 error with strace, if it works on ARM. Failing that, find all places
 where 507 is generated and see if any exceptions are caught, by
 modifying the source, I'm afraid to say.

  I suspect a bug in one of the underlying libraries.

 That's a possibility. Or, it could be a kernel bug. You are using XFS,
 right? If it were something other than XFS or ext4, I would suspect
 ARM blowing over the 2GB barrier somewhere, since your object is
 called bigfile3. As it is, you have little option but to divide
 the layers until you identify the one that's broken.

 BTW, make sure to disable fallocate while we're at it.

 -- Pete



Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-30 Thread Pete Zaitcev
On Tue, 30 Oct 2012 11:07:55 -0700
Nathan Trueblood nat...@truebloodllc.com wrote:

 The file size seems to have no bearing on the issue, although I haven't
 tried really tiny files.   Bigfile3 is only 200K.

Okay. BTW, do not forget to use curl and issue the same PUT that the proxy does,
and see if it throws a 507 repeatably. That could shortcut some of the testing.
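
(A rough sketch of that direct PUT, reusing the device, partition and object
path from the object-server log earlier in the thread; port 6000 is assumed
as the default object-server port, and the headers are a guess at the
minimum the object server wants:

  curl -v -X PUT \
    -H "X-Timestamp: $(date +%s.%N)" \
    -H "Content-Type: application/octet-stream" \
    --data-binary @/home/lab/bigfile3 \
    http://192.168.1.202:6000/sda4/257613/AUTH_system/cont3/home/lab/bigfile3

If that returns a 507 every time, the problem sits in the object server or
below it, independent of the proxy.)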

-- Pete




Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-30 Thread Nathan Trueblood
OK, if I were giving out t-shirts for finding this issue, the prize would
go to Pete.   Thank you!

Disabling fallocate did the trick.   I was slowly working my way through
all the object-server config options and hadn't gotten to that one yet.
Turning features on and off by brute force is admittedly lame, but
sometimes that's all you have.

I also turned off all the other things I was doing to try to slow down the
mini-servers, but disabling fallocate was all that was necessary.   Here is
my config:

[DEFAULT]
bind_ip = 192.168.1.202
workers = 1
disable_fallocate = true

[pipeline:main]
pipeline = object-server

[app:object-server]
use = egg:swift#object

[object-replicator]

[object-updater]

[object-auditor]
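
(For anyone following along: the object server has to be restarted before
disable_fallocate takes effect; with the standard Swift tooling that is
roughly

  sudo swift-init object-server restart

or, to bounce everything on the storage node, "sudo swift-init all restart".)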

A few more details...

My servers are running Ubuntu 12.04 LTS.   A straight-up apt-get of all the
prerequisites did NOT produce a working Swift deployment on ARM.
Although all the dependencies would install fine and the Swift services
would start up, the proxy-server could not communicate with the storage
nodes.

So I also had to get older, armel versions of python-greenlet and
python-eventlet:

https://launchpad.net/ubuntu/precise/armel/python-greenlet/0.3.1-1ubuntu5.1
https://launchpad.net/ubuntu/precise/armel/python-eventlet/0.9.16-1ubuntu4.1

Once I deployed those older armel libraries, my Swift cluster worked
(except for the fallocate issue).
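
(Roughly, the downgrade looks like this; the .deb filenames below are
assumptions based on the usual Ubuntu naming, so adjust them to whatever the
Launchpad pages above actually serve:

  # after downloading the armel .debs from the two pages above:
  sudo dpkg -i python-greenlet_0.3.1-1ubuntu5.1_armel.deb \
              python-eventlet_0.9.16-1ubuntu4.1_armel.deb
  echo "python-greenlet hold" | sudo dpkg --set-selections
  echo "python-eventlet hold" | sudo dpkg --set-selections

The dpkg holds keep apt from silently upgrading the packages back on the
next update run.)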

Thanks for everyone's help.

-N

On Tue, Oct 30, 2012 at 11:07 AM, Nathan Trueblood
nat...@truebloodllc.comwrote:

 The filesystem is XFS, and I used the recommended mkfs and mount options
 for Swift.

 The file size seems to have no bearing on the issue, although I haven't
 tried really tiny files.   Bigfile3 is only 200K.

 I'll try disabling fallocate...


 On Mon, Oct 29, 2012 at 7:37 PM, Pete Zaitcev zait...@redhat.com wrote:

 On Mon, 29 Oct 2012 18:16:52 -0700
 Nathan Trueblood nat...@truebloodllc.com wrote:

  Definitely NOT a problem with the filesystem, but something is causing
 the
  object-server to think there is a problem with the filesystem.

 If you are willing to go all-out, you can probably catch the
 error with strace, if it works on ARM. Failing that, find all places
 where 507 is generated and see if any exceptions are caught, by
 modifying the source, I'm afraid to say.

  I suspect a bug in one of the underlying libraries.

 That's a possibility. Or, it could be a kernel bug. You are using XFS,
 right? If it were something other than XFS or ext4, I would suspect
 ARM blowing over the 2GB barrier somewhere, since your object is
 called bigfile3. As it is, you have little option but to divide
 the layers until you identify the one that's broken.

 BTW, make sure to disable fallocate while we're at it.

 -- Pete





Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-29 Thread Nathan Trueblood
Yeah, I read about the 507 error.   However, when the error occurs on my
node I can see with 'df' that the drive is only 1% full and is definitely not
unmounted.   I can write files to the mounted filesystem directly before,
during, and after the Swift error occurs.   So the problem must be some
kind of timeout that is causing the object server to think that something
is wrong with the disk.

I'll keep digging...

On Fri, Oct 26, 2012 at 11:21 PM, John Dickinson m...@not.mn wrote:

 A 507 is returned by the object servers in 2 situations: 1) the drives are
 full or 2) the drives have been unmounted because of disk error.

 It's highly likely that you simply have full drives. Remember that the
 usable space in your cluster is 1/N where N = replica count. As an example,
 with 3 replicas and 5 nodes with a single 1TB drive each, you only have
 about 1.6TB available for data.

 As Pete suggested in his response, how big are your drives, and what does
 `df` tell you?

 --John


 On Oct 26, 2012, at 5:26 PM, Nathan Trueblood nat...@truebloodllc.com
 wrote:

  Hey folks-
 
  I'm trying to figure out what's going wrong with my Swift deployment on
 a small cluster of mini servers.   I have a small test cluster (5 storage
 nodes, 1 proxy) of mini-servers that are ARM-based.   The proxy is a
 regular, Intel-based server with plenty of RAM.   The
 object/account/container servers are relatively small, with 2GB of RAM per
 node.
 
  Everything starts up fine, but now I'm trying to troubleshoot a strange
 problem.   After I successfully upload a few test files, it seems like the
 storage system stops responding and the proxy gives me a 503 error.
 
  Here's the test sequence I run on my proxy:
 
  lab@proxy01:~/bin$ ./swiftcl.sh stat
  swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
 Account: AUTH_system
  Containers: 5
 Objects: 4
   Bytes: 47804968
  Accept-Ranges: bytes
  X-Timestamp: 1351294912.72119
  lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1
  swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass
 upload myfiles1 /home/lab/bigfile1
  home/lab/bigfile1
  lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles2 /home/lab/bigfile1
  swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass
 upload myfiles2 /home/lab/bigfile1
  home/lab/bigfile1
  lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles3 /home/lab/bigfile1
  swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass
 upload myfiles3 /home/lab/bigfile1
  home/lab/bigfile1
  lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles4 /home/lab/bigfile1
  swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass
 upload myfiles4 /home/lab/bigfile1
  home/lab/bigfile1
  lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles5 /home/lab/bigfile1
  swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass
 upload myfiles5 /home/lab/bigfile1
  Object PUT failed:
 http://172.16.1.111:8080/v1/AUTH_system/myfiles5/home/lab/bigfile1 503
 Service Unavailable  [first 60 chars of response] 503 Service Unavailable
 
  The server is currently unavailable
  lab@proxy01:~/bin$ ./swiftcl.sh stat
  swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
 Account: AUTH_system
  Containers: 6
 Objects: 5
   Bytes: 59756210
  Accept-Ranges: bytes
  X-Timestamp: 1351294912.72119
 
  Here's the corresponding log on the Proxy:
 
  Oct 26 17:06:52 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/06/52
 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
  Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13
 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0017
  Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13
 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
  Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22
 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
  Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22
 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
  Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27
 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
  Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27
 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
  Oct 26 17:07:27 proxy01 proxy-server Handoff requested (1) (txn:
 tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)
  Oct 26 17:07:27 proxy01 proxy-server Handoff requested (2) (txn:
 tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)
  Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33
 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
  Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33
 GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
  Oct 26 17:07:33 proxy01 proxy-server Handoff requested (1) (txn:
 tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111)
  Oct 26 17:07:33 proxy01 proxy-server Handoff requested (2) (txn:
 

Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-29 Thread John Dickinson
Also check the number of inodes used: `df -i`

--John



On Oct 29, 2012, at 8:31 AM, Nathan Trueblood nat...@truebloodllc.com wrote:

 Yeah, I read about the 507 error.   However, when the error occurs on my
 node I can see with 'df' that the drive is only 1% full and is definitely not
 unmounted.   I can write files to the mounted filesystem directly before,
 during, and after the Swift error occurs.   So the problem must be some kind
 of timeout that is causing the object server to think that something is wrong
 with the disk.
 
 I'll keep digging... 
 
 On Fri, Oct 26, 2012 at 11:21 PM, John Dickinson m...@not.mn wrote:
 A 507 is returned by the object servers in 2 situations: 1) the drives are 
 full or 2) the drives have been unmounted because of disk error.
 
 It's highly likely that you simply have full drives. Remember that the usable 
 space in your cluster is 1/N where N = replica count. As an example, with 3 
 replicas and 5 nodes with a single 1TB drive each, you only have about 1.6TB 
 available for data.
 
 As Pete suggested in his response, how big are your drives, and what does 
 `df` tell you?
 
 --John
 
 
 On Oct 26, 2012, at 5:26 PM, Nathan Trueblood nat...@truebloodllc.com wrote:
 
  Hey folks-
 
  I'm trying to figure out what's going wrong with my Swift deployment on a 
  small cluster of mini servers.   I have a small test cluster (5 storage 
  nodes, 1 proxy) of mini-servers that are ARM-based.   The proxy is a 
  regular, Intel-based server with plenty of RAM.   The 
  object/account/container servers are relatively small, with 2GB of RAM per 
  node.
 
  Everything starts up fine, but now I'm trying to troubleshoot a strange 
  problem.   After I successfully upload a few test files, it seems like the 
  storage system stops responding and the proxy gives me a 503 error.
 
  Here's the test sequence I run on my proxy:
 
  lab@proxy01:~/bin$ ./swiftcl.sh stat
  swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
 Account: AUTH_system
  Containers: 5
 Objects: 4
   Bytes: 47804968
  Accept-Ranges: bytes
  X-Timestamp: 1351294912.72119
  lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1
  swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
  myfiles1 /home/lab/bigfile1
  home/lab/bigfile1
  lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles2 /home/lab/bigfile1
  swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
  myfiles2 /home/lab/bigfile1
  home/lab/bigfile1
  lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles3 /home/lab/bigfile1
  swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
  myfiles3 /home/lab/bigfile1
  home/lab/bigfile1
  lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles4 /home/lab/bigfile1
  swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
  myfiles4 /home/lab/bigfile1
  home/lab/bigfile1
  lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles5 /home/lab/bigfile1
  swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
  myfiles5 /home/lab/bigfile1
  Object PUT failed: 
  http://172.16.1.111:8080/v1/AUTH_system/myfiles5/home/lab/bigfile1 503 
  Service Unavailable  [first 60 chars of response] 503 Service Unavailable
 
  The server is currently unavailable
  lab@proxy01:~/bin$ ./swiftcl.sh stat
  swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
 Account: AUTH_system
  Containers: 6
 Objects: 5
   Bytes: 59756210
  Accept-Ranges: bytes
  X-Timestamp: 1351294912.72119
 
  Here's the corresponding log on the Proxy:
 
  Oct 26 17:06:52 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/06/52 GET 
  /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
  Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET 
  /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0017
  Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET 
  /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
  Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET 
  /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
  Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET 
  /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
  Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET 
  /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
  Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET 
  /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
  Oct 26 17:07:27 proxy01 proxy-server Handoff requested (1) (txn: 
  tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)
  Oct 26 17:07:27 proxy01 proxy-server Handoff requested (2) (txn: 
  tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)
  Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET 
  /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
  Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET 
  /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 

Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-29 Thread Nathan Trueblood
Still no further clues.   I re-created all the volumes I'm using for Swift.
 Plenty of Inodes free:

lab@data02:~$ df -i
Filesystem         Inodes  IUsed     IFree IUse% Mounted on
/dev/sda2        12214272  39290  12174982    1% /
none               107979    482    107497    1% /dev
none               107979    268    107711    1% /run
none               107979      2    107977    1% /run/lock
none               107979      1    107978    1% /run/shm
/dev/sda1           49152     23     49129    1% /boot
/dev/sda4       134046400     37 134046363    1% /srv/node/sda4

I successfully upload a small object to container cont1, then cont2.   When
I upload to cont3, I see the following in the object-server log (data02)

This seems to be the problematic sequence:

Data02 has ip 192.168.1.202
Data03 has ip 192.168.1.203

1. First the account server reports an HTTP 201 on the container from a
different object server in a different zone.
2. Then the object server reports a 404 trying to HEAD the new object.
3. Then the object server reports a 507 trying to PUT the new object.

From this point the operation eventually fails and the proxy reports a 503.

Oct 29 17:58:20 data02 account-server 192.168.1.203 - -
[30/Oct/2012:00:58:20 +] PUT /sda4/116021/AUTH_system/cont3 201 -
tx5a3ca6c845af41928e0ba6b7bc58d2da - - 0.0082 
Oct 29 17:58:20 data02 object-server 192.168.1.111 - -
[30/Oct/2012:00:58:20 +] HEAD
/sda4/257613/AUTH_system/cont3/home/lab/bigfile3 404 - -
tx5f21503ff12e45e39a80eb52f6757261 - 0.0011
Oct 29 17:58:20 data02 object-server 192.168.1.111 - -
[30/Oct/2012:00:58:20 +] PUT
/sda4/257613/AUTH_system/cont3/home/lab/bigfile3 507 - -
tx425494dc372740e28d043a07d3a08b9a - 0.0031

In an earlier, successful transaction I noticed that between Steps 1 and 2
above, there is a response from the container-server:

Oct 29 17:57:59 data02 account-server 192.168.1.204 - -
[30/Oct/2012:00:57:59 +] PUT /sda4/116021/AUTH_system/cont2 201 -
txb10d75886bf14e4eba14fcc52d81c5d9 - - 0.0182 
Oct 29 17:57:59 data02 container-server 192.168.1.111 - -
[30/Oct/2012:00:57:59 +] PUT /sda4/122355/AUTH_system/cont2 201 -
txb10d75886bf14e4eba14fcc52d81c5d9 - - 0.1554
Oct 29 17:57:59 data02 object-server 192.168.1.111 - -
[30/Oct/2012:00:57:59 +] HEAD
/sda4/226151/AUTH_system/cont2/home/lab/bigfile3 404 - -
tx1c514850530849d1bfbfa716d9039b87 - 0.0012
Oct 29 17:57:59 data02 container-server 192.168.1.204 - -
[30/Oct/2012:00:57:59 +] PUT
/sda4/122355/AUTH_system/cont2/home/lab/bigfile3 201 -
tx8130af5cae484e5f9c5a25541d1c87aa - - 0.0041
Oct 29 17:57:59 data02 object-server 192.168.1.111 - -
[30/Oct/2012:00:57:59 +] PUT
/sda4/226151/AUTH_system/cont2/home/lab/bigfile3 201 - -
tx8130af5cae484e5f9c5a25541d1c87aa - 0.1716


So maybe the container server is failing to create the new container?
Maybe a bug in auto-create of containers?

Definitely NOT a problem with the filesystem, but something is causing the
object-server to think there is a problem with the filesystem.
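
(One quick check, on the assumption that the 507 is coming from the object
server's mount_check, which relies on Python's os.path.ismount, rather than
from a disk-full condition: on the storage node, make sure this prints True
for the device directory the ring points at:

  python -c "import os; print os.path.ismount('/srv/node/sda4')"

If it prints False even though the drive is clearly mounted, mount_check is
what is triggering the 507.)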

I suspect a bug in one of the underlying libraries.

Any further suggestions on how to troubleshoot?

Thanks.   When I finally find the solution, I'll post my results.

-N

On Fri, Oct 26, 2012 at 11:21 PM, John Dickinson m...@not.mn wrote:

 A 507 is returned by the object servers in 2 situations: 1) the drives are
 full or 2) the drives have been unmounted because of disk error.

 It's highly likely that you simply have full drives. Remember that the
 usable space in your cluster is 1/N where N = replica count. As an example,
 with 3 replicas and 5 nodes with a single 1TB drive each, you only have
 about 1.6TB available for data.

 As Pete suggested in his response, how big are your drives, and what does
 `df` tell you?

 --John


 On Oct 26, 2012, at 5:26 PM, Nathan Trueblood nat...@truebloodllc.com
 wrote:

  Hey folks-
 
  I'm trying to figure out what's going wrong with my Swift deployment on
 a small cluster of mini servers.   I have a small test cluster (5 storage
 nodes, 1 proxy) of mini-servers that are ARM-based.   The proxy is a
 regular, Intel-based server with plenty of RAM.   The
 object/account/container servers are relatively small, with 2GB of RAM per
 node.
 
  Everything starts up fine, but now I'm trying to troubleshoot a strange
 problem.   After I successfully upload a few test files, it seems like the
 storage system stops responding and the proxy gives me a 503 error.
 
  Here's the test sequence I run on my proxy:
 
  lab@proxy01:~/bin$ ./swiftcl.sh stat
  swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
 Account: AUTH_system
  Containers: 5
 Objects: 4
   Bytes: 47804968
  Accept-Ranges: bytes
  X-Timestamp: 1351294912.72119
  lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1
  swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass
 upload myfiles1 /home/lab/bigfile1
  home/lab/bigfile1
  lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles2 /home/lab/bigfile1
  swift -A 

Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-29 Thread Alex Yang
Are there any errors about the disk in kern.log?

2012/10/30 Nathan Trueblood nat...@truebloodllc.com

 Still no further clues.   I re-created all the volumes I'm using for
 Swift.  Plenty of Inodes free:

 lab@data02:~$ df -i
 Filesystem         Inodes  IUsed     IFree IUse% Mounted on
 /dev/sda2        12214272  39290  12174982    1% /
 none               107979    482    107497    1% /dev
 none               107979    268    107711    1% /run
 none               107979      2    107977    1% /run/lock
 none               107979      1    107978    1% /run/shm
 /dev/sda1           49152     23     49129    1% /boot
 /dev/sda4       134046400     37 134046363    1% /srv/node/sda4

 I successfully upload a small object to container cont1, then cont2.
 When I upload to cont3, I see the following in the object-server log
 (data02)

 This seems to be the problematic sequence:

 Data02 has ip 192.168.1.202
 Data03 has ip 192.168.1.203

 1. First the account server reports an HTTP 201 on the container from a
 different object server in a different zone.
 2. Then the object server reports a 404 trying to HEAD the new object.
 3. Then the object server reports a 507 trying to PUT the new object.

 From this point the operation eventually fails and the proxy reports a 503.

 Oct 29 17:58:20 data02 account-server 192.168.1.203 - -
 [30/Oct/2012:00:58:20 +] PUT /sda4/116021/AUTH_system/cont3 201 -
 tx5a3ca6c845af41928e0ba6b7bc58d2da - - 0.0082 
 Oct 29 17:58:20 data02 object-server 192.168.1.111 - -
 [30/Oct/2012:00:58:20 +] HEAD
 /sda4/257613/AUTH_system/cont3/home/lab/bigfile3 404 - -
 tx5f21503ff12e45e39a80eb52f6757261 - 0.0011
 Oct 29 17:58:20 data02 object-server 192.168.1.111 - -
 [30/Oct/2012:00:58:20 +] PUT
 /sda4/257613/AUTH_system/cont3/home/lab/bigfile3 507 - -
 tx425494dc372740e28d043a07d3a08b9a - 0.0031

 In an earlier, successful transaction I noticed that between Steps 1 and 2
 above, there is a response from the container-server:

 Oct 29 17:57:59 data02 account-server 192.168.1.204 - -
 [30/Oct/2012:00:57:59 +] PUT /sda4/116021/AUTH_system/cont2 201 -
 txb10d75886bf14e4eba14fcc52d81c5d9 - - 0.0182 
 Oct 29 17:57:59 data02 container-server 192.168.1.111 - -
 [30/Oct/2012:00:57:59 +] PUT /sda4/122355/AUTH_system/cont2 201 -
 txb10d75886bf14e4eba14fcc52d81c5d9 - - 0.1554
 Oct 29 17:57:59 data02 object-server 192.168.1.111 - -
 [30/Oct/2012:00:57:59 +] HEAD
 /sda4/226151/AUTH_system/cont2/home/lab/bigfile3 404 - -
 tx1c514850530849d1bfbfa716d9039b87 - 0.0012
 Oct 29 17:57:59 data02 container-server 192.168.1.204 - -
 [30/Oct/2012:00:57:59 +] PUT
 /sda4/122355/AUTH_system/cont2/home/lab/bigfile3 201 -
 tx8130af5cae484e5f9c5a25541d1c87aa - - 0.0041
 Oct 29 17:57:59 data02 object-server 192.168.1.111 - -
 [30/Oct/2012:00:57:59 +] PUT
 /sda4/226151/AUTH_system/cont2/home/lab/bigfile3 201 - -
 tx8130af5cae484e5f9c5a25541d1c87aa - 0.1716


 So maybe the container server is failing to create the new container?
 Maybe a bug in auto-create of containers?

 Definitely NOT a problem with the filesystem, but something is causing the
 object-server to think there is a problem with the filesystem.

 I suspect a bug in one of the underlying libraries.

 Any further suggestions on how to troubleshoot?

 Thanks.   When I finally find the solution, I'll post my results.

 -N

 On Fri, Oct 26, 2012 at 11:21 PM, John Dickinson m...@not.mn wrote:

 A 507 is returned by the object servers in 2 situations: 1) the drives
 are full or 2) the drives have been unmounted because of disk error.

 It's highly likely that you simply have full drives. Remember that the
 usable space in your cluster is 1/N where N = replica count. As an example,
 with 3 replicas and 5 nodes with a single 1TB drive each, you only have
 about 1.6TB available for data.

 As Pete suggested in his response, how big are your drives, and what does
 `df` tell you?

 --John


 On Oct 26, 2012, at 5:26 PM, Nathan Trueblood nat...@truebloodllc.com
 wrote:

  Hey folks-
 
  I'm trying to figure out what's going wrong with my Swift deployment on
 a small cluster of mini servers.   I have a small test cluster (5 storage
 nodes, 1 proxy) of mini-servers that are ARM-based.   The proxy is a
 regular, Intel-based server with plenty of RAM.   The
 object/account/container servers are relatively small, with 2GB of RAM per
 node.
 
  Everything starts up fine, but now I'm trying to troubleshoot a strange
 problem.   After I successfully upload a few test files, it seems like the
 storage system stops responding and the proxy gives me a 503 error.
 
  Here's the test sequence I run on my proxy:
 
  lab@proxy01:~/bin$ ./swiftcl.sh stat
  swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
 Account: AUTH_system
  Containers: 5
 Objects: 4
   Bytes: 47804968
  Accept-Ranges: bytes
  X-Timestamp: 1351294912.72119
  lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1
  swift -A http://proxy01:8080/auth/v1.0 -U 

Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-29 Thread Pete Zaitcev
On Mon, 29 Oct 2012 18:16:52 -0700
Nathan Trueblood nat...@truebloodllc.com wrote:

 Definitely NOT a problem with the filesystem, but something is causing the
 object-server to think there is a problem with the filesystem.

If you are willing to go all-out, you can probably catch the
error with strace, if it works on ARM. Failing that, find all places
where 507 is generated and see if any exceptions are caught, by
modifying the source, I'm afraid to say.
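
(A rough sketch of that search, assuming Swift is installed under the usual
Ubuntu Python 2.7 path:

  grep -rn "HTTPInsufficientStorage" /usr/lib/python2.7/dist-packages/swift/

HTTPInsufficientStorage should be the response class behind the 507s, so the
hits show every spot worth instrumenting.)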

 I suspect a bug in one of the underlying libraries.

That's a possibility. Or, it could be a kernel bug. You are using XFS,
right? If it were something other than XFS or ext4, I would suspect
ARM blowing over the 2GB barrier somewhere, since your object is
called bigfile3. As it is, you have little option but to divide
the layers until you identify the one that's broken.

BTW, make sure to disable fallocate while we're at it.

-- Pete



Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-27 Thread John Dickinson
A 507 is returned by the object servers in 2 situations: 1) the drives are full 
or 2) the drives have been unmounted because of disk error.

It's highly likely that you simply have full drives. Remember that the usable 
space in your cluster is 1/N where N = replica count. As an example, with 3 
replicas and 5 nodes with a single 1TB drive each, you only have about 1.6TB 
available for data.

As Pete suggested in his response, how big are your drives, and what does `df` 
tell you?

--John


On Oct 26, 2012, at 5:26 PM, Nathan Trueblood nat...@truebloodllc.com wrote:

 Hey folks-
 
 I'm trying to figure out what's going wrong with my Swift deployment on a 
 small cluster of mini servers.   I have a small test cluster (5 storage 
 nodes, 1 proxy) of mini-servers that are ARM-based.   The proxy is a regular, 
 Intel-based server with plenty of RAM.   The object/account/container servers 
 are relatively small, with 2GB of RAM per node.
 
 Everything starts up fine, but now I'm trying to troubleshoot a strange 
 problem.   After I successfully upload a few test files, it seems like the 
 storage system stops responding and the proxy gives me a 503 error.
 
 Here's the test sequence I run on my proxy:
 
 lab@proxy01:~/bin$ ./swiftcl.sh stat
 swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
Account: AUTH_system
 Containers: 5
Objects: 4
  Bytes: 47804968
 Accept-Ranges: bytes
 X-Timestamp: 1351294912.72119
 lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1 
 swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
 myfiles1 /home/lab/bigfile1
 home/lab/bigfile1
 lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles2 /home/lab/bigfile1 
 swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
 myfiles2 /home/lab/bigfile1
 home/lab/bigfile1
 lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles3 /home/lab/bigfile1 
 swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
 myfiles3 /home/lab/bigfile1
 home/lab/bigfile1
 lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles4 /home/lab/bigfile1 
 swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
 myfiles4 /home/lab/bigfile1
 home/lab/bigfile1
 lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles5 /home/lab/bigfile1 
 swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
 myfiles5 /home/lab/bigfile1
 Object PUT failed: 
 http://172.16.1.111:8080/v1/AUTH_system/myfiles5/home/lab/bigfile1 503 
 Service Unavailable  [first 60 chars of response] 503 Service Unavailable
 
 The server is currently unavailable
 lab@proxy01:~/bin$ ./swiftcl.sh stat
 swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
Account: AUTH_system
 Containers: 6
Objects: 5
  Bytes: 59756210
 Accept-Ranges: bytes
 X-Timestamp: 1351294912.72119
 
 Here's the corresponding log on the Proxy:
 
 Oct 26 17:06:52 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/06/52 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
 Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0017
 Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
 Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
 Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
 Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
 Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
 Oct 26 17:07:27 proxy01 proxy-server Handoff requested (1) (txn: 
 tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)
 Oct 26 17:07:27 proxy01 proxy-server Handoff requested (2) (txn: 
 tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)
 Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
 Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
 Oct 26 17:07:33 proxy01 proxy-server Handoff requested (1) (txn: 
 tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111)
 Oct 26 17:07:33 proxy01 proxy-server Handoff requested (2) (txn: 
 tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111)
 Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0009
 Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0009
 Oct 26 17:07:39 proxy01 proxy-server Handoff requested (1) (txn: 
 tx8dc917a4a8c84c40a4429b7bab0323c6) (client_ip: 172.16.1.111)
 Oct 26 17:07:39 proxy01 proxy-server Handoff requested (2) (txn: 
 

Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-27 Thread Diego Parrilla
Sorry for my off-topic question; it's the first time I've heard of Swift storage
servers running on ARM processors.

I think ARM architectures can be an interesting alternative to classic
compute servers, but ARM-based Swift storage nodes hadn't caught my attention as
a valid alternative until now.

From my perspective, the savings in power consumption and real estate in your
datacenter are minimized if you load your servers with two dozen SATA disks (a
typical Swift node configuration) and a dual 10GbE connection, for example.

Maybe there are benefits I'm not aware of, but I would really love to hear
about them :-)

Cheers
Diego

Sent from my iPhone; pardon the brevity

On 27/10/2012, at 02:26, Nathan Trueblood nat...@truebloodllc.com wrote:

 Hey folks-
 
 I'm trying to figure out what's going wrong with my Swift deployment on a 
 small cluster of mini servers.   I have a small test cluster (5 storage 
 nodes, 1 proxy) of mini-servers that are ARM-based.   The proxy is a regular, 
 Intel-based server with plenty of RAM.   The object/account/container servers 
 are relatively small, with 2GB of RAM per node.
 
 Everything starts up fine, but now I'm trying to troubleshoot a strange 
 problem.   After I successfully upload a few test files, it seems like the 
 storage system stops responding and the proxy gives me a 503 error.
 
 Here's the test sequence I run on my proxy:
 
 lab@proxy01:~/bin$ ./swiftcl.sh stat
 swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
Account: AUTH_system
 Containers: 5
Objects: 4
  Bytes: 47804968
 Accept-Ranges: bytes
 X-Timestamp: 1351294912.72119
 lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1 
 swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
 myfiles1 /home/lab/bigfile1
 home/lab/bigfile1
 lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles2 /home/lab/bigfile1 
 swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
 myfiles2 /home/lab/bigfile1
 home/lab/bigfile1
 lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles3 /home/lab/bigfile1 
 swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
 myfiles3 /home/lab/bigfile1
 home/lab/bigfile1
 lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles4 /home/lab/bigfile1 
 swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
 myfiles4 /home/lab/bigfile1
 home/lab/bigfile1
 lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles5 /home/lab/bigfile1 
 swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
 myfiles5 /home/lab/bigfile1
 Object PUT failed: 
 http://172.16.1.111:8080/v1/AUTH_system/myfiles5/home/lab/bigfile1 503 
 Service Unavailable  [first 60 chars of response] 503 Service Unavailable
 
 The server is currently unavailable
 lab@proxy01:~/bin$ ./swiftcl.sh stat
 swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
Account: AUTH_system
 Containers: 6
Objects: 5
  Bytes: 59756210
 Accept-Ranges: bytes
 X-Timestamp: 1351294912.72119
 
 Here's the corresponding log on the Proxy:
 
 Oct 26 17:06:52 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/06/52 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
 Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0017
 Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
 Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
 Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
 Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
 Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
 Oct 26 17:07:27 proxy01 proxy-server Handoff requested (1) (txn: 
 tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)
 Oct 26 17:07:27 proxy01 proxy-server Handoff requested (2) (txn: 
 tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)
 Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
 Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
 Oct 26 17:07:33 proxy01 proxy-server Handoff requested (1) (txn: 
 tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111)
 Oct 26 17:07:33 proxy01 proxy-server Handoff requested (2) (txn: 
 tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111)
 Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0009
 Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GET 
 /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0009
 Oct 26 

[Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-26 Thread Nathan Trueblood
Hey folks-

I'm trying to figure out what's going wrong with my Swift deployment on a
small cluster of mini servers.   I have a small test cluster (5 storage
nodes, 1 proxy) of mini-servers that are ARM-based.   The proxy is a
regular, Intel-based server with plenty of RAM.   The
object/account/container servers are relatively small, with 2GB of RAM per
node.

Everything starts up fine, but now I'm trying to troubleshoot a strange
problem.   After I successfully upload a few test files, it seems like the
storage system stops responding and the proxy gives me a 503 error.

Here's the test sequence I run on my proxy:

lab@proxy01:~/bin$ ./swiftcl.sh stat

swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat

   Account: AUTH_system

Containers: 5

   Objects: 4

 Bytes: 47804968

Accept-Ranges: bytes

X-Timestamp: 1351294912.72119

lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1

swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload
myfiles1 /home/lab/bigfile1

home/lab/bigfile1

lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles2 /home/lab/bigfile1

swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload
myfiles2 /home/lab/bigfile1

home/lab/bigfile1

lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles3 /home/lab/bigfile1

swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload
myfiles3 /home/lab/bigfile1

home/lab/bigfile1

lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles4 /home/lab/bigfile1

swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload
myfiles4 /home/lab/bigfile1

home/lab/bigfile1

lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles5 /home/lab/bigfile1

swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload
myfiles5 /home/lab/bigfile1

Object PUT failed:
http://172.16.1.111:8080/v1/AUTH_system/myfiles5/home/lab/bigfile1 503
Service Unavailable  [first 60 chars of response] 503 Service Unavailable


The server is currently unavailable

lab@proxy01:~/bin$ ./swiftcl.sh stat

swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat

   Account: AUTH_system

Containers: 6

   Objects: 5

 Bytes: 59756210

Accept-Ranges: bytes

X-Timestamp: 1351294912.72119

Here's the corresponding log on the Proxy:

Oct 26 17:06:52 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/06/52 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010

Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0017

Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016

Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010

Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016

Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010

Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016

Oct 26 17:07:27 proxy01 proxy-server Handoff requested (1) (txn:
tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)

Oct 26 17:07:27 proxy01 proxy-server Handoff requested (2) (txn:
tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)

Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010

Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016

Oct 26 17:07:33 proxy01 proxy-server Handoff requested (1) (txn:
tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111)

Oct 26 17:07:33 proxy01 proxy-server Handoff requested (2) (txn:
tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111)

Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0009

Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0009

Oct 26 17:07:39 proxy01 proxy-server Handoff requested (1) (txn:
tx8dc917a4a8c84c40a4429b7bab0323c6) (client_ip: 172.16.1.111)

Oct 26 17:07:39 proxy01 proxy-server Handoff requested (2) (txn:
tx8dc917a4a8c84c40a4429b7bab0323c6) (client_ip: 172.16.1.111)

Oct 26 17:07:40 proxy01 proxy-server Object PUT returning 503, 1/2 required
connections (txn: tx8dc917a4a8c84c40a4429b7bab0323c6) (client_ip:
172.16.1.111)

Oct 26 17:07:41 proxy01 proxy-server Object PUT returning 503, 1/2 required
connections (txn: tx07a1f5dfaa23445a88eaa4a2ade68466) (client_ip:
172.16.1.111)

Oct 26 17:07:43 proxy01 proxy-server Object PUT returning 503, 1/2 required
connections (txn: tx938d08b706844db3886695b798bd9fad) (client_ip:
172.16.1.111)

Oct 26 17:07:47 proxy01 proxy-server Object PUT returning 503, 1/2 required
connections (txn: txa35e9f8a54924f139e13d6f3a5dc457f) 

Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-26 Thread Pete Zaitcev
On Fri, 26 Oct 2012 17:26:07 -0700
Nathan Trueblood nat...@truebloodllc.com wrote:

 I'm trying to figure out what's going wrong with my Swift deployment on a
 small cluster of mini servers.   I have a small test cluster (5 storage
 nodes, 1 proxy) of mini-servers that are ARM-based.   The proxy is a
 regular, Intel-based server with plenty of RAM.   The
 object/account/container servers are relatively small, with 2GB of RAM per
 node.

And the disk is how big?

 Oct 26 17:07:46 data05 object-server 192.168.1.111 - -
 [27/Oct/2012:00:07:46 +] PUT
 /sda6/150861/AUTH_system/myfiles5/home/lab/bigfile1 507 - -
 tx8dc917a4a8c84c40a4429b7bab0323c6 - 0.0031

Well, what does df say?

 The Object-servers do give a 507 error, which might indicate a disk
 problem, but there is nothing wrong with the storage drive.   And also if
 there was a fundamental drive problem then I wouldn't be able to upload
 objects in the first place.

You could upload them to a reduced number of nodes, and then the
replication would inflate the space used by the replication ratio.

Finally, it's possible that tombstones are not properly expired for
some reason.

-- Pete
