Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-30 Thread Nathan Trueblood
OK, if I were giving out t-shirts for finding this issue, then the prize
would go to Pete.   Thank you!

Disabling fallocate did the trick.   I was slowly working my way through
all the object-server config options and hadn't gotten to that one yet.
Turning features on and off by brute force is admittedly lame, but
sometimes that's all you have.

I also turned off all the other things I was doing to try to slow down the
mini-servers, but disabling fallocate was all that was necessary.   Here is
my config:

[DEFAULT]
bind_ip = 192.168.1.202
workers = 1
disable_fallocate = true

[pipeline:main]
pipeline = object-server

[app:object-server]
use = egg:swift#object

[object-replicator]

[object-updater]

[object-auditor]
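
To pick up the new option I restarted the object servers, along the lines of:

swift-init object-server restart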

A few more details...

My servers are running Ubuntu 12.04 LTS.   A straight-up apt-get of all the
prerequisites did NOT produce a working Swift deployment on ARM.
Although all the dependencies would install fine and the Swift services
would start up, the proxy-server could not communicate with the storage
nodes.

So I also had to get older armel builds of python-greenlet and
python-eventlet:

https://launchpad.net/ubuntu/precise/armel/python-greenlet/0.3.1-1ubuntu5.1
https://launchpad.net/ubuntu/precise/armel/python-eventlet/0.9.16-1ubuntu4.1

Once I deployed those older armel libraries, my Swift cluster
worked (except for the fallocate issue).
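
For anyone else hitting this, the rough sequence was to download the armel
.debs from the Launchpad pages above, install them with dpkg, and hold them
so apt doesn't upgrade them back (the filenames below are approximate; check
them against the Launchpad pages):

# filenames approximate -- verify against Launchpad before installing
sudo dpkg -i python-greenlet_0.3.1-1ubuntu5.1_armel.deb
sudo dpkg -i python-eventlet_0.9.16-1ubuntu4.1_all.deb
# keep apt from upgrading them back
echo python-greenlet hold | sudo dpkg --set-selections
echo python-eventlet hold | sudo dpkg --set-selections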

Thanks for everyone's help.

-N

On Tue, Oct 30, 2012 at 11:07 AM, Nathan Trueblood
wrote:

> The filesystem is XFS, and I used the recommended mkfs and mount options
> for Swift.
>
> The file size seems to have no bearing on the issue, although I haven't
> tried really tiny files.   Bigfile3 is only 200K.
>
> I'll try disabling fallocate...
>
>
> On Mon, Oct 29, 2012 at 7:37 PM, Pete Zaitcev  wrote:
>
>> On Mon, 29 Oct 2012 18:16:52 -0700
>> Nathan Trueblood  wrote:
>>
>> > Definitely NOT a problem with the filesystem, but something is causing
>> the
>> > object-server to think there is a problem with the filesystem.
>>
>> If you are willing to go all-out, you can probably catch the
>> error with strace, if it works on ARM. Failing that, find all places
>> where 507 is generated and see if any exceptions are caught, by
>> modifying the source, I'm afraid to say.
>>
>> > I suspect a bug in one of the underlying libraries.
>>
>> That's a possibility. Or, it could be a kernel bug. You are using XFS,
>> right? If it were something other than XFS or ext4, I would suspect
>> ARM blowing over the 2GB barrier somewhere, since your object is
>> called "bigfile3". As it is, you have little option but to divide
>> the layers until you identify the one that's broken.
>>
>> BTW, make sure to disable the fallocate, while we're at it.
>>
>> -- Pete
>>
>
>


Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-30 Thread Pete Zaitcev
On Tue, 30 Oct 2012 11:07:55 -0700
Nathan Trueblood  wrote:

> The file size seems to have no bearing on the issue, although I haven't
> tried really tiny files.   Bigfile3 is only 200K.

Okay. BTW, do not forget to use curl and issue the same PUT that the proxy
does, and see if it throws a 507 repeatably. That could shortcut some of the
testing.
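
Something along these lines, off the top of my head (assumes the default
object-server port 6000; take the device, partition, and path straight from
the failing log line):

# replay the failing PUT directly at the object-server
curl -v -X PUT \
  -H "X-Timestamp: $(date +%s.%N)" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @/home/lab/bigfile3 \
  http://192.168.1.202:6000/sda4/257613/AUTH_system/cont3/home/lab/bigfile3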

-- Pete




Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-30 Thread Nathan Trueblood
The filesystem is XFS, and I used the recommended mkfs and mount options
for Swift.
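
For reference, that was roughly this (from the Swift deployment docs of the
time, so treat the exact flags as approximate):

# larger inodes so object metadata xattrs fit inline
mkfs.xfs -f -i size=1024 /dev/sda4
mount -o noatime,nodiratime,nobarrier,logbufs=8 /dev/sda4 /srv/node/sda4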

The file size seems to have no bearing on the issue, although I haven't
tried really tiny files.   Bigfile3 is only 200K.

I'll try disabling fallocate...

On Mon, Oct 29, 2012 at 7:37 PM, Pete Zaitcev  wrote:

> On Mon, 29 Oct 2012 18:16:52 -0700
> Nathan Trueblood  wrote:
>
> > Definitely NOT a problem with the filesystem, but something is causing
> the
> > object-server to think there is a problem with the filesystem.
>
> If you are willing to go all-out, you can probably catch the
> error with strace, if it works on ARM. Failing that, find all places
> where 507 is generated and see if any exceptions are caught, by
> modifying the source, I'm afraid to say.
>
> > I suspect a bug in one of the underlying libraries.
>
> That's a possibility. Or, it could be a kernel bug. You are using XFS,
> right? If it were something other than XFS or ext4, I would suspect
> ARM blowing over the 2GB barrier somewhere, since your object is
> called "bigfile3". As it is, you have little option but to divide
> the layers until you identify the one that's broken.
>
> BTW, make sure to disable the fallocate, while we're at it.
>
> -- Pete
>


Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-30 Thread Nathan Trueblood
No disk errors in the kern.log.   The filesystem is fine.   I really think
this will turn out to be a bug or a timing (slowness) issue.

I will try some of the other recent suggestions, and failing those try to
track this down with strace.

Thx.

On Mon, Oct 29, 2012 at 7:02 PM, Alex Yang  wrote:

> Are there any disk errors in the kern.log?
>
>
> 2012/10/30 Nathan Trueblood 
>
>> Still no further clues.   I re-created all the volumes I'm using for
>> Swift.  Plenty of Inodes free:
>>
>> lab@data02:~$ df -i
>> Filesystem      Inodes  IUsed     IFree IUse% Mounted on
>> /dev/sda2     12214272  39290  12174982    1% /
>> none            107979    482    107497    1% /dev
>> none            107979    268    107711    1% /run
>> none            107979      2    107977    1% /run/lock
>> none            107979      1    107978    1% /run/shm
>> /dev/sda1        49152     23     49129    1% /boot
>> /dev/sda4    134046400     37 134046363    1% /srv/node/sda4
>>
>> I successfully upload a small object to container cont1, then cont2.
>> When I upload to cont3, I see the following in the object-server log
>> (data02)
>>
>> This seems to be the problematic sequence:
>>
>> Data02 has ip 192.168.1.202
>> Data03 has ip 192.168.1.203
>>
>> 1. First the account server reports an HTTP 201 on the container from a
>> different object server in a different zone.
>> 2. Then the object server reports a 404 trying to HEAD the new object.
>> 3. Then the object server reports a 507 trying to PUT the new object.
>>
>> From this point the operation eventually fails and the proxy reports a
>> 503.
>>
>> Oct 29 17:58:20 data02 account-server 192.168.1.203 - -
>> [30/Oct/2012:00:58:20 +0000] "PUT /sda4/116021/AUTH_system/cont3" 201 -
>> "tx5a3ca6c845af41928e0ba6b7bc58d2da" "-" "-" 0.0082 ""
>> Oct 29 17:58:20 data02 object-server 192.168.1.111 - -
>> [30/Oct/2012:00:58:20 +0000] "HEAD
>> /sda4/257613/AUTH_system/cont3/home/lab/bigfile3" 404 - "-"
>> "tx5f21503ff12e45e39a80eb52f6757261" "-" 0.0011
>> Oct 29 17:58:20 data02 object-server 192.168.1.111 - -
>> [30/Oct/2012:00:58:20 +0000] "PUT
>> /sda4/257613/AUTH_system/cont3/home/lab/bigfile3" 507 - "-"
>> "tx425494dc372740e28d043a07d3a08b9a" "-" 0.0031
>>
>> In an earlier, successful transaction I noticed that between Steps 1 and
>> 2 above, there is a response from the container-server:
>>
>> Oct 29 17:57:59 data02 account-server 192.168.1.204 - -
>> [30/Oct/2012:00:57:59 +0000] "PUT /sda4/116021/AUTH_system/cont2" 201 -
>> "txb10d75886bf14e4eba14fcc52d81c5d9" "-" "-" 0.0182 ""
>> Oct 29 17:57:59 data02 container-server 192.168.1.111 - -
>> [30/Oct/2012:00:57:59 +0000] "PUT /sda4/122355/AUTH_system/cont2" 201 -
>> "txb10d75886bf14e4eba14fcc52d81c5d9" "-" "-" 0.1554
>> Oct 29 17:57:59 data02 object-server 192.168.1.111 - -
>> [30/Oct/2012:00:57:59 +0000] "HEAD
>> /sda4/226151/AUTH_system/cont2/home/lab/bigfile3" 404 - "-"
>> "tx1c514850530849d1bfbfa716d9039b87" "-" 0.0012
>> Oct 29 17:57:59 data02 container-server 192.168.1.204 - -
>> [30/Oct/2012:00:57:59 +0000] "PUT
>> /sda4/122355/AUTH_system/cont2/home/lab/bigfile3" 201 -
>> "tx8130af5cae484e5f9c5a25541d1c87aa" "-" "-" 0.0041
>> Oct 29 17:57:59 data02 object-server 192.168.1.111 - -
>> [30/Oct/2012:00:57:59 +0000] "PUT
>> /sda4/226151/AUTH_system/cont2/home/lab/bigfile3" 201 - "-"
>> "tx8130af5cae484e5f9c5a25541d1c87aa" "-" 0.1716
>>
>>
>> So maybe the container server is failing to create the new container?
>> Maybe a bug in auto-create of containers?
>>
>> Definitely NOT a problem with the filesystem, but something is causing
>> the object-server to think there is a problem with the filesystem.
>>
>> I suspect a bug in one of the underlying libraries.
>>
>> Any further suggestions on how to troubleshoot?
>>
>> Thanks.   When I finally find the solution, I'll post my results.
>>
>> -N
>>
>> On Fri, Oct 26, 2012 at 11:21 PM, John Dickinson  wrote:
>>
>>> A 507 is returned by the object servers in 2 situations: 1) the drives
>>> are full or 2) the drives have been unmounted because of disk error.
>>>
>>> It's highly likely that you simply have full drives. Remember that the
>>> usable space in your cluster is 1/N where N = replica count. As an example,
>>> with 3 replicas and 5 nodes with a single 1TB drive each, you only have
>>> about 1.6TB available for data.
>>>
>>> As Pete suggested in his response, how big are your drives, and what
>>> does `df` tell you?
>>>
>>> --John
>>>
>>>
>>> On Oct 26, 2012, at 5:26 PM, Nathan Trueblood 
>>> wrote:
>>>
>>> > Hey folks-
>>> >
>>> > I'm trying to figure out what's going wrong with my Swift deployment
>>> on a small cluster of "mini" servers.   I have a small test cluster (5
>>> storage nodes, 1 proxy) of mini-servers that are ARM-based.   The proxy is
>>> a regular, Intel-based server with plenty of RAM.   The
>>> object/account/container servers are relatively small, with 2GB of RAM per
>>> node.
>>> >
>>> > Everything starts up fine, but now I'm trying to troubleshoot a
>

Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-30 Thread Rick Jones

On 10/29/2012 07:37 PM, Pete Zaitcev wrote:

On Mon, 29 Oct 2012 18:16:52 -0700
Nathan Trueblood  wrote:


Definitely NOT a problem with the filesystem, but something is causing the
object-server to think there is a problem with the filesystem.


If you are willing to go all-out, you can probably catch the
error with strace, if it works on ARM.


Strace is your friend, even if it is sometimes a bit on the chatty side.
It looks as though there is at least some support for ARM, if
http://packages.debian.org/search?keywords=strace is any indication.


rick jones




Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-29 Thread Pete Zaitcev
On Mon, 29 Oct 2012 18:16:52 -0700
Nathan Trueblood  wrote:

> Definitely NOT a problem with the filesystem, but something is causing the
> object-server to think there is a problem with the filesystem.

If you are willing to go all-out, you can probably catch the
error with strace, if it works on ARM. Failing that, find all places
where 507 is generated and see if any exceptions are caught, by
modifying the source, I'm afraid to say.
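
For example, something like this (a sketch; the package path is a guess for
Ubuntu, adjust for your distro):

# attach to a running object-server worker and capture file-related syscalls
strace -f -tt -e trace=file,desc -o /tmp/objsrv.trace \
  -p $(pgrep -f swift-object-server | head -1)
# find the spots in the source where a 507 can come from
grep -rn "InsufficientStorage" /usr/share/pyshared/swift/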

> I suspect a bug in one of the underlying libraries.

That's a possibility. Or, it could be a kernel bug. You are using XFS,
right? If it were something other than XFS or ext4, I would suspect
ARM blowing over the 2GB barrier somewhere, since your object is
called "bigfile3". As it is, you have little option but to divide
the layers until you identify the one that's broken.

BTW, make sure to disable the fallocate, while we're at it.

-- Pete



Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-29 Thread Alex Yang
Are there any disk errors in the kern.log?
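
For example:

grep -iE "error|xfs|i/o" /var/log/kern.log | tail -50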

2012/10/30 Nathan Trueblood 

> Still no further clues.   I re-created all the volumes I'm using for
> Swift.  Plenty of Inodes free:
>
> lab@data02:~$ df -i
> Filesystem      Inodes  IUsed     IFree IUse% Mounted on
> /dev/sda2     12214272  39290  12174982    1% /
> none            107979    482    107497    1% /dev
> none            107979    268    107711    1% /run
> none            107979      2    107977    1% /run/lock
> none            107979      1    107978    1% /run/shm
> /dev/sda1        49152     23     49129    1% /boot
> /dev/sda4    134046400     37 134046363    1% /srv/node/sda4
>
> I successfully upload a small object to container cont1, then cont2.
> When I upload to cont3, I see the following in the object-server log
> (data02)
>
> This seems to be the problematic sequence:
>
> Data02 has ip 192.168.1.202
> Data03 has ip 192.168.1.203
>
> 1. First the account server reports an HTTP 201 on the container from a
> different object server in a different zone.
> 2. Then the object server reports a 404 trying to HEAD the new object.
> 3. Then the object server reports a 507 trying to PUT the new object.
>
> From this point the operation eventually fails and the proxy reports a 503.
>
> Oct 29 17:58:20 data02 account-server 192.168.1.203 - -
> [30/Oct/2012:00:58:20 +0000] "PUT /sda4/116021/AUTH_system/cont3" 201 -
> "tx5a3ca6c845af41928e0ba6b7bc58d2da" "-" "-" 0.0082 ""
> Oct 29 17:58:20 data02 object-server 192.168.1.111 - -
> [30/Oct/2012:00:58:20 +0000] "HEAD
> /sda4/257613/AUTH_system/cont3/home/lab/bigfile3" 404 - "-"
> "tx5f21503ff12e45e39a80eb52f6757261" "-" 0.0011
> Oct 29 17:58:20 data02 object-server 192.168.1.111 - -
> [30/Oct/2012:00:58:20 +0000] "PUT
> /sda4/257613/AUTH_system/cont3/home/lab/bigfile3" 507 - "-"
> "tx425494dc372740e28d043a07d3a08b9a" "-" 0.0031
>
> In an earlier, successful transaction I noticed that between Steps 1 and 2
> above, there is a response from the container-server:
>
> Oct 29 17:57:59 data02 account-server 192.168.1.204 - -
> [30/Oct/2012:00:57:59 +0000] "PUT /sda4/116021/AUTH_system/cont2" 201 -
> "txb10d75886bf14e4eba14fcc52d81c5d9" "-" "-" 0.0182 ""
> Oct 29 17:57:59 data02 container-server 192.168.1.111 - -
> [30/Oct/2012:00:57:59 +0000] "PUT /sda4/122355/AUTH_system/cont2" 201 -
> "txb10d75886bf14e4eba14fcc52d81c5d9" "-" "-" 0.1554
> Oct 29 17:57:59 data02 object-server 192.168.1.111 - -
> [30/Oct/2012:00:57:59 +0000] "HEAD
> /sda4/226151/AUTH_system/cont2/home/lab/bigfile3" 404 - "-"
> "tx1c514850530849d1bfbfa716d9039b87" "-" 0.0012
> Oct 29 17:57:59 data02 container-server 192.168.1.204 - -
> [30/Oct/2012:00:57:59 +0000] "PUT
> /sda4/122355/AUTH_system/cont2/home/lab/bigfile3" 201 -
> "tx8130af5cae484e5f9c5a25541d1c87aa" "-" "-" 0.0041
> Oct 29 17:57:59 data02 object-server 192.168.1.111 - -
> [30/Oct/2012:00:57:59 +0000] "PUT
> /sda4/226151/AUTH_system/cont2/home/lab/bigfile3" 201 - "-"
> "tx8130af5cae484e5f9c5a25541d1c87aa" "-" 0.1716
>
>
> So maybe the container server is failing to create the new container?
> Maybe a bug in auto-create of containers?
>
> Definitely NOT a problem with the filesystem, but something is causing the
> object-server to think there is a problem with the filesystem.
>
> I suspect a bug in one of the underlying libraries.
>
> Any further suggestions on how to troubleshoot?
>
> Thanks.   When I finally find the solution, I'll post my results.
>
> -N
>
> On Fri, Oct 26, 2012 at 11:21 PM, John Dickinson  wrote:
>
>> A 507 is returned by the object servers in 2 situations: 1) the drives
>> are full or 2) the drives have been unmounted because of disk error.
>>
>> It's highly likely that you simply have full drives. Remember that the
>> usable space in your cluster is 1/N where N = replica count. As an example,
>> with 3 replicas and 5 nodes with a single 1TB drive each, you only have
>> about 1.6TB available for data.
>>
>> As Pete suggested in his response, how big are your drives, and what does
>> `df` tell you?
>>
>> --John
>>
>>
>> On Oct 26, 2012, at 5:26 PM, Nathan Trueblood 
>> wrote:
>>
>> > Hey folks-
>> >
>> > I'm trying to figure out what's going wrong with my Swift deployment on
>> a small cluster of "mini" servers.   I have a small test cluster (5 storage
>> nodes, 1 proxy) of mini-servers that are ARM-based.   The proxy is a
>> regular, Intel-based server with plenty of RAM.   The
>> object/account/container servers are relatively small, with 2GB of RAM per
>> node.
>> >
>> > Everything starts up fine, but now I'm trying to troubleshoot a strange
>> problem.   After I successfully upload a few test files, it seems like the
>> storage system stops responding and the proxy gives me a 503 error.
>> >
>> > Here's the test sequence I run on my proxy:
>> >
>> > lab@proxy01:~/bin$ ./swiftcl.sh stat
>> > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
>> >Account: AUTH_system
>> > Containers: 5
>> >Objects: 4
>> >  Bytes: 4

Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-29 Thread Nathan Trueblood
Still no further clues.   I re-created all the volumes I'm using for Swift.
 Plenty of Inodes free:

lab@data02:~$ df -i
Filesystem      Inodes  IUsed     IFree IUse% Mounted on
/dev/sda2     12214272  39290  12174982    1% /
none            107979    482    107497    1% /dev
none            107979    268    107711    1% /run
none            107979      2    107977    1% /run/lock
none            107979      1    107978    1% /run/shm
/dev/sda1        49152     23     49129    1% /boot
/dev/sda4    134046400     37 134046363    1% /srv/node/sda4

I successfully upload a small object to container cont1, then cont2.   When
I upload to cont3, I see the following in the object-server log (data02):

This seems to be the problematic sequence:

Data02 has ip 192.168.1.202
Data03 has ip 192.168.1.203

1. First the account server reports an HTTP 201 on the container from a
different object server in a different zone.
2. Then the object server reports a 404 trying to HEAD the new object.
3. Then the object server reports a 507 trying to PUT the new object.

From this point the operation eventually fails and the proxy reports a 503.

Oct 29 17:58:20 data02 account-server 192.168.1.203 - -
[30/Oct/2012:00:58:20 +0000] "PUT /sda4/116021/AUTH_system/cont3" 201 -
"tx5a3ca6c845af41928e0ba6b7bc58d2da" "-" "-" 0.0082 ""
Oct 29 17:58:20 data02 object-server 192.168.1.111 - -
[30/Oct/2012:00:58:20 +0000] "HEAD
/sda4/257613/AUTH_system/cont3/home/lab/bigfile3" 404 - "-"
"tx5f21503ff12e45e39a80eb52f6757261" "-" 0.0011
Oct 29 17:58:20 data02 object-server 192.168.1.111 - -
[30/Oct/2012:00:58:20 +0000] "PUT
/sda4/257613/AUTH_system/cont3/home/lab/bigfile3" 507 - "-"
"tx425494dc372740e28d043a07d3a08b9a" "-" 0.0031

In an earlier, successful transaction I noticed that between Steps 1 and 2
above, there is a response from the container-server:

Oct 29 17:57:59 data02 account-server 192.168.1.204 - -
[30/Oct/2012:00:57:59 +0000] "PUT /sda4/116021/AUTH_system/cont2" 201 -
"txb10d75886bf14e4eba14fcc52d81c5d9" "-" "-" 0.0182 ""
Oct 29 17:57:59 data02 container-server 192.168.1.111 - -
[30/Oct/2012:00:57:59 +0000] "PUT /sda4/122355/AUTH_system/cont2" 201 -
"txb10d75886bf14e4eba14fcc52d81c5d9" "-" "-" 0.1554
Oct 29 17:57:59 data02 object-server 192.168.1.111 - -
[30/Oct/2012:00:57:59 +0000] "HEAD
/sda4/226151/AUTH_system/cont2/home/lab/bigfile3" 404 - "-"
"tx1c514850530849d1bfbfa716d9039b87" "-" 0.0012
Oct 29 17:57:59 data02 container-server 192.168.1.204 - -
[30/Oct/2012:00:57:59 +0000] "PUT
/sda4/122355/AUTH_system/cont2/home/lab/bigfile3" 201 -
"tx8130af5cae484e5f9c5a25541d1c87aa" "-" "-" 0.0041
Oct 29 17:57:59 data02 object-server 192.168.1.111 - -
[30/Oct/2012:00:57:59 +0000] "PUT
/sda4/226151/AUTH_system/cont2/home/lab/bigfile3" 201 - "-"
"tx8130af5cae484e5f9c5a25541d1c87aa" "-" 0.1716


So maybe the container server is failing to create the new container?
Maybe a bug in auto-create of containers?
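
One test I may try next is creating the container directly on the
container-server with curl (a sketch; assumes the default container-server
port 6001, with the partition taken from the logs above):

curl -v -X PUT \
  -H "X-Timestamp: $(date +%s.%N)" \
  http://192.168.1.202:6001/sda4/122355/AUTH_system/cont3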

Definitely NOT a problem with the filesystem, but something is causing the
object-server to think there is a problem with the filesystem.

I suspect a bug in one of the underlying libraries.

Any further suggestions on how to troubleshoot?

Thanks.   When I finally find the solution, I'll post my results.

-N

On Fri, Oct 26, 2012 at 11:21 PM, John Dickinson  wrote:

> A 507 is returned by the object servers in 2 situations: 1) the drives are
> full or 2) the drives have been unmounted because of disk error.
>
> It's highly likely that you simply have full drives. Remember that the
> usable space in your cluster is 1/N where N = replica count. As an example,
> with 3 replicas and 5 nodes with a single 1TB drive each, you only have
> about 1.6TB available for data.
>
> As Pete suggested in his response, how big are your drives, and what does
> `df` tell you?
>
> --John
>
>
> On Oct 26, 2012, at 5:26 PM, Nathan Trueblood 
> wrote:
>
> > Hey folks-
> >
> > I'm trying to figure out what's going wrong with my Swift deployment on
> a small cluster of "mini" servers.   I have a small test cluster (5 storage
> nodes, 1 proxy) of mini-servers that are ARM-based.   The proxy is a
> regular, Intel-based server with plenty of RAM.   The
> object/account/container servers are relatively small, with 2GB of RAM per
> node.
> >
> > Everything starts up fine, but now I'm trying to troubleshoot a strange
> problem.   After I successfully upload a few test files, it seems like the
> storage system stops responding and the proxy gives me a 503 error.
> >
> > Here's the test sequence I run on my proxy:
> >
> > lab@proxy01:~/bin$ ./swiftcl.sh stat
> > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
> >Account: AUTH_system
> > Containers: 5
> >Objects: 4
> >  Bytes: 47804968
> > Accept-Ranges: bytes
> > X-Timestamp: 1351294912.72119
> > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1
> > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass
> upload myfiles1 /home/lab/bigfile1
> > home/la

Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-29 Thread John Dickinson
Also check the number of inodes used: `df -i`

--John



On Oct 29, 2012, at 8:31 AM, Nathan Trueblood  wrote:

> Yeah, I read about the 507 error.   However, when the error occurs on my
> node I can see with 'df' that the drive is only 1% full and is definitely not
> unmounted.   I can write files to the mounted filesystem directly before,
> during, and after the Swift error occurs.   So the problem must be some kind
> of timeout that is causing the object server to think that something is wrong
> with the disk.
> 
> I'll keep digging... 
> 
> On Fri, Oct 26, 2012 at 11:21 PM, John Dickinson  wrote:
> A 507 is returned by the object servers in 2 situations: 1) the drives are 
> full or 2) the drives have been unmounted because of disk error.
> 
> It's highly likely that you simply have full drives. Remember that the usable 
> space in your cluster is 1/N where N = replica count. As an example, with 3 
> replicas and 5 nodes with a single 1TB drive each, you only have about 1.6TB 
> available for data.
> 
> As Pete suggested in his response, how big are your drives, and what does 
> `df` tell you?
> 
> --John
> 
> 
> On Oct 26, 2012, at 5:26 PM, Nathan Trueblood  wrote:
> 
> > Hey folks-
> >
> > I'm trying to figure out what's going wrong with my Swift deployment on a 
> > small cluster of "mini" servers.   I have a small test cluster (5 storage 
> > nodes, 1 proxy) of mini-servers that are ARM-based.   The proxy is a 
> > regular, Intel-based server with plenty of RAM.   The 
> > object/account/container servers are relatively small, with 2GB of RAM per 
> > node.
> >
> > Everything starts up fine, but now I'm trying to troubleshoot a strange 
> > problem.   After I successfully upload a few test files, it seems like the 
> > storage system stops responding and the proxy gives me a 503 error.
> >
> > Here's the test sequence I run on my proxy:
> >
> > lab@proxy01:~/bin$ ./swiftcl.sh stat
> > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
> >Account: AUTH_system
> > Containers: 5
> >Objects: 4
> >  Bytes: 47804968
> > Accept-Ranges: bytes
> > X-Timestamp: 1351294912.72119
> > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1
> > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
> > myfiles1 /home/lab/bigfile1
> > home/lab/bigfile1
> > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles2 /home/lab/bigfile1
> > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
> > myfiles2 /home/lab/bigfile1
> > home/lab/bigfile1
> > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles3 /home/lab/bigfile1
> > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
> > myfiles3 /home/lab/bigfile1
> > home/lab/bigfile1
> > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles4 /home/lab/bigfile1
> > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
> > myfiles4 /home/lab/bigfile1
> > home/lab/bigfile1
> > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles5 /home/lab/bigfile1
> > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
> > myfiles5 /home/lab/bigfile1
> > Object PUT failed: 
> > http://172.16.1.111:8080/v1/AUTH_system/myfiles5/home/lab/bigfile1 503 
> > Service Unavailable  [first 60 chars of response] 503 Service Unavailable
> >
> > The server is currently unavailable
> > lab@proxy01:~/bin$ ./swiftcl.sh stat
> > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
> >Account: AUTH_system
> > Containers: 6
> >Objects: 5
> >  Bytes: 59756210
> > Accept-Ranges: bytes
> > X-Timestamp: 1351294912.72119
> >
> > Here's the corresponding log on the Proxy:
> >
> > Oct 26 17:06:52 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/06/52 GET 
> > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
> > Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET 
> > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0017
> > Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET 
> > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
> > Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET 
> > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
> > Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET 
> > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
> > Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET 
> > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
> > Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET 
> > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
> > Oct 26 17:07:27 proxy01 proxy-server Handoff requested (1) (txn: 
> > tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)
> > Oct 26 17:07:27 proxy01 proxy-server Handoff requested (2) (txn: 
> > tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)
> > Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET 
> > /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010

Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-29 Thread Nathan Trueblood
Yeah, I read about the 507 error.   However, when the error occurs on my
node I can see with 'df' that the drive is only 1% full and is definitely not
unmounted.   I can write files to the mounted filesystem directly before,
during, and after the Swift error occurs.   So the problem must be some
kind of timeout that is causing the object server to think that something
is wrong with the disk.
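
(For the record, my direct-write test was along these lines:

dd if=/dev/zero of=/srv/node/sda4/ddtest bs=1M count=100 conv=fsync
rm /srv/node/sda4/ddtest

and it succeeded every time.)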

I'll keep digging...

On Fri, Oct 26, 2012 at 11:21 PM, John Dickinson  wrote:

> A 507 is returned by the object servers in 2 situations: 1) the drives are
> full or 2) the drives have been unmounted because of disk error.
>
> It's highly likely that you simply have full drives. Remember that the
> usable space in your cluster is 1/N where N = replica count. As an example,
> with 3 replicas and 5 nodes with a single 1TB drive each, you only have
> about 1.6TB available for data.
>
> As Pete suggested in his response, how big are your drives, and what does
> `df` tell you?
>
> --John
>
>
> On Oct 26, 2012, at 5:26 PM, Nathan Trueblood 
> wrote:
>
> > Hey folks-
> >
> > I'm trying to figure out what's going wrong with my Swift deployment on
> a small cluster of "mini" servers.   I have a small test cluster (5 storage
> nodes, 1 proxy) of mini-servers that are ARM-based.   The proxy is a
> regular, Intel-based server with plenty of RAM.   The
> object/account/container servers are relatively small, with 2GB of RAM per
> node.
> >
> > Everything starts up fine, but now I'm trying to troubleshoot a strange
> problem.   After I successfully upload a few test files, it seems like the
> storage system stops responding and the proxy gives me a 503 error.
> >
> > Here's the test sequence I run on my proxy:
> >
> > lab@proxy01:~/bin$ ./swiftcl.sh stat
> > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
> >Account: AUTH_system
> > Containers: 5
> >Objects: 4
> >  Bytes: 47804968
> > Accept-Ranges: bytes
> > X-Timestamp: 1351294912.72119
> > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1
> > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass
> upload myfiles1 /home/lab/bigfile1
> > home/lab/bigfile1
> > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles2 /home/lab/bigfile1
> > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass
> upload myfiles2 /home/lab/bigfile1
> > home/lab/bigfile1
> > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles3 /home/lab/bigfile1
> > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass
> upload myfiles3 /home/lab/bigfile1
> > home/lab/bigfile1
> > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles4 /home/lab/bigfile1
> > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass
> upload myfiles4 /home/lab/bigfile1
> > home/lab/bigfile1
> > lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles5 /home/lab/bigfile1
> > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass
> upload myfiles5 /home/lab/bigfile1
> > Object PUT failed:
> http://172.16.1.111:8080/v1/AUTH_system/myfiles5/home/lab/bigfile1 503
> Service Unavailable  [first 60 chars of response] 503 Service Unavailable
> >
> > The server is currently unavailable
> > lab@proxy01:~/bin$ ./swiftcl.sh stat
> > swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
> >Account: AUTH_system
> > Containers: 6
> >Objects: 5
> >  Bytes: 59756210
> > Accept-Ranges: bytes
> > X-Timestamp: 1351294912.72119
> >
> > Here's the corresponding log on the Proxy:
> >
> > Oct 26 17:06:52 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/06/52
> GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
> > Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13
> GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0017
> > Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13
> GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
> > Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22
> GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
> > Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22
> GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
> > Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27
> GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
> > Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27
> GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
> > Oct 26 17:07:27 proxy01 proxy-server Handoff requested (1) (txn:
> tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)
> > Oct 26 17:07:27 proxy01 proxy-server Handoff requested (2) (txn:
> tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)
> > Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33
> GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
> > Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33
> GET /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
> > Oct 26 17:07:33 proxy01 proxy-server Handoff requested (1) (txn:
> tx5f9659f74cb2491f

Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-27 Thread Diego Parrilla
Sorry for my off-topic question; it's the first time I've heard of Swift
storage servers running on ARM processors.

I think ARM architectures can be an interesting alternative to classic
compute servers, but Swift storage nodes had not caught my attention as a
valid use case until now.

From my perspective, the savings in power consumption and real estate in your
datacenter are minimized if you load your servers with two dozen SATA disks (a
typical Swift node configuration) and a dual 10GbE connection, for example.

Maybe there are benefits I'm not aware of, but I would really love to hear
about them :-)

Cheers
Diego

Sent from my iPhone, pardon the brevity

On 27/10/2012, at 02:26, Nathan Trueblood wrote:

> Hey folks-
> 
> I'm trying to figure out what's going wrong with my Swift deployment on a 
> small cluster of "mini" servers.   I have a small test cluster (5 storage 
> nodes, 1 proxy) of mini-servers that are ARM-based.   The proxy is a regular, 
> Intel-based server with plenty of RAM.   The object/account/container servers 
> are relatively small, with 2GB of RAM per node.
> 
> Everything starts up fine, but now I'm trying to troubleshoot a strange 
> problem.   After I successfully upload a few test files, it seems like the 
> storage system stops responding and the proxy gives me a 503 error.
> 
> Here's the test sequence I run on my proxy:
> 
> lab@proxy01:~/bin$ ./swiftcl.sh stat
> swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
>Account: AUTH_system
> Containers: 5
>Objects: 4
>  Bytes: 47804968
> Accept-Ranges: bytes
> X-Timestamp: 1351294912.72119
> lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1 
> swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
> myfiles1 /home/lab/bigfile1
> home/lab/bigfile1
> lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles2 /home/lab/bigfile1 
> swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
> myfiles2 /home/lab/bigfile1
> home/lab/bigfile1
> lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles3 /home/lab/bigfile1 
> swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
> myfiles3 /home/lab/bigfile1
> home/lab/bigfile1
> lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles4 /home/lab/bigfile1 
> swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
> myfiles4 /home/lab/bigfile1
> home/lab/bigfile1
> lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles5 /home/lab/bigfile1 
> swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
> myfiles5 /home/lab/bigfile1
> Object PUT failed: 
> http://172.16.1.111:8080/v1/AUTH_system/myfiles5/home/lab/bigfile1 503 
> Service Unavailable  [first 60 chars of response] 503 Service Unavailable
> 
> The server is currently unavailable
> lab@proxy01:~/bin$ ./swiftcl.sh stat
> swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
>Account: AUTH_system
> Containers: 6
>Objects: 5
>  Bytes: 59756210
> Accept-Ranges: bytes
> X-Timestamp: 1351294912.72119
> 
> Here's the corresponding log on the Proxy:
> 
> Oct 26 17:06:52 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/06/52 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
> Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0017
> Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
> Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
> Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
> Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
> Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
> Oct 26 17:07:27 proxy01 proxy-server Handoff requested (1) (txn: 
> tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)
> Oct 26 17:07:27 proxy01 proxy-server Handoff requested (2) (txn: 
> tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)
> Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
> Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
> Oct 26 17:07:33 proxy01 proxy-server Handoff requested (1) (txn: 
> tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111)
> Oct 26 17:07:33 proxy01 proxy-server Handoff requested (2) (txn: 
> tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111)
> Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0009
> Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GE

Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-26 Thread John Dickinson
A 507 is returned by the object servers in 2 situations: 1) the drives are full 
or 2) the drives have been unmounted because of disk error.

It's highly likely that you simply have full drives. Remember that the usable 
space in your cluster is 1/N where N = replica count. As an example, with 3 
replicas and 5 nodes with a single 1TB drive each, you only have about 1.6TB 
available for data.
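
(The arithmetic: 5 nodes x 1TB = 5TB raw; 5TB / 3 replicas ~ 1.67TB, and
filesystem overhead brings that down to roughly 1.6TB usable.)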

As Pete suggested in his response, how big are your drives, and what does `df` 
tell you?

--John


On Oct 26, 2012, at 5:26 PM, Nathan Trueblood  wrote:

> Hey folks-
> 
> I'm trying to figure out what's going wrong with my Swift deployment on a 
> small cluster of "mini" servers.   I have a small test cluster (5 storage 
> nodes, 1 proxy) of mini-servers that are ARM-based.   The proxy is a regular, 
> Intel-based server with plenty of RAM.   The object/account/container servers 
> are relatively small, with 2GB of RAM per node.
> 
> Everything starts up fine, but now I'm trying to troubleshoot a strange 
> problem.   After I successfully upload a few test files, it seems like the 
> storage system stops responding and the proxy gives me a 503 error.
> 
> Here's the test sequence I run on my proxy:
> 
> lab@proxy01:~/bin$ ./swiftcl.sh stat
> swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
>Account: AUTH_system
> Containers: 5
>Objects: 4
>  Bytes: 47804968
> Accept-Ranges: bytes
> X-Timestamp: 1351294912.72119
> lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1 
> swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
> myfiles1 /home/lab/bigfile1
> home/lab/bigfile1
> lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles2 /home/lab/bigfile1 
> swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
> myfiles2 /home/lab/bigfile1
> home/lab/bigfile1
> lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles3 /home/lab/bigfile1 
> swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
> myfiles3 /home/lab/bigfile1
> home/lab/bigfile1
> lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles4 /home/lab/bigfile1 
> swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
> myfiles4 /home/lab/bigfile1
> home/lab/bigfile1
> lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles5 /home/lab/bigfile1 
> swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload 
> myfiles5 /home/lab/bigfile1
> Object PUT failed: 
> http://172.16.1.111:8080/v1/AUTH_system/myfiles5/home/lab/bigfile1 503 
> Service Unavailable  [first 60 chars of response] 503 Service Unavailable
> 
> The server is currently unavailable
> lab@proxy01:~/bin$ ./swiftcl.sh stat
> swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat
>Account: AUTH_system
> Containers: 6
>Objects: 5
>  Bytes: 59756210
> Accept-Ranges: bytes
> X-Timestamp: 1351294912.72119
> 
> Here's the corresponding log on the Proxy:
> 
> Oct 26 17:06:52 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/06/52 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
> Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0017
> Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
> Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
> Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
> Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
> Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
> Oct 26 17:07:27 proxy01 proxy-server Handoff requested (1) (txn: 
> tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)
> Oct 26 17:07:27 proxy01 proxy-server Handoff requested (2) (txn: 
> tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)
> Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010
> Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016
> Oct 26 17:07:33 proxy01 proxy-server Handoff requested (1) (txn: 
> tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111)
> Oct 26 17:07:33 proxy01 proxy-server Handoff requested (2) (txn: 
> tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111)
> Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0009
> Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GET 
> /auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0009
> Oct 26 17:07:39 proxy01 proxy-server Handoff requested (1) (txn: 
> tx8dc917a4a8c84c40a4429b7bab0323c6) (client_ip: 172.16.1.111)
> Oct 26 1

Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-26 Thread Pete Zaitcev
On Fri, 26 Oct 2012 17:26:07 -0700
Nathan Trueblood  wrote:

> I'm trying to figure out what's going wrong with my Swift deployment on a
> small cluster of "mini" servers.   I have a small test cluster (5 storage
> nodes, 1 proxy) of mini-servers that are ARM-based.   The proxy is a
> regular, Intel-based server with plenty of RAM.   The
> object/account/container servers are relatively small, with 2GB of RAM per
> node.

And the disk is how big?

> Oct 26 17:07:46 data05 object-server 192.168.1.111 - -
> [27/Oct/2012:00:07:46 +0000] "PUT
> /sda6/150861/AUTH_system/myfiles5/home/lab/bigfile1" 507 - "-"
> "tx8dc917a4a8c84c40a4429b7bab0323c6" "-" 0.0031

Well, what does df say?

> The Object-servers do give a 507 error, which might indicate a disk
> problem, but there is nothing wrong with the storage drive.   And also if
> there was a fundamental drive problem then I wouldn't be able to upload
> objects in the first place.

You could upload them to a reduced number of nodes, and then the
replication would inflate the space used by the replication ratio.

Finally, it's possible that tombstones are not properly expired for
some reason.
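
You can also check which nodes and partition a given object maps to with
swift-get-nodes (assuming the ring lives in the standard place):

swift-get-nodes /etc/swift/object.ring.gz AUTH_system myfiles5 home/lab/bigfile1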

-- Pete



[Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-26 Thread Nathan Trueblood
Hey folks-

I'm trying to figure out what's going wrong with my Swift deployment on a
small cluster of "mini" servers.   I have a small test cluster (5 storage
nodes, 1 proxy) of mini-servers that are ARM-based.   The proxy is a
regular, Intel-based server with plenty of RAM.   The
object/account/container servers are relatively small, with 2GB of RAM per
node.

Everything starts up fine, but now I'm trying to troubleshoot a strange
problem.   After I successfully upload a few test files, it seems like the
storage system stops responding and the proxy gives me a 503 error.

Here's the test sequence I run on my proxy:

lab@proxy01:~/bin$ ./swiftcl.sh stat

swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat

   Account: AUTH_system

Containers: 5

   Objects: 4

 Bytes: 47804968

Accept-Ranges: bytes

X-Timestamp: 1351294912.72119

lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles1 /home/lab/bigfile1

swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload
myfiles1 /home/lab/bigfile1

home/lab/bigfile1

lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles2 /home/lab/bigfile1

swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload
myfiles2 /home/lab/bigfile1

home/lab/bigfile1

lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles3 /home/lab/bigfile1

swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload
myfiles3 /home/lab/bigfile1

home/lab/bigfile1

lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles4 /home/lab/bigfile1

swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload
myfiles4 /home/lab/bigfile1

home/lab/bigfile1

lab@proxy01:~/bin$ ./swiftcl.sh upload myfiles5 /home/lab/bigfile1

swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass upload
myfiles5 /home/lab/bigfile1

Object PUT failed:
http://172.16.1.111:8080/v1/AUTH_system/myfiles5/home/lab/bigfile1 503
Service Unavailable  [first 60 chars of response] 503 Service Unavailable


The server is currently unavailable

lab@proxy01:~/bin$ ./swiftcl.sh stat

swift -A http://proxy01:8080/auth/v1.0 -U system:root -K testpass stat

   Account: AUTH_system

Containers: 6

   Objects: 5

 Bytes: 59756210

Accept-Ranges: bytes

X-Timestamp: 1351294912.72119

Here's the corresponding log on the Proxy:

Oct 26 17:06:52 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/06/52 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010

Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0017

Oct 26 17:07:13 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/13 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016

Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010

Oct 26 17:07:22 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/22 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016

Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010

Oct 26 17:07:27 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/27 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016

Oct 26 17:07:27 proxy01 proxy-server Handoff requested (1) (txn:
tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)

Oct 26 17:07:27 proxy01 proxy-server Handoff requested (2) (txn:
tx6946419daba54efe9c2878f8a2a78f88) (client_ip: 172.16.1.111)

Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0010

Oct 26 17:07:33 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/33 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0016

Oct 26 17:07:33 proxy01 proxy-server Handoff requested (1) (txn:
tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111)

Oct 26 17:07:33 proxy01 proxy-server Handoff requested (2) (txn:
tx5f9659f74cb2491f9a63cbb84f680c5c) (client_ip: 172.16.1.111)

Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0009

Oct 26 17:07:39 proxy01 proxy-server - 127.0.0.1 27/Oct/2012/00/07/39 GET
/auth/v1.0/ HTTP/1.0 200 - - - - - - - - 0.0009

Oct 26 17:07:39 proxy01 proxy-server Handoff requested (1) (txn:
tx8dc917a4a8c84c40a4429b7bab0323c6) (client_ip: 172.16.1.111)

Oct 26 17:07:39 proxy01 proxy-server Handoff requested (2) (txn:
tx8dc917a4a8c84c40a4429b7bab0323c6) (client_ip: 172.16.1.111)

Oct 26 17:07:40 proxy01 proxy-server Object PUT returning 503, 1/2 required
connections (txn: tx8dc917a4a8c84c40a4429b7bab0323c6) (client_ip:
172.16.1.111)

Oct 26 17:07:41 proxy01 proxy-server Object PUT returning 503, 1/2 required
connections (txn: tx07a1f5dfaa23445a88eaa4a2ade68466) (client_ip:
172.16.1.111)

Oct 26 17:07:43 proxy01 proxy-server Object PUT returning 503, 1/2 required
connections (txn: tx938d08b706844db3886695b798bd9fad) (client_ip:
172.16.1.111)

Oct 26 17:07:47 proxy01 proxy-server Object PUT returning 503, 1/2 required
connections (txn: txa35e9f8a54924f139e13d6f3a5dc457f