I have dug into this further and found evidence which leads me to
believe that replication is not working as it should.
In this cluster, one account holds the majority of the data. For the
sake of this example, this is account 41677, which holds 34TB of
data.
Looking at the account SQLite DB for this account on all nodes, I
notice that the incoming_sync and outgoing_sync tables have remote_id
entries which I cannot locate anywhere:
sqlite> select * from incoming_sync;
remote_id sync_point updated_at
------------------------------------ ---------- ----------
9332d177-1034-44e9-b77e-961a7ee7da6d 308256694 1406830765
d87e4dea-1c42-4f3f-8462-76227acc7c32 301384851 1406830765
0b84aac5-d16e-4d76-9903-eb9122c19119 310265599 1406836822
As you can see above, those are the nodes from which "incoming"
replication is expected. However, those IDs are not present on any other
node holding the same account. Hence the amount of data reported on some
nodes is less than 34TB.
Why would this be? What can I do to fix this to ensure replication
resumes correctly?
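A rough way to cross-check the missing IDs described above is to compare every remote_id against the IDs the account DBs report for themselves. The sketch below is Python and rests on an assumption about the schema: that each account DB stores its own ID in the `id` column of `account_stat`. The function names and node labels are hypothetical, and it is best run against copies of the DB files rather than the live ones.

```python
import sqlite3


def read_sync_ids(conn):
    """Return (own_id, incoming_ids, outgoing_ids) for one account DB.

    Assumes account_stat.id holds this DB's own ID (schema assumption).
    """
    own_id = conn.execute("SELECT id FROM account_stat").fetchone()[0]
    incoming = {r[0] for r in conn.execute("SELECT remote_id FROM incoming_sync")}
    outgoing = {r[0] for r in conn.execute("SELECT remote_id FROM outgoing_sync")}
    return own_id, incoming, outgoing


def find_unknown_remote_ids(conns):
    """Map each node label to the remote_ids no inspected DB claims as its own.

    conns: dict of node label -> open sqlite3 connection to that node's
    copy of the account DB.
    """
    info = {node: read_sync_ids(conn) for node, conn in conns.items()}
    known = {own_id for own_id, _, _ in info.values()}
    return {
        node: sorted((incoming | outgoing) - known)
        for node, (_, incoming, outgoing) in info.items()
    }
```

Each connection would be opened with sqlite3.connect() on a copied DB file. Any ID left in the result is referenced by a sync table but does not match a DB that was inspected, which is the situation I am seeing.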
Thanks,
Pritpal
On 2014-08-05 13:06, [email protected] wrote:
Hi All,
We are running Swift 1.4.8 with 8 nodes and 4 zones.
We recently added 4 SSD drives, one to each of 4 of our storage nodes.
The account and container rings were then rebalanced so that this
data no longer sits on spinning disks. Since the rebalance was done, we
have noticed something unusual in the statistics returned by
Swift.
This is the command being run to grab the statistics:
swift -v -A https://127.0.0.1:8080/auth/v1.0 -U <USERNAME> -K <PASS> stat
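For anyone wanting to reproduce the sampling, here is a minimal Python sketch of a poller. It is not the script that produced the output in this mail. It assumes `head_account` is any callable returning the account headers as a dict with lowercase keys (the `head_account` method of a python-swiftclient `Connection` would be one candidate, but that is an assumption about tooling). The function name and parameters are hypothetical.

```python
import time

# Header names as they appear in the stat output, lowercased.
STAT_KEYS = (
    "x-account-object-count",
    "x-account-bytes-used",
    "x-account-container-count",
)


def poll_account_stats(head_account, samples=7, interval=10, sleep=time.sleep):
    """Call head_account() `samples` times, `interval` seconds apart.

    Returns a list of dicts holding the three counters as integers.
    """
    rows = []
    for i in range(samples):
        headers = head_account()
        rows.append({key: int(headers[key]) for key in STAT_KEYS})
        if i < samples - 1:
            sleep(interval)  # no wait needed after the last sample
    return rows
```

The `sleep` parameter is only there so the wait can be stubbed out when trying the function without a cluster.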
Before the changes, the statistics looked like this:
===
Wed, 30 Jul 2014 10:51:26 +0100
Array
(
[X-Account-Object-Count] => 81473735
[X-Account-Bytes-Used] => 34156718530011
[X-Account-Container-Count] => 6510
)
Wed, 30 Jul 2014 10:51:36 +0100
Array
(
[X-Account-Object-Count] => 81473735
[X-Account-Bytes-Used] => 34156718530011
[X-Account-Container-Count] => 6510
)
Wed, 30 Jul 2014 10:51:46 +0100
Array
(
[X-Account-Object-Count] => 81698252
[X-Account-Bytes-Used] => 34213134745373
[X-Account-Container-Count] => 6510
)
Wed, 30 Jul 2014 10:51:56 +0100
Array
(
[X-Account-Object-Count] => 81687266
[X-Account-Bytes-Used] => 34209086906883
[X-Account-Container-Count] => 6510
)
Wed, 30 Jul 2014 10:52:06 +0100
Array
(
[X-Account-Object-Count] => 81687418
[X-Account-Bytes-Used] => 34209165517185
[X-Account-Container-Count] => 6510
)
Wed, 30 Jul 2014 10:52:16 +0100
Array
(
[X-Account-Object-Count] => 81405109
[X-Account-Bytes-Used] => 34105818678331
[X-Account-Container-Count] => 6510
)
Wed, 30 Jul 2014 10:52:26 +0100
Array
(
[X-Account-Object-Count] => 81460103
[X-Account-Bytes-Used] => 34127360552723
[X-Account-Container-Count] => 6510
)
===
Since the rebalancing, the statistics show that
X-Account-Bytes-Used has dropped by around 7TB and
X-Account-Object-Count has dropped to somewhere between 60M
and 70M objects. The statistics now jump around wildly, as can
be seen below.
===
Tue, 05 Aug 2014 12:32:49 +0100
Array
(
[X-Account-Object-Count] => 59242579
[X-Account-Bytes-Used] => 24304403925249
[X-Account-Container-Count] => 6603
)
Tue, 05 Aug 2014 12:32:59 +0100
Array
(
[X-Account-Object-Count] => 58817476
[X-Account-Bytes-Used] => 24167437130211
[X-Account-Container-Count] => 6603
)
Tue, 05 Aug 2014 12:33:09 +0100
Array
(
[X-Account-Object-Count] => 63760679
[X-Account-Bytes-Used] => 25828018327577
[X-Account-Container-Count] => 6603
)
Tue, 05 Aug 2014 12:33:19 +0100
Array
(
[X-Account-Object-Count] => 66724351
[X-Account-Bytes-Used] => 27197208718607
[X-Account-Container-Count] => 6603
)
Tue, 05 Aug 2014 12:33:29 +0100
Array
(
[X-Account-Object-Count] => 67222017
[X-Account-Bytes-Used] => 27465314723569
[X-Account-Container-Count] => 6603
)
Tue, 05 Aug 2014 12:33:39 +0100
Array
(
[X-Account-Object-Count] => 67214198
[X-Account-Bytes-Used] => 27536268561101
[X-Account-Container-Count] => 6603
)
Tue, 05 Aug 2014 12:33:49 +0100
Array
(
[X-Account-Object-Count] => 68353884
[X-Account-Bytes-Used] => 28017869874871
[X-Account-Container-Count] => 6603
)
===
This pattern repeats: the count increases, then drops back down.
My question is, why would this happen? We definitely did not
delete anything, so as far as I am concerned the data was just moved
around.
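To put numbers on the swing, the X-Account-Bytes-Used samples quoted above can be compared directly. This is a small Python sketch using the values copied from the two listings; the helper name is hypothetical and 1TB is taken as 10^12 bytes.

```python
TB = 10 ** 12  # decimal terabyte, an assumption about the unit

# X-Account-Bytes-Used samples from 30 Jul 2014 (before the rebalance)
before = [
    34156718530011, 34156718530011, 34213134745373, 34209086906883,
    34209165517185, 34105818678331, 34127360552723,
]
# X-Account-Bytes-Used samples from 05 Aug 2014 (after the rebalance)
after = [
    24304403925249, 24167437130211, 25828018327577, 27197208718607,
    27465314723569, 27536268561101, 28017869874871,
]


def spread(values):
    """Difference between the largest and smallest sample."""
    return max(values) - min(values)


smallest_drop = min(before) - max(after)
largest_drop = max(before) - min(after)

print("spread before: %.2f TB" % (spread(before) / TB))
print("spread after:  %.2f TB" % (spread(after) / TB))
print("drop: %.2f to %.2f TB" % (smallest_drop / TB, largest_drop / TB))
```

On these samples the reported value varied by about 0.1TB before the change and by about 3.9TB after it, and the apparent drop is somewhere between roughly 6TB and 10TB depending on which samples are compared.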
You can see the behaviour in these graphs:
http://www.preeto.co.uk/SwiftStats.PNG
Note how, prior to the change (2014-07-31), the totalbytes and
totalobjects graphs are fairly static.
Regards,
Pritpal