Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-12-01 Thread Yehuda Sadeh
On Sat, Nov 29, 2014 at 2:26 PM, Ben b@benjackson.email wrote:

 On 29/11/14 11:40, Yehuda Sadeh wrote:

 On Fri, Nov 28, 2014 at 1:38 PM, Ben b@benjackson.email wrote:

 On 29/11/14 01:50, Yehuda Sadeh wrote:

 On Thu, Nov 27, 2014 at 9:22 PM, Ben b@benjackson.email wrote:

 On 2014-11-28 15:42, Yehuda Sadeh wrote:

 On Thu, Nov 27, 2014 at 2:15 PM, b b@benjackson.email wrote:

 On 2014-11-27 11:36, Yehuda Sadeh wrote:


 On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email wrote:


 On 2014-11-27 10:21, Yehuda Sadeh wrote:



 On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email wrote:



 On 2014-11-27 09:38, Yehuda Sadeh wrote:




 On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email wrote:




 I've been deleting a bucket which originally had 60TB of data in it.
 With our cluster doing only 1 replication, the total usage was 120TB.

 I've been deleting the objects slowly using S3 browser, and I can see
 the bucket usage is now down to around 2.5TB, or 5TB with duplication,
 but the usage in the cluster has not changed.

 I've looked at garbage collection (radosgw-admin gc list --include-all)
 and it just reports square brackets [].

 I've run 'radosgw-admin temp remove --date=2014-11-20', and it doesn't
 appear to have any effect.

 Is there a way to check where this space is being consumed?

 Running 'ceph df', the USED space in the buckets pool is not showing
 any of the 57TB that should have been freed up by the deletion so far.

 Running 'radosgw-admin bucket stats | jshon | grep size_kb_actual' and
 adding up all the buckets' usage shows that the space has been freed
 from the bucket, but the cluster is all sorts of messed up.

 ANY IDEAS? What can I look at?





 Can you run 'radosgw-admin gc list --include-all'?

 Yehuda





 I've done it before, and it just returns square brackets [] (see below):

 radosgw-admin gc list --include-all
 []




 Do you know which of the rados pools has all that extra data? Try to
 list that pool's objects and verify that there are no surprises there
 (e.g., use 'rados -p <pool> ls').

 Yehuda




 I'm just running that command now, and it's taking some time. There is
 a large number of objects.

 Once it has finished, what should I be looking for?



 I assume the pool in question is the one that holds your objects'
 data? You should be looking for objects that are not expected to exist
 anymore, and objects of buckets that don't exist anymore. The problem
 here is to identify these.
 I suggest starting by looking at all the existing buckets, composing a
 list of the bucket prefixes for those buckets, and then checking
 whether there are objects that have different prefixes.

 Yehuda
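
A minimal sketch of that prefix check, assuming the data pool is
.rgw.buckets, that each raw object's name starts with its bucket's
marker followed by an underscore, and that the marker is the "id" (or
"marker") field in 'radosgw-admin bucket stats' output (all of these
are assumptions to verify against your own cluster first):

  # prefixes (markers) of every bucket that still exists
  radosgw-admin bucket stats | jshon -a -e id -u | sort -u > /tmp/known_prefixes.txt
  # prefixes actually present in the data pool
  rados -p .rgw.buckets ls | awk -F'_' '{print $1}' | sort -u > /tmp/seen_prefixes.txt
  # prefixes found in the pool that belong to no existing bucket
  comm -13 /tmp/known_prefixes.txt /tmp/seen_prefixes.txt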



 Any ideas? I've found the prefix, and the number of objects in the
 pool that match it is in the 21 millions.
 The actual 'radosgw-admin bucket stats' command reports the bucket as
 only having 1.2 million.


 Well, the objects you're seeing are raw objects, and since rgw stripes
 the data, it is expected to have more raw objects than objects in the
 bucket. Still, it seems that you have far too many of these. You can
 try to check, using the S3 API, whether there are pending multipart
 uploads that were never completed.
 At the moment there's no easy way to figure out which raw objects are
 not supposed to exist. The process would be like this:
 1. rados ls -p <data pool>, and keep the list sorted
 2. list the objects in the bucket
 3. for each object in (2), run: radosgw-admin object stat
 --bucket=<bucket> --object=<object> --rgw-cache-enabled=false
 (disabling the cache so that it goes quicker)
 4. look at the result of (3), and generate a list of all the parts
 5. sort the result of (4) and compare it to (1)

 Note that if you're running firefly or later, the raw objects are not
 specified explicitly in the output of the command you run at (3), so
 you might need a different procedure (e.g., find out the random string
 the raw objects use, remove it from the list generated in (1), etc.).

 That's basically it.
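
A rough shell version of those five steps, for a single bucket,
assuming a pre-firefly manifest (so 'object stat' names the raw parts
explicitly), jshon for the JSON parsing, and placeholder values for the
pool, bucket and marker; treat it as a sketch, not a recipe:

  pool=.rgw.buckets         # data pool (placeholder)
  bucket=mybucket           # bucket to audit (placeholder)
  marker=default.4804.14    # this bucket's prefix (placeholder)
  # (1) raw objects in the data pool carrying this bucket's prefix, sorted
  rados -p "$pool" ls | grep "^${marker}_" | sort > /tmp/raw.txt
  # (2) objects the bucket index still knows about
  radosgw-admin bucket list --bucket="$bucket" | jshon -a -e name -u > /tmp/objects.txt
  # (3) stat every object, with the cache disabled so it goes quicker
  while read -r obj; do
    radosgw-admin object stat --bucket="$bucket" --object="$obj" --rgw-cache-enabled=false
  done < /tmp/objects.txt > /tmp/stat.txt
  # (4) raw part names referenced by the manifests, plus the head objects themselves
  grep -o "\"${marker}[^\"]*\"" /tmp/stat.txt | tr -d '"' > /tmp/expected.txt
  sed "s|^|${marker}_|" /tmp/objects.txt >> /tmp/expected.txt
  sort -u /tmp/expected.txt -o /tmp/expected.txt
  # (5) raw objects nothing references any more; treat these only as candidates
  comm -23 /tmp/raw.txt /tmp/expected.txt > /tmp/candidates.txt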
 I'll be interested to figure out what happened and why the garbage
 collection didn't work correctly. You could try verifying that it's
 working by:
- create an object (let's say ~10MB in size)
- radosgw-admin object stat --bucket=<bucket> --object=<object>
  (keep this info)
- remove the object
- run radosgw-admin gc list --include-all and verify that the raw
 parts are listed there
- wait a few hours, repeat the last step, and see that the parts no
 longer appear there
- run rados -p <pool> ls and check whether the raw objects still exist
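
A concrete run of that test, using s3cmd as the S3 client and
placeholder bucket/object names; the wait time is governed by the rgw
gc settings, by default on the order of a couple of hours between the
delete and the gc cycle that removes the parts:

  dd if=/dev/zero of=/tmp/gc-test.bin bs=1M count=10
  s3cmd put /tmp/gc-test.bin s3://testbucket/gc-test.bin
  radosgw-admin object stat --bucket=testbucket --object=gc-test.bin > /tmp/gc-test-stat.json
  s3cmd del s3://testbucket/gc-test.bin
  radosgw-admin gc list --include-all    # the raw parts should now be listed
  # wait a few hours for a gc cycle, then:
  radosgw-admin gc list --include-all    # the parts should have dropped off the list
  # the raw names recorded in /tmp/gc-test-stat.json should no longer be in the pool
  rados -p .rgw.buckets ls > /tmp/pool-after.txt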

 Yehuda

 Not sure where to go from here, and our cluster is slowly filling up
 while not clearing any space.



 I did the last section:

 I'll be interested to figure out what happened, why the garbage
 collection didn't work correctly. You could try verifying that it's
 working by:
- create an object (let's say ~10MB in size).
- radosgw-admin object stat --bucket=bucket 

Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-12-01 Thread Yehuda Sadeh
On Mon, Dec 1, 2014 at 4:26 PM, Ben b@benjackson.email wrote:
 On 2014-12-02 11:25, Yehuda Sadeh wrote:

 On Mon, Dec 1, 2014 at 4:23 PM, Ben b@benjackson.email wrote:

...

 How can I tell if the shard has an object in it from the logs?




 Search for a different sequence (e.g., search for rgw.gc_remove).

 Yehuda





 0 Results in the logs for rgw.gc_remove



 Well, something is modifying the gc log. Do you happen to have more
 than one radosgw running on the same cluster?

 Yehuda



 We have 2 radosgw servers
 obj01 and obj02


 Are both of them pointing at the same zone?


 Yes, they are load balanced

Well, the gc log shows entries and then it doesn't, so something is
clearing these up. Try reproducing again with logs on and see whether
new entries appear in the rgw logs. If you don't see them, maybe try
turning on 'debug ms = 1' on your osds (ceph tell osd.* injectargs
'--debug_ms 1') and look in your osd logs for such messages. These
might give you some hint about where they are coming from.
Also, could it be that you ran 'radosgw-admin gc process' instead of
waiting for the gc cycle to complete?

Yehuda
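
For what it's worth, a minimal way to do that check, assuming a default
install where the gc log lives as omap on objects named gc.0 through
gc.31 in the .rgw.gc pool and the OSD logs sit under /var/log/ceph/:

  ceph tell osd.* injectargs '--debug_ms 1'
  # reproduce the upload/delete test, then look for operations touching the gc objects
  grep 'gc\.' /var/log/ceph/ceph-osd.*.log | grep osd_op | less
  ceph tell osd.* injectargs '--debug_ms 0'   # turn the extra logging back off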
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-12-01 Thread Ben

On 2014-12-02 15:03, Yehuda Sadeh wrote:

On Mon, Dec 1, 2014 at 4:26 PM, Ben b@benjackson.email wrote:

On 2014-12-02 11:25, Yehuda Sadeh wrote:


On Mon, Dec 1, 2014 at 4:23 PM, Ben b@benjackson.email wrote:


...


How can I tell if the shard has an object in it from the logs?





Search for a different sequence (e.g., search for rgw.gc_remove).

Yehuda






0 Results in the logs for rgw.gc_remove




Well, something is modifying the gc log. Do you happen to have more
than one radosgw running on the same cluster?

Yehuda




We have 2 radosgw servers
obj01 and obj02



Are both of them pointing at the same zone?



Yes, they are load balanced


Well, the gc log show entries, and then it doesn't, so something
clears these up. Try reproducing again with logs on, see if you see
new entries in the rgw logs. If you don't see these, maybe try turning
on 'debug ms = 1' on your osds (ceph tell osd.* injectargs '--debug_ms
1'), and look in your osd logs for such messages. These might give you
some hint for their origin.
Also, could it be that you ran 'radosgw-admin gc process', instead of
waiting for the gc cycle to complete?

Yehuda


I did another test, this time with a 600MB file. I uploaded it, then
deleted the file and did a gc list --include-all.
It displayed around 143 _shadow_ files. I let GC process itself (I did
not force it) and checked the pool afterwards by running
'rados ls -p .rgw.buckets | grep gc-listed-shadowfiles', and they no
longer exist.

I've added the debug ms to the OSDs, and I'll do another test with the
600MB file.

Also worth noting: I have started clearing out files from the
.rgw.buckets pool that belong to a bucket which has been deleted and is
no longer visible, by running 'rados -p .rgw.gc rm' over all the
_shadow_ files contained in that bucket's prefix, default.4804.14__shadow_.

Is this alright to do, or is there a better way to clear out files?
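
For reference, and only as a sketch: assuming those raw objects really
do belong to a bucket that no longer exists and nothing references
them, and assuming the removals are run against the data pool
(.rgw.buckets, as described above), the cleanup amounts to:

  # raw objects left behind by the deleted bucket
  rados -p .rgw.buckets ls | grep '^default\.4804\.14__shadow_' > /tmp/orphaned_shadow.txt
  # remove them one at a time
  while read -r obj; do
    rados -p .rgw.buckets rm "$obj"
  done < /tmp/orphaned_shadow.txt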
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-11-29 Thread Ben


On 29/11/14 11:40, Yehuda Sadeh wrote:

On Fri, Nov 28, 2014 at 1:38 PM, Ben b@benjackson.email wrote:

On 29/11/14 01:50, Yehuda Sadeh wrote:

On Thu, Nov 27, 2014 at 9:22 PM, Ben b@benjackson.email wrote:

On 2014-11-28 15:42, Yehuda Sadeh wrote:

On Thu, Nov 27, 2014 at 2:15 PM, b b@benjackson.email wrote:

On 2014-11-27 11:36, Yehuda Sadeh wrote:


On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email wrote:


On 2014-11-27 10:21, Yehuda Sadeh wrote:



On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email wrote:



On 2014-11-27 09:38, Yehuda Sadeh wrote:




On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email wrote:




I've been deleting a bucket which originally had 60TB of data in
it,
with
our cluster doing only 1 replication, the total usage was 120TB.

I've been deleting the objects slowly using S3 browser, and I can
see
the
bucket usage is now down to around 2.5TB or 5TB with duplication,
but
the
usage in the cluster has not changed.

I've looked at garbage collection (radosgw-admin gc list
--include
all)
and
it just reports square brackets []

I've run radosgw-admin temp remove --date=2014-11-20, and it
doesn't
appear
to have any effect.

Is there a way to check where this space is being consumed?

Running 'ceph df' the USED space in the buckets pool is not
showing
any
of
the 57TB that should have been freed up from the deletion so far.

Running 'radosgw-admin bucket stats | jshon | grep
size_kb_actual'
and
adding up all the buckets usage, this shows that the space has
been
freed
from the bucket, but the cluster is all sorts of messed up.


ANY IDEAS? What can I look at?





Can you run 'radosgw-admin gc list --include-all'?

Yehuda





I've done it before, and it just returns square brackets [] (see
below)

radosgw-admin gc list --include-all
[]




Do you know which of the rados pools have all that extra data? Try
to
list that pool's objects, verify that there are no surprises there
(e.g., use 'rados -p pool ls').

Yehuda




I'm just running that command now, and its taking some time. There is
a
large number of objects.

Once it has finished, what should I be looking for?



I assume the pool in question is the one that holds your objects data?
You should be looking for objects that are not expected to exist
anymore, and objects of buckets that don't exist anymore. The problem
here is to identify these.
I suggest starting by looking at all the existing buckets, compose a
list of all the bucket prefixes for the existing buckets, and try to
look whether there are objects that have different prefixes.

Yehuda



Any ideas? I've found the prefix, the number of objects in the pool
that
match that prefix numbers in the 21 millions
The actual 'radosgw-admin bucket stats' command reports it as only
having
1.2 million.


Well, the objects you're seeing are raw objects, and since rgw stripes
the data, it is expected to have more raw objects than objects in the
bucket. Still, it seems that you have much too many of these. You can
try to check whether there are pending multipart uploads that were
never completed using the S3 api.
At the moment there's no easy way to figure out which raw objects are
not supposed to exist. The process would be like this:
1. rados ls -p data pool
keep the list sorted
2. list objects in the bucket
3. for each object in (2), do: radosgw-admin object stat
--bucket=bucket --object=object --rgw-cache-enabled=false
(disabling the cache so that it goes quicker)
4. look at the result of (3), and generate a list of all the parts.
5. sort result of (4), compare it to (1)

Note that if you're running firefly or later, the raw objects are not
specified explicitly in the command you run at (3), so you might need
a different procedure, e.g., find out the raw objects random string
that is being used, remove it from the list generated in 1, etc.)

That's basically it.
I'll be interested to figure out what happened, why the garbage
collection didn't work correctly. You could try verifying that it's
working by:
   - create an object (let's say ~10MB in size).
   - radosgw-admin object stat --bucket=bucket --object=object
 (keep this info, see
   - remove the object
   - run radosgw-admin gc list --include-all and verify that the raw
parts are listed there
   - wait a few hours, repeat last step, see that the parts don't appear
there anymore
   - run rados -p pool ls, check to see if the raw objects still exist

Yehuda


Not sure where to go from here, and our cluster is slowly filling up
while
not clearing any space.



I did the last section:

I'll be interested to figure out what happened, why the garbage
collection didn't work correctly. You could try verifying that it's
working by:
   - create an object (let's say ~10MB in size).
   - radosgw-admin object stat --bucket=bucket --object=object
 (keep this info, see
   - remove the object
   - run radosgw-admin gc list --include-all and verify that the raw
parts are listed there
   - wait a few hours, repeat last step, see 

Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-11-28 Thread Yehuda Sadeh
On Thu, Nov 27, 2014 at 9:22 PM, Ben b@benjackson.email wrote:
 On 2014-11-28 15:42, Yehuda Sadeh wrote:

 On Thu, Nov 27, 2014 at 2:15 PM, b b@benjackson.email wrote:

 On 2014-11-27 11:36, Yehuda Sadeh wrote:


 On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email wrote:


 On 2014-11-27 10:21, Yehuda Sadeh wrote:



 On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email wrote:



 On 2014-11-27 09:38, Yehuda Sadeh wrote:




 On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email wrote:




 I've been deleting a bucket which originally had 60TB of data in
 it,
 with
 our cluster doing only 1 replication, the total usage was 120TB.

 I've been deleting the objects slowly using S3 browser, and I can
 see
 the
 bucket usage is now down to around 2.5TB or 5TB with duplication,
 but
 the
 usage in the cluster has not changed.

 I've looked at garbage collection (radosgw-admin gc list --include
 all)
 and
 it just reports square brackets []

 I've run radosgw-admin temp remove --date=2014-11-20, and it
 doesn't
 appear
 to have any effect.

 Is there a way to check where this space is being consumed?

 Running 'ceph df' the USED space in the buckets pool is not showing
 any
 of
 the 57TB that should have been freed up from the deletion so far.

 Running 'radosgw-admin bucket stats | jshon | grep size_kb_actual'
 and
 adding up all the buckets usage, this shows that the space has been
 freed
 from the bucket, but the cluster is all sorts of messed up.


 ANY IDEAS? What can I look at?





 Can you run 'radosgw-admin gc list --include-all'?

 Yehuda





 I've done it before, and it just returns square brackets [] (see
 below)

 radosgw-admin gc list --include-all
 []




 Do you know which of the rados pools have all that extra data? Try to
 list that pool's objects, verify that there are no surprises there
 (e.g., use 'rados -p pool ls').

 Yehuda




 I'm just running that command now, and its taking some time. There is a
 large number of objects.

 Once it has finished, what should I be looking for?



 I assume the pool in question is the one that holds your objects data?
 You should be looking for objects that are not expected to exist
 anymore, and objects of buckets that don't exist anymore. The problem
 here is to identify these.
 I suggest starting by looking at all the existing buckets, compose a
 list of all the bucket prefixes for the existing buckets, and try to
 look whether there are objects that have different prefixes.

 Yehuda



 Any ideas? I've found the prefix; the number of objects in the pool that
 match that prefix is around 21 million.
 The actual 'radosgw-admin bucket stats' command reports it as only having
 1.2 million.


 Well, the objects you're seeing are raw objects, and since rgw stripes
 the data, it is expected to have more raw objects than objects in the
 bucket. Still, it seems that you have far too many of these. You can
 try to check whether there are pending multipart uploads that were
 never completed using the S3 api.
 At the moment there's no easy way to figure out which raw objects are
 not supposed to exist. The process would be like this:
 1. rados ls -p data pool
 keep the list sorted
 2. list objects in the bucket
 3. for each object in (2), do: radosgw-admin object stat
 --bucket=bucket --object=object --rgw-cache-enabled=false
 (disabling the cache so that it goes quicker)
 4. look at the result of (3), and generate a list of all the parts.
 5. sort result of (4), compare it to (1)

 Note that if you're running firefly or later, the raw objects are not
 specified explicitly in the command you run at (3), so you might need
 a different procedure, e.g., find out the raw objects' random string
 that is being used, remove it from the list generated in (1), etc.

 That's basically it.
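
A rough shell sketch of steps 1-5 above, assuming the data pool is
".rgw.buckets", a single bucket called "mybucket", and that the raw part
names can be grepped out of the 'object stat' JSON -- the field name varies
between versions, so the grep pattern below is only a placeholder:

    # 1. all raw objects in the data pool, sorted
    rados -p .rgw.buckets ls | sort > /tmp/raw-objects.txt

    # 2. all object names in the bucket (any S3 client will do)
    s3cmd ls --recursive s3://mybucket | awk '{print $4}' \
        | sed 's|^s3://mybucket/||' > /tmp/bucket-objects.txt

    # 3+4. stat every object with the cache disabled and collect the raw part names
    while read -r obj; do
        radosgw-admin object stat --bucket=mybucket --object="$obj" --rgw-cache-enabled=false
    done < /tmp/bucket-objects.txt \
        | grep -o '"oid": "[^"]*"' | cut -d'"' -f4 \
        | sort -u > /tmp/expected-parts.txt

    # 5. raw objects that no live bucket object accounts for
    comm -23 /tmp/raw-objects.txt /tmp/expected-parts.txt > /tmp/leftover-candidates.txt
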
 I'll be interested to figure out what happened, why the garbage
 collection didn't work correctly. You could try verifying that it's
 working by:
  - create an object (let's say ~10MB in size).
  - radosgw-admin object stat --bucket=bucket --object=object
(keep this info, see
  - remove the object
  - run radosgw-admin gc list --include-all and verify that the raw
 parts are listed there
  - wait a few hours, repeat last step, see that the parts don't appear
 there anymore
  - run rados -p pool ls, check to see if the raw objects still exist

 Yehuda


 Not sure where to go from here, and our cluster is slowly filling up
 while
 not clearing any space.



 I did the last section:

 I'll be interested to figure out what happened, why the garbage
 collection didn't work correctly. You could try verifying that it's
 working by:
  - create an object (let's say ~10MB in size).
  - radosgw-admin object stat --bucket=bucket --object=object
(keep this info, see
  - remove the object
  - run radosgw-admin gc list --include-all and verify that the raw
 parts are listed there
  - wait a few hours, repeat last step, see that the parts don't appear
 there anymore
  - run rados -p 

Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-11-28 Thread Ben


On 29/11/14 01:50, Yehuda Sadeh wrote:

On Thu, Nov 27, 2014 at 9:22 PM, Ben b@benjackson.email wrote:

On 2014-11-28 15:42, Yehuda Sadeh wrote:

On Thu, Nov 27, 2014 at 2:15 PM, b b@benjackson.email wrote:

On 2014-11-27 11:36, Yehuda Sadeh wrote:


On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email wrote:


On 2014-11-27 10:21, Yehuda Sadeh wrote:



On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email wrote:



On 2014-11-27 09:38, Yehuda Sadeh wrote:




On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email wrote:




I've been deleting a bucket which originally had 60TB of data in
it,
with
our cluster doing only 1 replication, the total usage was 120TB.

I've been deleting the objects slowly using S3 browser, and I can
see
the
bucket usage is now down to around 2.5TB or 5TB with duplication,
but
the
usage in the cluster has not changed.

I've looked at garbage collection (radosgw-admin gc list --include
all)
and
it just reports square brackets []

I've run radosgw-admin temp remove --date=2014-11-20, and it
doesn't
appear
to have any effect.

Is there a way to check where this space is being consumed?

Running 'ceph df' the USED space in the buckets pool is not showing
any
of
the 57TB that should have been freed up from the deletion so far.

Running 'radosgw-admin bucket stats | jshon | grep size_kb_actual'
and
adding up all the buckets usage, this shows that the space has been
freed
from the bucket, but the cluster is all sorts of messed up.


ANY IDEAS? What can I look at?





Can you run 'radosgw-admin gc list --include-all'?

Yehuda





I've done it before, and it just returns square brackets [] (see
below)

radosgw-admin gc list --include-all
[]




Do you know which of the rados pools have all that extra data? Try to
list that pool's objects, verify that there are no surprises there
(e.g., use 'rados -p pool ls').

Yehuda




I'm just running that command now, and it's taking some time. There is a
large number of objects.

Once it has finished, what should I be looking for?



I assume the pool in question is the one that holds your objects data?
You should be looking for objects that are not expected to exist
anymore, and objects of buckets that don't exist anymore. The problem
here is to identify these.
I suggest starting by looking at all the existing buckets, compose a
list of all the bucket prefixes for the existing buckets, and try to
look whether there are objects that have different prefixes.

Yehuda



Any ideas? I've found the prefix; the number of objects in the pool that
match that prefix is around 21 million.
The actual 'radosgw-admin bucket stats' command reports it as only having
1.2 million.


Well, the objects you're seeing are raw objects, and since rgw stripes
the data, it is expected to have more raw objects than objects in the
bucket. Still, it seems that you have far too many of these. You can
try to check whether there are pending multipart uploads that were
never completed using the S3 api.
At the moment there's no easy way to figure out which raw objects are
not supposed to exist. The process would be like this:
1. rados ls -p data pool
keep the list sorted
2. list objects in the bucket
3. for each object in (2), do: radosgw-admin object stat
--bucket=bucket --object=object --rgw-cache-enabled=false
(disabling the cache so that it goes quicker)
4. look at the result of (3), and generate a list of all the parts.
5. sort result of (4), compare it to (1)

Note that if you're running firefly or later, the raw objects are not
specified explicitly in the command you run at (3), so you might need
a different procedure, e.g., find out the raw objects' random string
that is being used, remove it from the list generated in (1), etc.

That's basically it.
I'll be interested to figure out what happened, why the garbage
collection didn't work correctly. You could try verifying that it's
working by:
  - create an object (let's say ~10MB in size).
  - radosgw-admin object stat --bucket=bucket --object=object
(keep this info, see
  - remove the object
  - run radosgw-admin gc list --include-all and verify that the raw
parts are listed there
  - wait a few hours, repeat last step, see that the parts don't appear
there anymore
  - run rados -p pool ls, check to see if the raw objects still exist

Yehuda


Not sure where to go from here, and our cluster is slowly filling up
while
not clearing any space.



I did the last section:

I'll be interested to figure out what happened, why the garbage
collection didn't work correctly. You could try verifying that it's
working by:
  - create an object (let's say ~10MB in size).
  - radosgw-admin object stat --bucket=bucket --object=object
(keep this info, see
  - remove the object
  - run radosgw-admin gc list --include-all and verify that the raw
parts are listed there
  - wait a few hours, repeat last step, see that the parts don't appear
there anymore
  - run rados -p pool ls, check to see if the raw objects still exist


I 

Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-11-28 Thread Yehuda Sadeh
On Fri, Nov 28, 2014 at 1:38 PM, Ben b@benjackson.email wrote:

 On 29/11/14 01:50, Yehuda Sadeh wrote:

 On Thu, Nov 27, 2014 at 9:22 PM, Ben b@benjackson.email wrote:

 On 2014-11-28 15:42, Yehuda Sadeh wrote:

 On Thu, Nov 27, 2014 at 2:15 PM, b b@benjackson.email wrote:

 On 2014-11-27 11:36, Yehuda Sadeh wrote:


 On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email wrote:


 On 2014-11-27 10:21, Yehuda Sadeh wrote:



 On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email wrote:



 On 2014-11-27 09:38, Yehuda Sadeh wrote:




 On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email wrote:




 I've been deleting a bucket which originally had 60TB of data in
 it,
 with
 our cluster doing only 1 replication, the total usage was 120TB.

 I've been deleting the objects slowly using S3 browser, and I can
 see
 the
 bucket usage is now down to around 2.5TB or 5TB with duplication,
 but
 the
 usage in the cluster has not changed.

 I've looked at garbage collection (radosgw-admin gc list
 --include
 all)
 and
 it just reports square brackets []

 I've run radosgw-admin temp remove --date=2014-11-20, and it
 doesn't
 appear
 to have any effect.

 Is there a way to check where this space is being consumed?

 Running 'ceph df' the USED space in the buckets pool is not
 showing
 any
 of
 the 57TB that should have been freed up from the deletion so far.

 Running 'radosgw-admin bucket stats | jshon | grep
 size_kb_actual'
 and
 adding up all the buckets usage, this shows that the space has
 been
 freed
 from the bucket, but the cluster is all sorts of messed up.


 ANY IDEAS? What can I look at?





 Can you run 'radosgw-admin gc list --include-all'?

 Yehuda





 I've done it before, and it just returns square brackets [] (see
 below)

 radosgw-admin gc list --include-all
 []




 Do you know which of the rados pools have all that extra data? Try
 to
 list that pool's objects, verify that there are no surprises there
 (e.g., use 'rados -p pool ls').

 Yehuda




 I'm just running that command now, and it's taking some time. There is
 a
 large number of objects.

 Once it has finished, what should I be looking for?



 I assume the pool in question is the one that holds your objects data?
 You should be looking for objects that are not expected to exist
 anymore, and objects of buckets that don't exist anymore. The problem
 here is to identify these.
 I suggest starting by looking at all the existing buckets, compose a
 list of all the bucket prefixes for the existing buckets, and try to
 look whether there are objects that have different prefixes.

 Yehuda



 Any ideas? I've found the prefix; the number of objects in the pool that
 match that prefix is around 21 million.
 The actual 'radosgw-admin bucket stats' command reports it as only having
 1.2 million.


 Well, the objects you're seeing are raw objects, and since rgw stripes
 the data, it is expected to have more raw objects than objects in the
 bucket. Still, it seems that you have far too many of these. You can
 try to check whether there are pending multipart uploads that were
 never completed using the S3 api.
 At the moment there's no easy way to figure out which raw objects are
 not supposed to exist. The process would be like this:
 1. rados ls -p data pool
 keep the list sorted
 2. list objects in the bucket
 3. for each object in (2), do: radosgw-admin object stat
 --bucket=bucket --object=object --rgw-cache-enabled=false
 (disabling the cache so that it goes quicker)
 4. look at the result of (3), and generate a list of all the parts.
 5. sort result of (4), compare it to (1)

 Note that if you're running firefly or later, the raw objects are not
 specified explicitly in the command you run at (3), so you might need
 a different procedure, e.g., find out the raw objects' random string
 that is being used, remove it from the list generated in (1), etc.

 That's basically it.
 I'll be interested to figure out what happened, why the garbage
 collection didn't work correctly. You could try verifying that it's
 working by:
   - create an object (let's say ~10MB in size).
   - radosgw-admin object stat --bucket=bucket --object=object
 (keep this info, see
   - remove the object
   - run radosgw-admin gc list --include-all and verify that the raw
 parts are listed there
   - wait a few hours, repeat last step, see that the parts don't appear
 there anymore
   - run rados -p pool ls, check to see if the raw objects still exist

 Yehuda

 Not sure where to go from here, and our cluster is slowly filling up
 while
 not clearing any space.



 I did the last section:

 I'll be interested to figure out what happened, why the garbage
 collection didn't work correctly. You could try verifying that it's
 working by:
   - create an object (let's say ~10MB in size).
   - radosgw-admin object stat --bucket=bucket --object=object
 (keep this info, see
   - remove the object
   - run radosgw-admin gc list --include-all and verify that the raw
 

Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-11-28 Thread Ben


On 29/11/14 11:40, Yehuda Sadeh wrote:

On Fri, Nov 28, 2014 at 1:38 PM, Ben b@benjackson.email wrote:

On 29/11/14 01:50, Yehuda Sadeh wrote:

On Thu, Nov 27, 2014 at 9:22 PM, Ben b@benjackson.email wrote:

On 2014-11-28 15:42, Yehuda Sadeh wrote:

On Thu, Nov 27, 2014 at 2:15 PM, b b@benjackson.email wrote:

On 2014-11-27 11:36, Yehuda Sadeh wrote:


On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email wrote:


On 2014-11-27 10:21, Yehuda Sadeh wrote:



On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email wrote:



On 2014-11-27 09:38, Yehuda Sadeh wrote:




On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email wrote:




I've been deleting a bucket which originally had 60TB of data in
it,
with
our cluster doing only 1 replication, the total usage was 120TB.

I've been deleting the objects slowly using S3 browser, and I can
see
the
bucket usage is now down to around 2.5TB or 5TB with duplication,
but
the
usage in the cluster has not changed.

I've looked at garbage collection (radosgw-admin gc list
--include
all)
and
it just reports square brackets []

I've run radosgw-admin temp remove --date=2014-11-20, and it
doesn't
appear
to have any effect.

Is there a way to check where this space is being consumed?

Running 'ceph df' the USED space in the buckets pool is not
showing
any
of
the 57TB that should have been freed up from the deletion so far.

Running 'radosgw-admin bucket stats | jshon | grep
size_kb_actual'
and
adding up all the buckets usage, this shows that the space has
been
freed
from the bucket, but the cluster is all sorts of messed up.


ANY IDEAS? What can I look at?





Can you run 'radosgw-admin gc list --include-all'?

Yehuda





I've done it before, and it just returns square brackets [] (see
below)

radosgw-admin gc list --include-all
[]




Do you know which of the rados pools have all that extra data? Try
to
list that pool's objects, verify that there are no surprises there
(e.g., use 'rados -p pool ls').

Yehuda




I'm just running that command now, and it's taking some time. There is
a
large number of objects.

Once it has finished, what should I be looking for?



I assume the pool in question is the one that holds your objects data?
You should be looking for objects that are not expected to exist
anymore, and objects of buckets that don't exist anymore. The problem
here is to identify these.
I suggest starting by looking at all the existing buckets, compose a
list of all the bucket prefixes for the existing buckets, and try to
look whether there are objects that have different prefixes.

Yehuda



Any ideas? I've found the prefix; the number of objects in the pool that
match that prefix is around 21 million.
The actual 'radosgw-admin bucket stats' command reports it as only having
1.2 million.


Well, the objects you're seeing are raw objects, and since rgw stripes
the data, it is expected to have more raw objects than objects in the
bucket. Still, it seems that you have far too many of these. You can
try to check whether there are pending multipart uploads that were
never completed using the S3 api.
At the moment there's no easy way to figure out which raw objects are
not supposed to exist. The process would be like this:
1. rados ls -p data pool
keep the list sorted
2. list objects in the bucket
3. for each object in (2), do: radosgw-admin object stat
--bucket=bucket --object=object --rgw-cache-enabled=false
(disabling the cache so that it goes quicker)
4. look at the result of (3), and generate a list of all the parts.
5. sort result of (4), compare it to (1)

Note that if you're running firefly or later, the raw objects are not
specified explicitly in the command you run at (3), so you might need
a different procedure, e.g., find out the raw objects' random string
that is being used, remove it from the list generated in (1), etc.

That's basically it.
I'll be interested to figure out what happened, why the garbage
collection didn't work correctly. You could try verifying that it's
working by:
   - create an object (let's say ~10MB in size).
   - radosgw-admin object stat --bucket=bucket --object=object
 (keep this info, see
   - remove the object
   - run radosgw-admin gc list --include-all and verify that the raw
parts are listed there
   - wait a few hours, repeat last step, see that the parts don't appear
there anymore
   - run rados -p pool ls, check to see if the raw objects still exist

Yehuda


Not sure where to go from here, and our cluster is slowly filling up
while
not clearing any space.



I did the last section:

I'll be interested to figure out what happened, why the garbage
collection didn't work correctly. You could try verifying that it's
working by:
   - create an object (let's say ~10MB in size).
   - radosgw-admin object stat --bucket=bucket --object=object
 (keep this info, see
   - remove the object
   - run radosgw-admin gc list --include-all and verify that the raw
parts are listed there
   - wait a few hours, repeat last step, see 

Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-11-27 Thread Yehuda Sadeh
On Thu, Nov 27, 2014 at 2:15 PM, b b@benjackson.email wrote:
 On 2014-11-27 11:36, Yehuda Sadeh wrote:

 On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email wrote:

 On 2014-11-27 10:21, Yehuda Sadeh wrote:


 On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email wrote:


 On 2014-11-27 09:38, Yehuda Sadeh wrote:



 On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email wrote:



 I've been deleting a bucket which originally had 60TB of data in it,
 with
 our cluster doing only 1 replication, the total usage was 120TB.

 I've been deleting the objects slowly using S3 browser, and I can see
 the
 bucket usage is now down to around 2.5TB or 5TB with duplication, but
 the
 usage in the cluster has not changed.

 I've looked at garbage collection (radosgw-admin gc list --include
 all)
 and
 it just reports square brackets []

 I've run radosgw-admin temp remove --date=2014-11-20, and it doesn't
 appear
 to have any effect.

 Is there a way to check where this space is being consumed?

 Running 'ceph df' the USED space in the buckets pool is not showing
 any
 of
 the 57TB that should have been freed up from the deletion so far.

 Running 'radosgw-admin bucket stats | jshon | grep size_kb_actual'
 and
 adding up all the buckets usage, this shows that the space has been
 freed
 from the bucket, but the cluster is all sorts of messed up.


 ANY IDEAS? What can I look at?




 Can you run 'radosgw-admin gc list --include-all'?

 Yehuda




 I've done it before, and it just returns square brackets [] (see below)

 radosgw-admin gc list --include-all
 []



 Do you know which of the rados pools have all that extra data? Try to
 list that pool's objects, verify that there are no surprises there
 (e.g., use 'rados -p pool ls').

 Yehuda



 I'm just running that command now, and it's taking some time. There is a
 large number of objects.

 Once it has finished, what should I be looking for?


 I assume the pool in question is the one that holds your objects data?
 You should be looking for objects that are not expected to exist
 anymore, and objects of buckets that don't exist anymore. The problem
 here is to identify these.
 I suggest starting by looking at all the existing buckets, compose a
 list of all the bucket prefixes for the existing buckets, and try to
 look whether there are objects that have different prefixes.

 Yehuda


 Any ideas? I've found the prefix; the number of objects in the pool that
 match that prefix is around 21 million.
 The actual 'radosgw-admin bucket stats' command reports it as only having
 1.2 million.

Well, the objects you're seeing are raw objects, and since rgw stripes
the data, it is expected to have more raw objects than objects in the
bucket. Still, it seems that you have far too many of these. You can
try to check whether there are pending multipart uploads that were
never completed using the S3 api.
At the moment there's no easy way to figure out which raw objects are
not supposed to exist. The process would be like this:
1. rados ls -p data pool
keep the list sorted
2. list objects in the bucket
3. for each object in (2), do: radosgw-admin object stat
--bucket=bucket --object=object --rgw-cache-enabled=false
(disabling the cache so that it goes quicker)
4. look at the result of (3), and generate a list of all the parts.
5. sort result of (4), compare it to (1)

Note that if you're running firefly or later, the raw objects are not
specified explicitly in the command you run at (3), so you might need
a different procedure, e.g., find out the raw objects' random string
that is being used, remove it from the list generated in (1), etc.

That's basically it.
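
As a quick way to act on the pending-multipart-uploads suggestion a few
paragraphs up, something like the following could be used (sketched with the
AWS CLI; the bucket name and endpoint URL are placeholders):

    # list multipart uploads that were started but never completed or aborted
    aws s3api list-multipart-uploads --bucket mybucket \
        --endpoint-url http://radosgw.example.com

    # any upload listed here keeps its parts in the data pool until it is
    # completed or explicitly aborted, e.g.:
    #   aws s3api abort-multipart-upload --bucket mybucket --key bigfile \
    #       --upload-id <UploadId> --endpoint-url http://radosgw.example.com
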
I'll be interested to figure out what happened, why the garbage
collection didn't work correctly. You could try verifying that it's
working by:
 - create an object (let's say ~10MB in size).
 - radosgw-admin object stat --bucket=bucket --object=object
   (keep this info, see
 - remove the object
 - run radosgw-admin gc list --include-all and verify that the raw
parts are listed there
 - wait a few hours, repeat last step, see that the parts don't appear
there anymore
 - run rados -p pool ls, check to see if the raw objects still exist

Yehuda


 Not sure where to go from here, and our cluster is slowly filling up while
 not clearing any space.


Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-11-27 Thread Ben

On 2014-11-28 15:42, Yehuda Sadeh wrote:

On Thu, Nov 27, 2014 at 2:15 PM, b b@benjackson.email wrote:

On 2014-11-27 11:36, Yehuda Sadeh wrote:


On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email wrote:


On 2014-11-27 10:21, Yehuda Sadeh wrote:



On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email wrote:



On 2014-11-27 09:38, Yehuda Sadeh wrote:




On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email wrote:




I've been deleting a bucket which originally had 60TB of data in it, with
our cluster doing only 1 replication, the total usage was 120TB.

I've been deleting the objects slowly using S3 browser, and I can see the
bucket usage is now down to around 2.5TB or 5TB with duplication, but the
usage in the cluster has not changed.

I've looked at garbage collection (radosgw-admin gc list --include all) and
it just reports square brackets []

I've run radosgw-admin temp remove --date=2014-11-20, and it doesn't appear
to have any effect.

Is there a way to check where this space is being consumed?

Running 'ceph df' the USED space in the buckets pool is not showing any of
the 57TB that should have been freed up from the deletion so far.

Running 'radosgw-admin bucket stats | jshon | grep size_kb_actual' and
adding up all the buckets usage, this shows that the space has been freed
from the bucket, but the cluster is all sorts of messed up.


ANY IDEAS? What can I look at?





Can you run 'radosgw-admin gc list --include-all'?

Yehuda





I've done it before, and it just returns square brackets [] (see 
below)


radosgw-admin gc list --include-all
[]




Do you know which of the rados pools have all that extra data? Try to
list that pool's objects, verify that there are no surprises there
(e.g., use 'rados -p pool ls').

Yehuda




I'm just running that command now, and it's taking some time. There is a
large number of objects.

Once it has finished, what should I be looking for?



I assume the pool in question is the one that holds your objects data?
You should be looking for objects that are not expected to exist
anymore, and objects of buckets that don't exist anymore. The problem
here is to identify these.
I suggest starting by looking at all the existing buckets, compose a
list of all the bucket prefixes for the existing buckets, and try to
look whether there are objects that have different prefixes.

Yehuda



Any ideas? I've found the prefix; the number of objects in the pool that
match that prefix is around 21 million.
The actual 'radosgw-admin bucket stats' command reports it as only having
1.2 million.


Well, the objects you're seeing are raw objects, and since rgw stripes
the data, it is expected to have more raw objects than objects in the
bucket. Still, it seems that you have far too many of these. You can
try to check whether there are pending multipart uploads that were
never completed using the S3 api.
At the moment there's no easy way to figure out which raw objects are
not supposed to exist. The process would be like this:
1. rados ls -p data pool
keep the list sorted
2. list objects in the bucket
3. for each object in (2), do: radosgw-admin object stat
--bucket=bucket --object=object --rgw-cache-enabled=false
(disabling the cache so that it goes quicker)
4. look at the result of (3), and generate a list of all the parts.
5. sort result of (4), compare it to (1)

Note that if you're running firefly or later, the raw objects are not
specified explicitly in the command you run at (3), so you might need
a different procedure, e.g., find out the raw objects' random string
that is being used, remove it from the list generated in (1), etc.

That's basically it.
I'll be interested to figure out what happened, why the garbage
collection didn't work correctly. You could try verifying that it's
working by:
 - create an object (let's say ~10MB in size).
 - radosgw-admin object stat --bucket=bucket --object=object
   (keep this info, see
 - remove the object
 - run radosgw-admin gc list --include-all and verify that the raw
parts are listed there
 - wait a few hours, repeat last step, see that the parts don't appear
there anymore
 - run rados -p pool ls, check to see if the raw objects still exist

Yehuda



Not sure where to go from here, and our cluster is slowly filling up while
not clearing any space.



I did the last section:

I'll be interested to figure out what happened, why the garbage
collection didn't work correctly. You could try verifying that it's
working by:
 - create an object (let's say ~10MB in size).
 - radosgw-admin object stat --bucket=bucket --object=object
   (keep this info, see
 - remove the object
 - run radosgw-admin gc list --include-all and verify that the raw
parts are listed there
 - wait a few hours, repeat last step, see that the parts don't appear
there anymore
 - run rados -p pool ls, check to see if the raw objects still exist


I added the file, did a stat and it displayed the json output
I removed the object and 

Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-11-26 Thread Yehuda Sadeh
On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email wrote:
 I've been deleting a bucket which originally had 60TB of data in it, with
 our cluster doing only 1 replication, the total usage was 120TB.

 I've been deleting the objects slowly using S3 browser, and I can see the
 bucket usage is now down to around 2.5TB or 5TB with duplication, but the
 usage in the cluster has not changed.

 I've looked at garbage collection (radosgw-admin gc list --include all) and
 it just reports square brackets []

 I've run radosgw-admin temp remove --date=2014-11-20, and it doesn't appear
 to have any effect.

 Is there a way to check where this space is being consumed?

 Running 'ceph df' the USED space in the buckets pool is not showing any of
 the 57TB that should have been freed up from the deletion so far.

 Running 'radosgw-admin bucket stats | jshon | grep size_kb_actual' and
 adding up all the buckets usage, this shows that the space has been freed
 from the bucket, but the cluster is all sorts of messed up.
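
For what it's worth, that adding-up step can be done in one line; a rough
sketch that just sums the grep output (assuming size_kb_actual is reported
in KB, as the name suggests):

    radosgw-admin bucket stats | grep size_kb_actual \
        | awk -F: '{gsub(/[ ,]/, "", $2); total += $2} END {printf "%.2f GB\n", total/1024/1024}'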


 ANY IDEAS? What can I look at?

Can you run 'radosgw-admin gc list --include-all'?

Yehuda


Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-11-26 Thread b

On 2014-11-27 09:38, Yehuda Sadeh wrote:

On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email wrote:
I've been deleting a bucket which originally had 60TB of data in it, with
our cluster doing only 1 replication, the total usage was 120TB.

I've been deleting the objects slowly using S3 browser, and I can see the
bucket usage is now down to around 2.5TB or 5TB with duplication, but the
usage in the cluster has not changed.

I've looked at garbage collection (radosgw-admin gc list --include all) and
it just reports square brackets []

I've run radosgw-admin temp remove --date=2014-11-20, and it doesn't appear
to have any effect.

Is there a way to check where this space is being consumed?

Running 'ceph df' the USED space in the buckets pool is not showing any of
the 57TB that should have been freed up from the deletion so far.

Running 'radosgw-admin bucket stats | jshon | grep size_kb_actual' and
adding up all the buckets usage, this shows that the space has been freed
from the bucket, but the cluster is all sorts of messed up.


ANY IDEAS? What can I look at?


Can you run 'radosgw-admin gc list --include-all'?

Yehuda


I've done it before, and it just returns square brackets [] (see below)

radosgw-admin gc list --include-all
[]


Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-11-26 Thread Yehuda Sadeh
On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email wrote:
 On 2014-11-27 10:21, Yehuda Sadeh wrote:

 On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email wrote:

 On 2014-11-27 09:38, Yehuda Sadeh wrote:


 On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email wrote:


 I've been deleting a bucket which originally had 60TB of data in it,
 with
 our cluster doing only 1 replication, the total usage was 120TB.

 I've been deleting the objects slowly using S3 browser, and I can see
 the
 bucket usage is now down to around 2.5TB or 5TB with duplication, but
 the
 usage in the cluster has not changed.

 I've looked at garbage collection (radosgw-admin gc list --include all)
 and
 it just reports square brackets []

 I've run radosgw-admin temp remove --date=2014-11-20, and it doesn't
 appear
 to have any effect.

 Is there a way to check where this space is being consumed?

 Running 'ceph df' the USED space in the buckets pool is not showing any
 of
 the 57TB that should have been freed up from the deletion so far.

 Running 'radosgw-admin bucket stats | jshon | grep size_kb_actual' and
 adding up all the buckets usage, this shows that the space has been
 freed
 from the bucket, but the cluster is all sorts of messed up.


 ANY IDEAS? What can I look at?



 Can you run 'radosgw-admin gc list --include-all'?

 Yehuda



 I've done it before, and it just returns square brackets [] (see below)

 radosgw-admin gc list --include-all
 []


 Do you know which of the rados pools have all that extra data? Try to
 list that pool's objects, verify that there are no surprises there
 (e.g., use 'rados -p pool ls').

 Yehuda


 I'm just running that command now, and it's taking some time. There is a
 large number of objects.

 Once it has finished, what should I be looking for?

I assume the pool in question is the one that holds your objects data?
You should be looking for objects that are not expected to exist
anymore, and objects of buckets that don't exist anymore. The problem
here is to identify these.
I suggest starting by looking at all the existing buckets, compose a
list of all the bucket prefixes for the existing buckets, and try to
look whether there are objects that have different prefixes.

Yehuda
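
A rough way to put that suggestion into practice, assuming the data pool is
".rgw.buckets" and that 'bucket stats' exposes each bucket's prefix in an
"id" field (the exact field name -- "id" vs "marker" -- should be checked on
your version first):

    # collect the prefix of every bucket that still exists
    radosgw-admin bucket list | jshon -a -u | while read -r b; do
        radosgw-admin bucket stats --bucket="$b" | grep '"id"' | cut -d'"' -f4
    done | sort -u > /tmp/known-prefixes.txt

    # raw objects whose prefix matches no existing bucket are the suspects
    # (grep -f is slow on tens of millions of lines; a sort/join on the two
    # lists scales better, but this shows the idea)
    rados -p .rgw.buckets ls \
        | grep -v -f <(sed 's/^/^/' /tmp/known-prefixes.txt) > /tmp/orphan-candidates.txt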


Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-11-26 Thread b

On 2014-11-27 11:36, Yehuda Sadeh wrote:

On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email wrote:

On 2014-11-27 10:21, Yehuda Sadeh wrote:


On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email wrote:


On 2014-11-27 09:38, Yehuda Sadeh wrote:



On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email wrote:



I've been deleting a bucket which originally had 60TB of data in it, with
our cluster doing only 1 replication, the total usage was 120TB.

I've been deleting the objects slowly using S3 browser, and I can see the
bucket usage is now down to around 2.5TB or 5TB with duplication, but the
usage in the cluster has not changed.

I've looked at garbage collection (radosgw-admin gc list --include all) and
it just reports square brackets []

I've run radosgw-admin temp remove --date=2014-11-20, and it doesn't appear
to have any effect.

Is there a way to check where this space is being consumed?

Running 'ceph df' the USED space in the buckets pool is not showing any of
the 57TB that should have been freed up from the deletion so far.

Running 'radosgw-admin bucket stats | jshon | grep size_kb_actual' and
adding up all the buckets usage, this shows that the space has been freed
from the bucket, but the cluster is all sorts of messed up.


ANY IDEAS? What can I look at?




Can you run 'radosgw-admin gc list --include-all'?

Yehuda




I've done it before, and it just returns square brackets [] (see 
below)


radosgw-admin gc list --include-all
[]



Do you know which of the rados pools have all that extra data? Try to
list that pool's objects, verify that there are no surprises there
(e.g., use 'rados -p pool ls').

Yehuda



I'm just running that command now, and it's taking some time. There is a
large number of objects.

Once it has finished, what should I be looking for?


I assume the pool in question is the one that holds your objects data?
You should be looking for objects that are not expected to exist
anymore, and objects of buckets that don't exist anymore. The problem
here is to identify these.
I suggest starting by looking at all the existing buckets, compose a
list of all the bucket prefixes for the existing buckets, and try to
look whether there are objects that have different prefixes.

Yehuda


How do I get a list of bucket prefixes? I generated the list of objects
in the .rgw.buckets pool and it has over 32 million objects in it.



Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-11-26 Thread b

On 2014-11-27 11:36, Yehuda Sadeh wrote:

On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email wrote:

On 2014-11-27 10:21, Yehuda Sadeh wrote:


On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email wrote:


On 2014-11-27 09:38, Yehuda Sadeh wrote:



On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email wrote:



I've been deleting a bucket which originally had 60TB of data in it, with
our cluster doing only 1 replication, the total usage was 120TB.

I've been deleting the objects slowly using S3 browser, and I can see the
bucket usage is now down to around 2.5TB or 5TB with duplication, but the
usage in the cluster has not changed.

I've looked at garbage collection (radosgw-admin gc list --include all) and
it just reports square brackets []

I've run radosgw-admin temp remove --date=2014-11-20, and it doesn't appear
to have any effect.

Is there a way to check where this space is being consumed?

Running 'ceph df' the USED space in the buckets pool is not showing any of
the 57TB that should have been freed up from the deletion so far.

Running 'radosgw-admin bucket stats | jshon | grep size_kb_actual' and
adding up all the buckets usage, this shows that the space has been freed
from the bucket, but the cluster is all sorts of messed up.


ANY IDEAS? What can I look at?




Can you run 'radosgw-admin gc list --include-all'?

Yehuda




I've done it before, and it just returns square brackets [] (see 
below)


radosgw-admin gc list --include-all
[]



Do you know which of the rados pools have all that extra data? Try to
list that pool's objects, verify that there are no surprises there
(e.g., use 'rados -p pool ls').

Yehuda



I'm just running that command now, and it's taking some time. There is a
large number of objects.

Once it has finished, what should I be looking for?


I assume the pool in question is the one that holds your objects data?
You should be looking for objects that are not expected to exist
anymore, and objects of buckets that don't exist anymore. The problem
here is to identify these.
I suggest starting by looking at all the existing buckets, compose a
list of all the bucket prefixes for the existing buckets, and try to
look whether there are objects that have different prefixes.

Yehuda


I've found the bucket prefix by doing 'radosgw-admin bucket stats 
--bucket=bucketname'


When going through the list of objects with the prefix, it reports
21 million objects. But the bucket stats query only reports 1.2 million
objects in the bucket.
