Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage
Ben (Nov 26): I've been deleting a bucket which originally had 60TB of data in it; with our cluster doing only 1 replication, the total usage was 120TB. I've been deleting the objects slowly using S3 Browser, and I can see the bucket usage is now down to around 2.5TB (5TB with replication), but the usage in the cluster has not changed.

I've looked at garbage collection ('radosgw-admin gc list --include-all') and it just reports square brackets: []. I've also run 'radosgw-admin temp remove --date=2014-11-20', and it doesn't appear to have had any effect.

Is there a way to check where this space is being consumed? Running 'ceph df', the USED space in the buckets pool is not showing any of the 57TB that should have been freed by the deletions so far. Running 'radosgw-admin bucket stats | jshon | grep size_kb_actual' and adding up all the buckets' usage shows that the space has been freed from the bucket, but the cluster is all sorts of messed up. ANY IDEAS? What can I look at?

Yehuda: Can you run 'radosgw-admin gc list --include-all'?

Ben: I've done it before, and it just returns square brackets:

    radosgw-admin gc list --include-all
    []

Yehuda: Do you know which of the rados pools has all that extra data? Try to list that pool's objects and verify that there are no surprises there (e.g., use 'rados -p <pool> ls').

Ben: I'm running that command now, and it's taking some time; there is a large number of objects. Once it has finished, what should I be looking for?

Yehuda: I assume the pool in question is the one that holds your object data? You should be looking for objects that are not expected to exist anymore, and for objects belonging to buckets that no longer exist. The problem is identifying these. I suggest starting by looking at all the existing buckets, composing a list of their bucket prefixes, and checking whether there are objects with different prefixes.

Ben: Any ideas? I've found the prefix. The number of objects in the pool that match that prefix is in the 21 millions, while 'radosgw-admin bucket stats' reports the bucket as having only 1.2 million.

Yehuda: Well, the objects you're seeing are raw objects, and since rgw stripes the data, it is expected to have more raw objects than objects in the bucket. Still, it seems that you have far too many of them. You can check through the S3 API whether there are pending multipart uploads that were never completed.

At the moment there's no easy way to figure out which raw objects are not supposed to exist. The process would be like this (see the sketches after this exchange):

1. Run 'rados ls -p <data pool>' and keep the list sorted.
2. List the objects in the bucket.
3. For each object in (2), run 'radosgw-admin object stat --bucket=<bucket> --object=<object> --rgw-cache-enabled=false' (disabling the cache so that it goes quicker).
4. From the results of (3), generate a list of all the parts.
5. Sort the result of (4) and compare it to (1).

Note that if you're running Firefly or later, the raw objects are not named explicitly in the output of the command in (3), so you might need a different procedure: e.g., find the random string used in the raw object names and remove those entries from the list generated in (1). That's basically it.

I'd also be interested to figure out what happened and why the garbage collection didn't work correctly. You could try verifying that it's working by:

- creating an object (let's say ~10MB in size);
- running 'radosgw-admin object stat --bucket=<bucket> --object=<object>' (keep this info);
- removing the object;
- running 'radosgw-admin gc list --include-all' and verifying that the raw parts are listed there;
- waiting a few hours, repeating the last step, and seeing that the parts no longer appear there;
- running 'rados -p <pool> ls' to check whether the raw objects still exist.

Ben: Not sure where to go from here, and our cluster is slowly filling up while not clearing any space. I did the last section, the GC verification test above.
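For the pending-multipart check Yehuda mentions, any S3 client will do; a minimal sketch with s3cmd (recent versions have a 'multipart' subcommand; the bucket name is a placeholder):

    # incomplete multipart uploads hold raw parts that never show up as
    # bucket objects, so list any that were started but never finished
    s3cmd multipart s3://my-bucket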
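A rough sketch of how steps (1)-(5) might be scripted on a pre-Firefly cluster, where 'object stat' still names the raw parts explicitly. The pool and bucket names are placeholders, and both jq filters are guesses at the JSON layouts of the respective outputs, so check them against a single object before trusting the result:

    #!/bin/bash
    # (1) every raw object actually present in the data pool, sorted
    rados ls -p .rgw.buckets | sort > pool_objects.txt

    # (2) every object the bucket index knows about
    radosgw-admin bucket list --bucket=my-bucket \
        | jq -r '.[].name' > bucket_objects.txt

    # (3)+(4) stat each object with the cache disabled and collect the
    # raw part names from its manifest
    while read -r obj; do
        radosgw-admin object stat --bucket=my-bucket --object="$obj" \
            --rgw-cache-enabled=false
    done < bucket_objects.txt \
        | jq -r '.manifest.objs[]?.loc' | sort -u > expected_parts.txt

    # (5) raw objects in the pool that no bucket object accounts for
    comm -23 pool_objects.txt expected_parts.txt > orphan_candidates.txt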
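And the GC sanity test from the end of the exchange, as a sketch under the same assumptions (an S3 client already configured against the gateway, placeholder names):

    # create and upload a ~10MB test object
    dd if=/dev/urandom of=/tmp/gc-test.bin bs=1M count=10
    s3cmd put /tmp/gc-test.bin s3://my-bucket/gc-test.bin

    # record which raw parts back it, before deleting it
    radosgw-admin object stat --bucket=my-bucket --object=gc-test.bin \
        | tee /tmp/gc-test-stat.json

    # delete the object, then confirm its raw parts land in the gc queue
    s3cmd del s3://my-bucket/gc-test.bin
    radosgw-admin gc list --include-all

    # a few hours later (past the gc wait and processor windows), the
    # parts should be out of the queue and gone from the pool; grep for
    # the part names recorded in the stat output above
    radosgw-admin gc list --include-all
    rados ls -p .rgw.buckets | grep __shadow_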
[...]

Ben: How can I tell whether the shard has an object in it from the logs?

Yehuda: Search for a different sequence (e.g., search for rgw.gc_remove).

Ben: 0 results in the logs for rgw.gc_remove.

Yehuda: Well, something is modifying the gc log. Do you happen to have more than one radosgw running on the same cluster?

Ben: We have 2 radosgw servers, obj01 and obj02.

Yehuda: Are both of them pointing at the same zone?

Ben: Yes, they are load balanced.

Yehuda: Well, the gc log shows entries, and then it doesn't, so something is clearing them up. Try reproducing again with logging on, and see whether new entries appear in the rgw logs. If you don't see them, try turning on 'debug ms = 1' on your OSDs (ceph tell osd.* injectargs '--debug_ms 1') and look in your OSD logs for such messages; they might give you some hint as to their origin. Also, could it be that you ran 'radosgw-admin gc process' instead of waiting for the gc cycle to complete?
Ben: I did another test, this time with a 600MB file. I uploaded it, then deleted it and ran 'gc list --include-all'; it displayed around 143 _shadow_ files. I let GC run on its own (I did not force it), and when I checked the pool afterwards with 'rados ls -p .rgw.buckets | grep <shadow file from the gc list>', the parts no longer existed.

I've added the debug ms setting to the OSDs, and I'll do another test with the 600MB file.

Also worth noting: I have started clearing out files from the .rgw.buckets pool that belong to a bucket which has been deleted and is no longer visible, by running 'rados -p .rgw.buckets rm' over all the _shadow_ files under that bucket's prefix (default.4804.14__shadow_). Is this alright to do, or is there a better way to clear them out?
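For reference, Ben's manual cleanup as a sketch; the marker prefix is the one from the thread, and the removals are irreversible, so the listing deserves a careful look before the loop runs:

    # every raw object left under the deleted bucket's marker
    rados ls -p .rgw.buckets \
        | grep '^default\.4804\.14__shadow_' > dead_shadows.txt

    # inspect dead_shadows.txt, then remove the objects one by one
    while read -r obj; do
        rados -p .rgw.buckets rm "$obj"
    done < dead_shadows.txt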
Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage
On 29/11/14 11:40, Yehuda Sadeh wrote: On Fri, Nov 28, 2014 at 1:38 PM, Ben b@benjackson.email wrote: On 29/11/14 01:50, Yehuda Sadeh wrote: On Thu, Nov 27, 2014 at 9:22 PM, Ben b@benjackson.email wrote: On 2014-11-28 15:42, Yehuda Sadeh wrote: On Thu, Nov 27, 2014 at 2:15 PM, b b@benjackson.email wrote: On 2014-11-27 11:36, Yehuda Sadeh wrote: On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email wrote: On 2014-11-27 10:21, Yehuda Sadeh wrote: On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email wrote: On 2014-11-27 09:38, Yehuda Sadeh wrote: On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email wrote: I've been deleting a bucket which originally had 60TB of data in it, with our cluster doing only 1 replication, the total usage was 120TB. I've been deleting the objects slowly using S3 browser, and I can see the bucket usage is now down to around 2.5TB or 5TB with duplication, but the usage in the cluster has not changed. I've looked at garbage collection (radosgw-admin gc list --include all) and it just reports square brackets [] I've run radosgw-admin temp remove --date=2014-11-20, and it doesn't appear to have any effect. Is there a way to check where this space is being consumed? Running 'ceph df' the USED space in the buckets pool is not showing any of the 57TB that should have been freed up from the deletion so far. Running 'radosgw-admin bucket stats | jshon | grep size_kb_actual' and adding up all the buckets usage, this shows that the space has been freed from the bucket, but the cluster is all sorts of messed up. ANY IDEAS? What can I look at? Can you run 'radosgw-admin gc list --include-all'? Yehuda I've done it before, and it just returns square brackets [] (see below) radosgw-admin gc list --include-all [] Do you know which of the rados pools have all that extra data? Try to list that pool's objects, verify that there are no surprises there (e.g., use 'rados -p pool ls'). Yehuda I'm just running that command now, and its taking some time. There is a large number of objects. Once it has finished, what should I be looking for? I assume the pool in question is the one that holds your objects data? You should be looking for objects that are not expected to exist anymore, and objects of buckets that don't exist anymore. The problem here is to identify these. I suggest starting by looking at all the existing buckets, compose a list of all the bucket prefixes for the existing buckets, and try to look whether there are objects that have different prefixes. Yehuda Any ideas? I've found the prefix, the number of objects in the pool that match that prefix numbers in the 21 millions The actual 'radosgw-admin bucket stats' command reports it as only having 1.2 million. Well, the objects you're seeing are raw objects, and since rgw stripes the data, it is expected to have more raw objects than objects in the bucket. Still, it seems that you have much too many of these. You can try to check whether there are pending multipart uploads that were never completed using the S3 api. At the moment there's no easy way to figure out which raw objects are not supposed to exist. The process would be like this: 1. rados ls -p data pool keep the list sorted 2. list objects in the bucket 3. for each object in (2), do: radosgw-admin object stat --bucket=bucket --object=object --rgw-cache-enabled=false (disabling the cache so that it goes quicker) 4. look at the result of (3), and generate a list of all the parts. 5. 
sort result of (4), compare it to (1) Note that if you're running firefly or later, the raw objects are not specified explicitly in the command you run at (3), so you might need a different procedure, e.g., find out the raw objects random string that is being used, remove it from the list generated in 1, etc.) That's basically it. I'll be interested to figure out what happened, why the garbage collection didn't work correctly. You could try verifying that it's working by: - create an object (let's say ~10MB in size). - radosgw-admin object stat --bucket=bucket --object=object (keep this info, see - remove the object - run radosgw-admin gc list --include-all and verify that the raw parts are listed there - wait a few hours, repeat last step, see that the parts don't appear there anymore - run rados -p pool ls, check to see if the raw objects still exist Yehuda Not sure where to go from here, and our cluster is slowly filling up while not clearing any space. I did the last section: I'll be interested to figure out what happened, why the garbage collection didn't work correctly. You could try verifying that it's working by: - create an object (let's say ~10MB in size). - radosgw-admin object stat --bucket=bucket --object=object (keep this info, see - remove the object - run radosgw-admin gc list --include-all and verify that the raw parts are listed there - wait a few hours, repeat last step, see
Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage
On Thu, Nov 27, 2014 at 9:22 PM, Ben b@benjackson.email wrote: On 2014-11-28 15:42, Yehuda Sadeh wrote: On Thu, Nov 27, 2014 at 2:15 PM, b b@benjackson.email wrote: On 2014-11-27 11:36, Yehuda Sadeh wrote: On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email wrote: On 2014-11-27 10:21, Yehuda Sadeh wrote: On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email wrote: On 2014-11-27 09:38, Yehuda Sadeh wrote: On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email wrote: I've been deleting a bucket which originally had 60TB of data in it, with our cluster doing only 1 replication, the total usage was 120TB. I've been deleting the objects slowly using S3 browser, and I can see the bucket usage is now down to around 2.5TB or 5TB with duplication, but the usage in the cluster has not changed. I've looked at garbage collection (radosgw-admin gc list --include all) and it just reports square brackets [] I've run radosgw-admin temp remove --date=2014-11-20, and it doesn't appear to have any effect. Is there a way to check where this space is being consumed? Running 'ceph df' the USED space in the buckets pool is not showing any of the 57TB that should have been freed up from the deletion so far. Running 'radosgw-admin bucket stats | jshon | grep size_kb_actual' and adding up all the buckets usage, this shows that the space has been freed from the bucket, but the cluster is all sorts of messed up. ANY IDEAS? What can I look at? Can you run 'radosgw-admin gc list --include-all'? Yehuda I've done it before, and it just returns square brackets [] (see below) radosgw-admin gc list --include-all [] Do you know which of the rados pools have all that extra data? Try to list that pool's objects, verify that there are no surprises there (e.g., use 'rados -p pool ls'). Yehuda I'm just running that command now, and its taking some time. There is a large number of objects. Once it has finished, what should I be looking for? I assume the pool in question is the one that holds your objects data? You should be looking for objects that are not expected to exist anymore, and objects of buckets that don't exist anymore. The problem here is to identify these. I suggest starting by looking at all the existing buckets, compose a list of all the bucket prefixes for the existing buckets, and try to look whether there are objects that have different prefixes. Yehuda Any ideas? I've found the prefix, the number of objects in the pool that match that prefix numbers in the 21 millions The actual 'radosgw-admin bucket stats' command reports it as only having 1.2 million. Well, the objects you're seeing are raw objects, and since rgw stripes the data, it is expected to have more raw objects than objects in the bucket. Still, it seems that you have much too many of these. You can try to check whether there are pending multipart uploads that were never completed using the S3 api. At the moment there's no easy way to figure out which raw objects are not supposed to exist. The process would be like this: 1. rados ls -p data pool keep the list sorted 2. list objects in the bucket 3. for each object in (2), do: radosgw-admin object stat --bucket=bucket --object=object --rgw-cache-enabled=false (disabling the cache so that it goes quicker) 4. look at the result of (3), and generate a list of all the parts. 5. 
sort result of (4), compare it to (1) Note that if you're running firefly or later, the raw objects are not specified explicitly in the command you run at (3), so you might need a different procedure, e.g., find out the raw objects random string that is being used, remove it from the list generated in 1, etc.) That's basically it. I'll be interested to figure out what happened, why the garbage collection didn't work correctly. You could try verifying that it's working by: - create an object (let's say ~10MB in size). - radosgw-admin object stat --bucket=bucket --object=object (keep this info, see - remove the object - run radosgw-admin gc list --include-all and verify that the raw parts are listed there - wait a few hours, repeat last step, see that the parts don't appear there anymore - run rados -p pool ls, check to see if the raw objects still exist Yehuda Not sure where to go from here, and our cluster is slowly filling up while not clearing any space. I did the last section: I'll be interested to figure out what happened, why the garbage collection didn't work correctly. You could try verifying that it's working by: - create an object (let's say ~10MB in size). - radosgw-admin object stat --bucket=bucket --object=object (keep this info, see - remove the object - run radosgw-admin gc list --include-all and verify that the raw parts are listed there - wait a few hours, repeat last step, see that the parts don't appear there anymore - run rados -p
Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage
On 2014-11-28 15:42, Yehuda Sadeh wrote: [...]

I did the last section (the GC verification steps above): I added the file, did a stat, and it displayed the JSON output. I removed the object and
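For reference, that test can be scripted along the following lines. The bucket name is a placeholder, and s3cmd is used here only as one example of an S3 client; any client would do:

  # create and upload a ~10MB test object
  dd if=/dev/zero of=testobj bs=1M count=10
  s3cmd put testobj s3://testbucket/testobj

  # record which raw parts back the object
  radosgw-admin object stat --bucket=testbucket --object=testobj > before.json

  # delete it, then confirm the raw parts show up in the GC queue
  s3cmd del s3://testbucket/testobj
  radosgw-admin gc list --include-all

  # a few hours later the GC list should be empty again, and the raw
  # parts recorded in before.json should no longer be in the data pool
  radosgw-admin gc list --include-all
  rados -p .rgw.buckets ls | grep '<part name from before.json>'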
Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage
On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email wrote: [...]

Can you run 'radosgw-admin gc list --include-all'?

Yehuda
Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage
On 2014-11-27 09:38, Yehuda Sadeh wrote: [...] Can you run 'radosgw-admin gc list --include-all'? Yehuda

I've done it before, and it just returns square brackets (see below):

radosgw-admin gc list --include-all
[]
Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage
On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email wrote: [...] I'm just running that command now, and it's taking some time. There is a large number of objects. Once it has finished, what should I be looking for?

I assume the pool in question is the one that holds your object data. You should be looking for objects that are not expected to exist anymore, and for objects of buckets that don't exist anymore. The problem here is to identify these. I suggest starting by looking at all the existing buckets, composing a list of the bucket prefixes for the existing buckets, and then checking whether there are objects with different prefixes; a sketch of that comparison follows.

Yehuda
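A minimal sketch of that prefix comparison, assuming the data pool is .rgw.buckets, that raw object names begin with the owning bucket's marker followed by an underscore, and that the marker is exposed in the 'marker' (or, on some versions, 'id') field of 'radosgw-admin bucket stats':

  # prefixes (bucket markers) of every bucket that still exists
  radosgw-admin bucket stats | jq -r '.[].marker' | sort -u > known_prefixes.txt

  # prefixes actually present in the data pool, taking the part of each
  # raw object name before the first underscore
  rados -p .rgw.buckets ls | awk -F'_' '{print $1}' | sort -u > seen_prefixes.txt

  # prefixes found in the pool that belong to no existing bucket
  comm -13 known_prefixes.txt seen_prefixes.txt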
Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage
On 2014-11-27 11:36, Yehuda Sadeh wrote: [...]

How do I get a list of bucket prefixes? I generated the list of objects in the .rgw.buckets pool, and it has over 32 million objects in it.
Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage
On 2014-11-27 11:36, Yehuda Sadeh wrote: [...]

I've found the bucket prefix by running 'radosgw-admin bucket stats --bucket=bucketname'. Going through the list of objects with that prefix, it reports 21 million objects, but the bucket stats query reports only 1.2 million objects in the bucket.
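One way to reproduce that count comparison; the 'marker' field and the usage path in the JSON are assumptions that may differ by version:

  # the bucket's raw-object prefix ('marker' is an assumption; check
  # the 'id' field as well on your version)
  PREFIX=$(radosgw-admin bucket stats --bucket=bucketname | jq -r '.marker')

  # raw RADOS objects carrying that prefix; because rgw stripes data,
  # some multiple of the logical object count is expected here
  rados -p .rgw.buckets ls | grep -c "^${PREFIX}_"

  # logical object count as the gateway reports it
  radosgw-admin bucket stats --bucket=bucketname | jq '.usage."rgw.main".num_objects'

A ratio far beyond the stripe factor, such as 21 million raw objects against 1.2 million logical ones, points at raw objects that no longer belong to anything.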