[ceph-users] Rocksdb as omap db backend on jewel 10.2.10

2018-02-15 Thread Sam Wouters
Hi,

does anyone know if rocksdb is ready for use as the omap db backend on jewel?

According to the release notes of RH Ceph 2.4, "RocksDB is enabled as an
option to replace levelDB" and they even have a solution on how to
convert the leveldb omap db to rocksdb
(https://access.redhat.com/solutions/3210951) which mentions you need at
least 10.2.7.

However, to be able to use it on upstream ceph 10.2.10, you still need
to set the "enable experimental unrecoverable data corrupting features =
rocksdb" setting in ceph.conf, and on every "ceph health" or other
command you get the warning "the following dangerous and experimental
features are enabled: rocksdb".
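
For clarity, the required bits in ceph.conf would look something like this
(the filestore option name is my reading of the RH solution; existing OSDs
would still need their omap db converted as described there):

[osd]
enable experimental unrecoverable data corrupting features = rocksdb
filestore omap backend = rocksdb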

We have a lot of performance issues due to very large omap DBs and would
love to find out if switching to rocksdb would help there.
Does anyone have any experience with this (good or bad)?

regards,
Sam

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous, RGW bucket resharding

2017-12-11 Thread Sam Wouters
On 11-12-17 16:23, Orit Wasserman wrote:
> On Mon, Dec 11, 2017 at 4:58 PM, Sam Wouters <s...@ericom.be> wrote:
>> Hi Orit,
>>
>>
>> On 04-12-17 18:57, Orit Wasserman wrote:
>>> Hi Andreas,
>>>
>>> On Mon, Dec 4, 2017 at 11:26 AM, Andreas Calminder
>>> <andreas.calmin...@klarna.com> wrote:
>>>> Hello,
>>>> With release 12.2.2, dynamic bucket index resharding has been disabled
>>>> when running a multisite environment
>>>> (http://tracker.ceph.com/issues/21725). Does this mean that resharding
>>>> of bucket indexes shouldn't be done at all, even manually, while running
>>>> multisite, as there's a risk of corruption?
>>>>
>>> You will need to stop the sync on the bucket before doing the
>>> resharding and start it again after the resharding completes.
>>> It will start a full sync on the bucket (it doesn't mean we copy the
>>> objects, but we go over all of them to check if they need to be
>>> synced).
>>> We will automate this as part of the reshard admin command in the next
>>> Luminous release.
>> Does this also apply to Jewel? Stop sync and restart after resharding.
>> (I don't know if there is any way to disable sync for a specific bucket.)
>>
> In Jewel we only support offline bucket resharding; you have to stop
> the gateways in both zones before resharding.
> Do:
> 1. Execute the resharding radosgw-admin command.
> 2. Run a full sync on the bucket using "radosgw-admin bucket sync init"
>    on the bucket.
> 3. Start the gateways.
>
> This should work but I have not tried it ...
> Regards,
> Orit
Is it necessary to really stop the gateways? We tend to block all
traffic to the bucket being resharded with ACLs in the haproxy in
front, to avoid downtime for unrelated buckets.

Would a:

- restart gws with sync thread disabled
- block traffic to bucket
- reshard
- unblock traffic
- bucket sync init
- restart gws with sync enabled

work as well?
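
Spelled out, that would roughly be (bucket name and shard count are
placeholders, and rgw_run_sync_thread is my assumption for disabling the
sync thread):

# restart every radosgw with "rgw run sync thread = false"
# block traffic to the bucket in haproxy
radosgw-admin bucket reshard --bucket=<bucket> --num-shards=<shards>
# unblock traffic to the bucket
radosgw-admin bucket sync init --bucket=<bucket>
# restart the radosgws with "rgw run sync thread = true"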

r,
Sam

>> r,
>> Sam
>>>> Also, as dynamic bucket resharding was/is the main motivator for moving to
>>>> Luminous (for me at least), is dynamic resharding something that is
>>>> planned to be fixed for multisite environments later in the Luminous
>>>> life-cycle, or will it be left disabled forever?
>>>>
>>> We are planning to enable it in Luminous time.
>>>
>>> Regards,
>>> Orit
>>>
>>>> Thanks!
>>>> /andreas
>>>> ___
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous, RGW bucket resharding

2017-12-11 Thread Sam Wouters
Hi Orit,


On 04-12-17 18:57, Orit Wasserman wrote:
> Hi Andreas,
>
> On Mon, Dec 4, 2017 at 11:26 AM, Andreas Calminder
>  wrote:
>> Hello,
>> With release 12.2.2, dynamic bucket index resharding has been disabled
>> when running a multisite environment
>> (http://tracker.ceph.com/issues/21725). Does this mean that resharding
>> of bucket indexes shouldn't be done at all, even manually, while running
>> multisite, as there's a risk of corruption?
>>
> You will need to stop the sync on the bucket before doing the
> resharding and start it again after the resharding completes.
> It will start a full sync on the bucket (it doesn't mean we copy the
> objects, but we go over all of them to check if they need to be
> synced).
> We will automate this as part of the reshard admin command in the next
> Luminous release.
Does this also apply to Jewel? Stop sync and restart after resharding.
(I don't know if there is any way to disable sync for a specific bucket.)

r,
Sam
>> Also, as dynamic bucket resharding was/is the main motivator for moving to
>> Luminous (for me at least), is dynamic resharding something that is
>> planned to be fixed for multisite environments later in the Luminous
>> life-cycle, or will it be left disabled forever?
>>
> We are planning to enable it in Luminous time.
>
> Regards,
> Orit
>
>> Thanks!
>> /andreas
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] jewel - radosgw-admin bucket limit check broken?

2017-08-08 Thread Sam Wouters
Hi,

I wanted to test the new feature that checks the existing buckets for
optimal index sharding.
According to the docs this should be as simple as "radosgw-admin -n
client.xxx bucket limit check" with an optional param for printing only
buckets over or nearing the limit.
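
In other words, something like this, where --warnings-only is my reading of
that optional param:

radosgw-admin -n client.xxx bucket limit check
radosgw-admin -n client.xxx bucket limit check --warnings-only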

When I invoke this, however, I simply get the error output

unrecognized arg limit
usage: radosgw-admin  [options...]

followed by help output.

Tested this with 10.2.8 and 10.2.9; other radosgw-admin commands work fine.
I've looked through the open issues but can't seem to find this in the tracker.

Simple bug or am I completely missing something?

r,
Sam

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Recovering rgw index pool with large omap size

2017-06-20 Thread Sam Wouters
Hi list,

we need to recover an index pool distributed over 4 SSD-based OSDs. We
needed to kick out one of the OSDs because it was blocking all rgw access
due to leveldb compaction. Since then we've restarted the OSD with
"leveldb compact on mount = true" and the noup flag set, running the leveldb
compaction offline, but the index pg's are now running in degraded mode.

Goal is to make the recovery as fast as possible during a small
maintenance window and/or with minimal client impact.

Cluster is running jewel 10.2.7 (recently upgraded from hammer) and has
ongoing backfill operations (from changing the tunables).
We have some buckets with a large number of objects in them. Bucket index
re-sharding would be needed, but we don't have the opportunity to do
that right now.

Plan so far (rough command sketch below):
* set global I/O scheduling priority to 7 (lowest)
* for the index pool OSDs specifically:
- set recovery prio to highest (63)
- set client prio to lowest (1)
- increase recovery threads to 2
- set disk thread prio to highest (0)
- limit omap entries per chunk for recovery to 32k (64k seems to give
timeouts)
* unset noup flag to let the misbehaving OSD kick in and start recovery
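
In concrete terms this would boil down to something like the following
(option names are my best guess for jewel, and some of them may need a
restart rather than injectargs; osd.N stands for one of the index pool OSDs):

ceph tell osd.\* injectargs '--osd-disk-thread-ioprio-class be --osd-disk-thread-ioprio-priority 7'
ceph tell osd.N injectargs '--osd-recovery-op-priority 63 --osd-client-op-priority 1'
ceph tell osd.N injectargs '--osd-recovery-threads 2 --osd-disk-thread-ioprio-priority 0'
ceph tell osd.N injectargs '--osd-recovery-max-omap-entries-per-chunk 32768'
ceph osd unset noup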

Any further ideas, experience or remarks would be very much appreciated...

r,
Sam

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Prioritise recovery on specific PGs/OSDs?

2017-06-20 Thread Sam Wouters
Yes; I don't know exactly in which release it was introduced, but in the
latest jewel and beyond there is:


Please use the pool-level options recovery_priority and recovery_op_priority
for enabling the pool-level recovery priority feature:
# ceph osd pool set default.rgw.buckets.index recovery_priority 5
# ceph osd pool set default.rgw.buckets.index recovery_op_priority 5
A recovery value of 5 will help because the default is 3 in the jewel
release; use the command below to check whether both options are set properly.
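
(The check command itself didn't make it into my paste; something like this
should do, assuming the pool-get syntax:)

# ceph osd pool get default.rgw.buckets.index recovery_priority
# ceph osd pool get default.rgw.buckets.index recovery_op_priority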
> Is there a way to prioritize specific pools during recovery?  I know
> there are issues open for it, but I wasn't aware it was implemented yet...
>
> Regards,
> Logan
>
> - On Jun 20, 2017, at 8:20 AM, Sam Wouters <s...@ericom.be> wrote:
>
> Hi,
>
> Are they all in the same pool? If not, you could prioritize pool
> recovery.
> Otherwise, maybe you can play with the osd max backfills number; no
> idea if it accepts a value of 0 to actually disable backfill for
> specific OSDs.
>
> r,
> Sam
>
> On 20-06-17 14:44, Richard Hesketh wrote:
>
> Is there a way, either by individual PG or by OSD, I can prioritise 
> backfill/recovery on a set of PGs which are currently particularly important 
> to me?
>
> For context, I am replacing disks in a 5-node Jewel cluster, on a 
> node-by-node basis - mark out the OSDs on a node, wait for them to clear, 
> replace OSDs, bring up and in, mark out the OSDs on the next set, etc. I've 
> done my first node, but the significant CRUSH map changes means most of my 
> data is moving. I only currently care about the PGs on my next set of OSDs to 
> replace - the other remapped PGs I don't care about settling because they're 
> only going to end up moving around again after I do the next set of disks. I 
> do want the PGs specifically on the OSDs I am about to replace to backfill 
> because I don't want to compromise data integrity by downing them while they 
> host active PGs. If I could specifically prioritise the backfill on those 
> PGs/OSDs, I could get on with replacing disks without worrying about causing 
> degraded PGs.
>
> I'm in a situation right now where there is merely a couple of dozen 
> PGs on the disks I want to replace, which are all remapped and waiting to 
> backfill - but there are 2200 other PGs also waiting to backfill because 
> they've moved around too, and it's extremely frustating to be sat waiting to 
> see when the ones I care about will finally be handled so I can get on with 
> replacing those disks.
>
> Rich
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Prioritise recovery on specific PGs/OSDs?

2017-06-20 Thread Sam Wouters
Hi,

Are they all in the same pool? If not, you could prioritize pool recovery.
Otherwise, maybe you can play with the osd max backfills number; no idea if
it accepts a value of 0 to actually disable backfill for specific OSDs.
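
Per OSD that would be something like the following (osd.12 is just an
example id, and I haven't verified whether 0 is actually accepted):

ceph tell osd.12 injectargs '--osd-max-backfills 0'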

r,
Sam

On 20-06-17 14:44, Richard Hesketh wrote:
> Is there a way, either by individual PG or by OSD, I can prioritise 
> backfill/recovery on a set of PGs which are currently particularly important 
> to me?
>
> For context, I am replacing disks in a 5-node Jewel cluster, on a 
> node-by-node basis - mark out the OSDs on a node, wait for them to clear, 
> replace OSDs, bring up and in, mark out the OSDs on the next set, etc. I've 
> done my first node, but the significant CRUSH map changes means most of my 
> data is moving. I only currently care about the PGs on my next set of OSDs to 
> replace - the other remapped PGs I don't care about settling because they're 
> only going to end up moving around again after I do the next set of disks. I 
> do want the PGs specifically on the OSDs I am about to replace to backfill 
> because I don't want to compromise data integrity by downing them while they 
> host active PGs. If I could specifically prioritise the backfill on those 
> PGs/OSDs, I could get on with replacing disks without worrying about causing 
> degraded PGs.
>
> I'm in a situation right now where there is merely a couple of dozen PGs on 
> the disks I want to replace, which are all remapped and waiting to backfill - 
> but there are 2200 other PGs also waiting to backfill because they've moved 
> around too, and it's extremely frustrating to be sat waiting to see when the 
> ones I care about will finally be handled so I can get on with replacing 
> those disks.
>
> Rich
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jewel - rgw blocked on deep-scrub of bucket index pg

2017-05-06 Thread Sam Wouters
Hi,


On 06-05-17 20:08, Wido den Hollander wrote:
>> On 6 May 2017 at 09:55, Christian Balzer <ch...@gol.com> wrote:
>>
>>
>>
>> Hello,
>>
>> On Sat, 6 May 2017 09:25:15 +0200 (CEST) Wido den Hollander wrote:
>>
>>>> On 5 May 2017 at 10:33, Sam Wouters <s...@ericom.be> wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> we have a small cluster running on jewel 10.2.7; NL-SAS disks only, osd
>>>> data and journal co located on the disks; main purpose rgw secondary zone.
>>>>
>>>> Since the upgrade to jewel, whenever a deep scrub starts on one of the
>>>> rgw index pool pg's, slow requests start piling up and rgw requests are
>>>> blocked after some hours.
>>>> The deep-scrub doesn't seem to finish (still running after 11+ hours)
>>>> and the only escape I've found so far is a restart of the primary osd
>>>> holding the pg.
>>>>
>>>> Maybe important to know: we have some large rgw buckets in terms of
>>>> object count (3+ million) with an index shard count of only 8.
>>>>
>>>> scrub related settings:
>>>> osd scrub sleep = 0.1  
>>> Try removing this line, it can block threads under Jewel.
I also found the bug report (#19497) yesterday, so I indeed removed the
sleep and manually started the deep-scrub. I didn't have time to check
the result until now.
After almost 26 hours the deep-scrub operation finished (2017-05-05
10:57:08 -> 2017-05-06 12:29:05); however, during the scrub there were
frequent timeouts and periods of complete rgw downtime.
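
For anyone else hitting this, the runtime equivalent of what I did is
roughly the following (the pg id is a placeholder):

ceph tell osd.\* injectargs '--osd-scrub-sleep 0'
ceph pg deep-scrub <pgid>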

Our primary cluster is still running hammer, and there the index
pools are on SSDs, but this still raises concerns for after the planned
upgrade of that one...

Thanks a lot for the help!

r,
Sam
>>>
>> I'd really REALLY wish that would get fixed properly, as in the original
>> functionality restored. 
> Afaik new work is being done on this. There was a recent thread on the 
> ceph-users or devel (can't find it) that new code is out there to fix this.
>
> Wido
>
>> Because as we've learned entrusting everything into internal Ceph queues
>> with priorities isn't working as expected in all cases.
>>
>> For a second, very distant option, turn it into a NOP for the time being.
>> As it stands now, it's another self-made, Jewel introduced bug...
>>
>> Christian
>>
>>> See how that works out.
>>>
>>> Wido
>>>
>>>> osd scrub during recovery = False
>>>> osd scrub priority = 1
>>>> osd deep scrub stride = 1048576
>>>> osd scrub chunk min = 1
>>>> osd scrub chunk max = 1
>>>>
>>>> Any help on debugging / resolving would be very much appreciated...
>>>>
>>>> regards,
>>>> Sam
>>>>
>>>> ___
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com  
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>> -- 
>> Christian Balzer        Network/Systems Engineer
>> ch...@gol.com           Global OnLine Japan/Rakuten Communications
>> http://www.gol.com/
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] jewel - rgw blocked on deep-scrub of bucket index pg

2017-05-05 Thread Sam Wouters
Hi,

we have a small cluster running on jewel 10.2.7; NL-SAS disks only, osd
data and journal co located on the disks; main purpose rgw secondary zone.

Since the upgrade to jewel, whenever a deep scrub starts on one of the
rgw index pool pg's, slow requests start piling up and rgw requests are
blocked after some hours.
The deep-scrub doesn't seem to finish (still running after 11+ hours)
and the only escape I've found so far is a restart of the primary osd
holding the pg.

Maybe important to know: we have some large rgw buckets in terms of
object count (3+ million) with an index shard count of only 8.

scrub related settings:
osd scrub sleep = 0.1
osd scrub during recovery = False
osd scrub priority = 1
osd deep scrub stride = 1048576
osd scrub chunk min = 1
osd scrub chunk max = 1

Any help on debugging / resolving would be very much appreciated...

regards,
Sam

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Antw: Re: Best practices for extending a ceph cluster with minimal client impact data movement

2016-09-05 Thread Sam Wouters
Hi,


>>> Now, add the OSDs to the cluster, but NOT to the CRUSHMap.
>>>
>>> When all the OSDs are online, inject a new CRUSHMap where you add the new 
>>> OSDs to the data placement.
>>>
>>> $ ceph osd setcrushmap -i 
>>>
>>> The OSDs will now start to migrate data, but this is throttled by the max 
>>> recovery and backfill settings.


I was wondering how exactly you accomplish that.
Can you do this with "ceph-deploy osd create" and the "noin" or "noup" flags
set, or does one need to follow the manual steps of adding an OSD?
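
What I have in mind is something like the following (host/devices are
placeholders, and it is untested whether the flags really keep the new OSDs
out of data placement until the CRUSH map is injected):

ceph osd set noin
ceph-deploy osd create <host>:<data-disk>:<journal-disk>
# once all new OSDs are up, inject the prepared CRUSH map and unset the flag
ceph osd setcrushmap -i <compiled-crushmap>
ceph osd unset noin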

r,
Sam
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can Jewel read Hammer radosgw buckets?

2016-04-24 Thread Sam Wouters
On 23-04-16 18:17, Yehuda Sadeh-Weinraub wrote:
> On Sat, Apr 23, 2016 at 6:22 AM, Richard Chan
>  wrote:
>> Hi Cephers,
>>
>> I upgraded to Jewel and noted the is massive radosgw multisite rework
>> in the release notes.
>>
>> Can Jewel radosgw be configured to present existing Hammer buckets?
>> On  a test system, jewel didn't recognise my Hammer buckets;
>>
>> Hammer used pools .rgw.*
>> Jewel created by default: .rgw.root and default.rgw*
>>
>>
>>
> Yes, jewel should be able to read hammer buckets. If it detects that
> there's an old config, it should migrate existing setup into the new
> config. It seemsthat something didn't work as expected here. One way
> to fix it would be to create a new zone and set its pools to point at
> the old config's pools. We'll need to figure out what went wrong
> though.
>
Hi,

I'm also wondering about the correct upgrade procedure for the
radosgws, especially in a multi-gateway setup in a federated config.

If you say the existing setup should migrate, is it ok then to have hammer
and jewel radosgws co-exist (for a short time)? We have, for example,
multiple radosgw instances behind an haproxy. Can they be upgraded one at a
time, or do they all need to be stopped before starting the first jewel
radosgw?

Does the ceph.conf file need to be adapted to the jewel config, e.g.
changing "rgw region root pool" into "rgw zonegroup root pool"? Before or
after the upgrade?
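
For example, something like this in the gateway section (section and pool
names are just illustrations; the old/new option names are my understanding
of the rename):

[client.radosgw.gateway1]
# hammer:
#rgw region root pool = .rgw.root
# jewel:
rgw zonegroup root pool = .rgw.root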

Concerning data replication: I understand the radosgw-agent is
deprecated in jewel and replication is done by the radosgws
themselves. Is this enabled automatically, or does it need to be
started / configured somehow?

thanks in advance,
Sam

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubleshooting rgw bucket list

2015-09-02 Thread Sam Wouters

Thanks!
Playing around with max_keys in the bucket listing retrieval determines 
whether I actually get results or not; this gives me a way to list the 
content until the bug is fixed.
Is it possible somehow to copy the objects to a new bucket (with 
versioning disabled) and rename the current one? I don't think the 
latter is possible through the api, but maybe there is some hidden way ;-)


Could you also take a minute to confirm another versioning-related bug I 
posted: http://tracker.ceph.com/issues/12819
If you could give me some pointers, I don't mind digging into the code; 
I will gladly contribute.


r,
Sam

On 01-09-15 22:37, Yehuda Sadeh-Weinraub wrote:

Yeah, I'm able to reproduce the issue. It is related to the fact that
you have a bunch of delete markers in the bucket, as it triggers some
bug there. I opened a new ceph issue for this one:

http://tracker.ceph.com/issues/12913

Thanks,
Yehuda

On Tue, Sep 1, 2015 at 11:39 AM, Sam Wouters <s...@ericom.be> wrote:

Sorry, forgot to mention:

- yes, filtered by thread
- the "is not valid" line occurred when performing the bucket --check
- when doing a bucket listing, I also get an "is not valid", but on a
different object:
7fe4f1d5b700 20  cls/rgw/cls_rgw.cc:460: entry
abc_econtract/data/6scbrrlo4vttk72melewizj6n3[] is not valid

bilog entry for this object similar to the one below

r, Sam

On 01-09-15 20:30, Sam Wouters wrote:

Hi,

see inline

On 01-09-15 20:14, Yehuda Sadeh-Weinraub wrote:

I assume you filtered the log by thread? I don't see the response
messages. For the bucket check you can run radosgw-admin with
--log-to-stderr.

nothing is logged to the console when I do that

Can you also set 'debug objclass = 20' on the osds? You can do it by:

$ ceph tell osd.\* injectargs --debug-objclass 20

this continuously prints "20  cls/rgw/cls_rgw.cc:460: entry
abc_econtract/data/6smuz2ysavvxbygng34tgusyse[] is not valid" on osd.0

Also, it'd be interesting to get the following:

$ radosgw-admin bi list --bucket=
--object=abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5

this gives me an empty array:
[
]
but we did a trim of the bilog a while ago because a lot of entries regarding
objects that were already removed from the bucket kept on syncing with
the sync agent, causing a lot of delete_markers at the replication site.

The object in the error above from the osd log, gives the following:
# radosgw-admin --log-to-stderr -n client.radosgw.be-east-1 bi list
--bucket=aws-cmis-prod
--object=abc_econtract/data/6smuz2ysavvxbygng34tgusyse
[
 {
 "type": "plain",
 "idx": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse",
 "entry": {
 "name": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse",
 "instance": "",
 "ver": {
 "pool": -1,
 "epoch": 0
 },
 "locator": "",
 "exists": "false",
 "meta": {
 "category": 0,
 "size": 0,
 "mtime": "0.00",
 "etag": "",
 "owner": "",
 "owner_display_name": "",
 "content_type": "",
 "accounted_size": 0
 },
 "tag": "",
 "flags": 8,
 "pending_map": [],
 "versioned_epoch": 0
 }
 },
 {
 "type": "plain",
 "idx":
"abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse\uv913\uiRQZUR76UdeymR-PGaw6sbCHMCOcaovu",
 "entry": {
 "name": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse",
 "instance": "RQZUR76UdeymR-PGaw6sbCHMCOcaovu",
 "ver": {
 "pool": 23,
 "epoch": 9680
 },
 "locator": "",
 "exists": "true",
 "meta": {
 "category": 1,
 "size": 103410,
 "mtime": "2015-08-07 17:57:32.00Z",
 "etag": "6c67f5e6cb4aa63f4fa26a3b94d19d3a",
 "owner": "aws-cmis-prod",
 "owner_display_name": "AWS-CMIS prod user",
 "content_type": "application\/pdf",
 "accounted_size": 103410
 },
 "tag": "be-east.34319.452037

Re: [ceph-users] Troubleshooting rgw bucket list

2015-09-01 Thread Sam Wouters
Hi,

see inline

On 01-09-15 20:14, Yehuda Sadeh-Weinraub wrote:
> I assume you filtered the log by thread? I don't see the response
> messages. For the bucket check you can run radosgw-admin with
> --log-to-stderr.
nothing is logged to the console when I do that
>
> Can you also set 'debug objclass = 20' on the osds? You can do it by:
>
> $ ceph tell osd.\* injectargs --debug-objclass 20
this continuously prints "20  cls/rgw/cls_rgw.cc:460: entry
abc_econtract/data/6smuz2ysavvxbygng34tgusyse[] is not valid" on osd.0
>
> Also, it'd be interesting to get the following:
>
> $ radosgw-admin bi list --bucket=
> --object=abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5
this gives me an empty array:
[
]
but we did a trim of the bilog a while ago because a lot of entries regarding
objects that were already removed from the bucket kept on syncing with
the sync agent, causing a lot of delete_markers at the replication site.

The object in the error above from the osd log, gives the following:
# radosgw-admin --log-to-stderr -n client.radosgw.be-east-1 bi list
--bucket=aws-cmis-prod
--object=abc_econtract/data/6smuz2ysavvxbygng34tgusyse
[
{
"type": "plain",
"idx": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse",
"entry": {
"name": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse",
"instance": "",
"ver": {
"pool": -1,
"epoch": 0
},
"locator": "",
"exists": "false",
"meta": {
"category": 0,
"size": 0,
"mtime": "0.00",
"etag": "",
"owner": "",
"owner_display_name": "",
"content_type": "",
"accounted_size": 0
},
"tag": "",
"flags": 8,
"pending_map": [],
"versioned_epoch": 0
}
},
{
"type": "plain",
"idx":
"abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse\uv913\uiRQZUR76UdeymR-PGaw6sbCHMCOcaovu",
"entry": {
"name": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse",
"instance": "RQZUR76UdeymR-PGaw6sbCHMCOcaovu",
"ver": {
"pool": 23,
"epoch": 9680
},
"locator": "",
"exists": "true",
"meta": {
"category": 1,
"size": 103410,
"mtime": "2015-08-07 17:57:32.00Z",
"etag": "6c67f5e6cb4aa63f4fa26a3b94d19d3a",
"owner": "aws-cmis-prod",
"owner_display_name": "AWS-CMIS prod user",
"content_type": "application\/pdf",
"accounted_size": 103410
},
"tag": "be-east.34319.4520377",
"flags": 3,
"pending_map": [],
"versioned_epoch": 2
}
},
{
"type": "instance",
"idx":
"�1000_abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse\uiRQZUR76UdeymR-PGaw6sbCHMCOcaovu",
"entry": {
"name": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse",
"instance": "RQZUR76UdeymR-PGaw6sbCHMCOcaovu",
"ver": {
"pool": 23,
"epoch": 9680
},
"locator": "",
"exists": "true",
"meta": {
"category": 1,
"size": 103410,
"mtime": "2015-08-07 17:57:32.00Z",
"etag": "6c67f5e6cb4aa63f4fa26a3b94d19d3a",
        "owner": "aws-cmis-prod",
"owner_display_name": "AWS-CMIS prod user",
"content_type": "application\/pdf",
"accounted_size": 103410
},
"tag": "be-east.34319.4520377",
"flags": 3,
"pendi

Re: [ceph-users] Troubleshooting rgw bucket list

2015-09-01 Thread Sam Wouters
not sure where I can find the logs for the bucket check, I can't really
filter them out in the radosgw log.

-Sam

On 01-09-15 19:25, Sam Wouters wrote:
> It looks like it, this is what shows in the logs after bumping the debug
> and requesting a bucket list.
>
> 2015-09-01 17:14:53.008620 7fccb17ca700 10 cls_bucket_list
> aws-cmis-prod(@{i=.be-east.rgw.buckets.index}.be-east.rgw.buckets[be-east.5436.1])
> start
> abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5[]
> num_entries 1
> 2015-09-01 17:14:53.008629 7fccb17ca700 20 reading from
> .be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1
> 2015-09-01 17:14:53.008636 7fccb17ca700 20 get_obj_state:
> rctx=0x7fccb17c84d0
> obj=.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1
> state=0x7fcde01a4060 s->prefetch_data=0
> 2015-09-01 17:14:53.008640 7fccb17ca700 10 cache get:
> name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
> 2015-09-01 17:14:53.008645 7fccb17ca700 20 get_obj_state: s->obj_tag was
> set empty
> 2015-09-01 17:14:53.008647 7fccb17ca700 10 cache get:
> name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
> 2015-09-01 17:14:53.008675 7fccb17ca700  1 -- 10.11.4.105:0/1109243 -->
> 10.11.4.105:6801/39085 -- osd_op(client.55506.0:435874
> ...
> .dir.be-east.5436.1 [call rgw.bucket_list] 26.7d78fc84
> ack+read+known_if_redirected e255) v5 -- ?+0 0x7fcde01a0540 con 0x3a2d870
>
> On 01-09-15 17:11, Yehuda Sadeh-Weinraub wrote:
>> Can you bump up debug (debug rgw = 20, debug ms = 1), and see if the
>> operations (bucket listing and bucket check) go into some kind of
>> infinite loop?
>>
>> Yehuda
>>
>> On Tue, Sep 1, 2015 at 1:16 AM, Sam Wouters <s...@ericom.be> wrote:
>>> Hi, I've started the bucket --check --fix on friday evening and it's
>>> still running. 'ceph -s' shows the cluster health as OK, I don't know if
>>> there is anything else I could check? Is there a way of finding out if
>>> it's actually doing something?
>>>
>>> We only have this issue on the one bucket with versioning enabled, I
>>> can't get rid of the feeling it has something todo with that. The
>>> "underscore bug" is also still present on that bucket
>>> (http://tracker.ceph.com/issues/12819). Not sure if thats related in any
>>> way.
>>> Are there any alternatives, for example copying all the objects into a
>>> new bucket without versioning? The simple way would be to list the objects
>>> and copy them to a new bucket, but bucket listing is not working, so...
>>>
>>> -Sam
>>>
>>>
>>> On 31-08-15 10:47, Gregory Farnum wrote:
>>>> This generally shouldn't be a problem at your bucket sizes. Have you
>>>> checked that the cluster is actually in a healthy state? The sleeping
>>>> locks are normal but should be getting woken up; if they aren't it
>>>> means the object access isn't working for some reason. A down PG or
>>>> something would be the simplest explanation.
>>>> -Greg
>>>>
>>>> On Fri, Aug 28, 2015 at 6:52 PM, Sam Wouters <s...@ericom.be> wrote:
>>>>> Ok, maybe I'm too impatient. It would be great if there were some verbose
>>>>> or progress logging of the radosgw-admin tool.
>>>>> I will start a check and let it run over the weekend.
>>>>>
>>>>> tnx,
>>>>> Sam
>>>>>
>>>>> On 28-08-15 18:16, Sam Wouters wrote:
>>>>>> Hi,
>>>>>>
>>>>>> this bucket only has 13389 objects, so the index size shouldn't be a
>>>>>> problem. Also, on the same cluster we have another bucket with 1200543
>>>>>> objects (but no versioning configured), which has no issues.
>>>>>>
>>>>>> when we run a radosgw-admin bucket --check (--fix), nothing seems to be
>>>>>> happening. Putting an strace on the process shows a lot of lines like 
>>>>>> these:
>>>>>> [pid 99372] futex(0x2d730d4, FUTEX_WAIT_PRIVATE, 156619, NULL <unfinished ...>
>>>>>> [pid 99385] futex(0x2da9410, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>>>>>> [pid 99371] futex(0x2da9410, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>>>>>> [pid 99385] <... futex resumed> )   = -1 EAGAIN (Resource
>>>>>> temporarily unavailable)
>>>>>> [pid 99371] <... futex resumed> )   = 0
>>>>>>
>>>>>> but no errors in the ceph logs or health warnings.
>>>>>>

Re: [ceph-users] Troubleshooting rgw bucket list

2015-09-01 Thread Sam Wouters
Sorry, forgot to mention:

- yes, filtered by thread
- the "is not valid" line occurred when performing the bucket --check
- when doing a bucket listing, I also get an "is not valid", but on a
different object:
7fe4f1d5b700 20  cls/rgw/cls_rgw.cc:460: entry
abc_econtract/data/6scbrrlo4vttk72melewizj6n3[] is not valid

bilog entry for this object similar to the one below

r, Sam

On 01-09-15 20:30, Sam Wouters wrote:
> Hi,
>
> see inline
>
> On 01-09-15 20:14, Yehuda Sadeh-Weinraub wrote:
>> I assume you filtered the log by thread? I don't see the response
>> messages. For the bucket check you can run radosgw-admin with
>> --log-to-stderr.
> nothing is logged to the console when I do that
>> Can you also set 'debug objclass = 20' on the osds? You can do it by:
>>
>> $ ceph tell osd.\* injectargs --debug-objclass 20
> this continuously prints "20  cls/rgw/cls_rgw.cc:460: entry
> abc_econtract/data/6smuz2ysavvxbygng34tgusyse[] is not valid" on osd.0
>> Also, it'd be interesting to get the following:
>>
>> $ radosgw-admin bi list --bucket=
>> --object=abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5
> this gives me an empty array:
> [
> ]
> but we did a trim of the bilog a while ago because a lot of entries regarding
> objects that were already removed from the bucket kept on syncing with
> the sync agent, causing a lot of delete_markers at the replication site.
>
> The object in the error above from the osd log, gives the following:
> # radosgw-admin --log-to-stderr -n client.radosgw.be-east-1 bi list
> --bucket=aws-cmis-prod
> --object=abc_econtract/data/6smuz2ysavvxbygng34tgusyse
> [
> {
> "type": "plain",
> "idx": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse",
> "entry": {
> "name": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse",
> "instance": "",
> "ver": {
> "pool": -1,
> "epoch": 0
> },
> "locator": "",
> "exists": "false",
> "meta": {
> "category": 0,
> "size": 0,
> "mtime": "0.00",
> "etag": "",
> "owner": "",
> "owner_display_name": "",
> "content_type": "",
> "accounted_size": 0
> },
> "tag": "",
> "flags": 8,
> "pending_map": [],
> "versioned_epoch": 0
> }
> },
> {
> "type": "plain",
> "idx":
> "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse\uv913\uiRQZUR76UdeymR-PGaw6sbCHMCOcaovu",
> "entry": {
> "name": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse",
> "instance": "RQZUR76UdeymR-PGaw6sbCHMCOcaovu",
> "ver": {
> "pool": 23,
> "epoch": 9680
> },
> "locator": "",
> "exists": "true",
> "meta": {
> "category": 1,
> "size": 103410,
> "mtime": "2015-08-07 17:57:32.00Z",
> "etag": "6c67f5e6cb4aa63f4fa26a3b94d19d3a",
> "owner": "aws-cmis-prod",
> "owner_display_name": "AWS-CMIS prod user",
> "content_type": "application\/pdf",
> "accounted_size": 103410
> },
> "tag": "be-east.34319.4520377",
> "flags": 3,
> "pending_map": [],
> "versioned_epoch": 2
> }
> },
> {
> "type": "instance",
> "idx":
> "�1000_abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse\uiRQZUR76UdeymR-PGaw6sbCHMCOcaovu",
> "entry": {
> "name": "abc_econtract\/data\/6smuz2ysavvxbygng34tgusyse",
> "instance": "RQZUR76UdeymR-PGaw6sbCHMCOcaovu&quo

Re: [ceph-users] Troubleshooting rgw bucket list

2015-09-01 Thread Sam Wouters
It looks like it, this is what shows in the logs after bumping the debug
and requesting a bucket list.

2015-09-01 17:14:53.008620 7fccb17ca700 10 cls_bucket_list
aws-cmis-prod(@{i=.be-east.rgw.buckets.index}.be-east.rgw.buckets[be-east.5436.1])
start
abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5[]
num_entries 1
2015-09-01 17:14:53.008629 7fccb17ca700 20 reading from
.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1
2015-09-01 17:14:53.008636 7fccb17ca700 20 get_obj_state:
rctx=0x7fccb17c84d0
obj=.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1
state=0x7fcde01a4060 s->prefetch_data=0
2015-09-01 17:14:53.008640 7fccb17ca700 10 cache get:
name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
2015-09-01 17:14:53.008645 7fccb17ca700 20 get_obj_state: s->obj_tag was
set empty
2015-09-01 17:14:53.008647 7fccb17ca700 10 cache get:
name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
2015-09-01 17:14:53.008675 7fccb17ca700  1 -- 10.11.4.105:0/1109243 -->
10.11.4.105:6801/39085 -- osd_op(client.55506.0:435874
.dir.be-east.5436.1 [call rgw.bucket_list] 26.7d78fc84
ack+read+known_if_redirected e255) v5 -- ?+0 0x7fcde01a0540 con 0x3a2d870
2015-09-01 17:14:53.009136 7fccb17ca700 10 cls_bucket_list
aws-cmis-prod(@{i=.be-east.rgw.buckets.index}.be-east.rgw.buckets[be-east.5436.1])
start
abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5[]
num_entries 1
2015-09-01 17:14:53.009146 7fccb17ca700 20 reading from
.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1
2015-09-01 17:14:53.009153 7fccb17ca700 20 get_obj_state:
rctx=0x7fccb17c84d0
obj=.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1
state=0x7fcde01a4060 s->prefetch_data=0
2015-09-01 17:14:53.009158 7fccb17ca700 10 cache get:
name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
2015-09-01 17:14:53.009163 7fccb17ca700 20 get_obj_state: s->obj_tag was
set empty
2015-09-01 17:14:53.009165 7fccb17ca700 10 cache get:
name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
2015-09-01 17:14:53.009189 7fccb17ca700  1 -- 10.11.4.105:0/1109243 -->
10.11.4.105:6801/39085 -- osd_op(client.55506.0:435876
.dir.be-east.5436.1 [call rgw.bucket_list] 26.7d78fc84
ack+read+known_if_redirected e255) v5 -- ?+0 0x7fcde01a0540 con 0x3a2d870
2015-09-01 17:14:53.009629 7fccb17ca700 10 cls_bucket_list
aws-cmis-prod(@{i=.be-east.rgw.buckets.index}.be-east.rgw.buckets[be-east.5436.1])
start
abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5[]
num_entries 1
2015-09-01 17:14:53.009638 7fccb17ca700 20 reading from
.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1
2015-09-01 17:14:53.009645 7fccb17ca700 20 get_obj_state:
rctx=0x7fccb17c84d0
obj=.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1
state=0x7fcde01a4060 s->prefetch_data=0
2015-09-01 17:14:53.009651 7fccb17ca700 10 cache get:
name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
2015-09-01 17:14:53.009655 7fccb17ca700 20 get_obj_state: s->obj_tag was
set empty
2015-09-01 17:14:53.009657 7fccb17ca700 10 cache get:
name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
2015-09-01 17:14:53.009681 7fccb17ca700  1 -- 10.11.4.105:0/1109243 -->
10.11.4.105:6801/39085 -- osd_op(client.55506.0:435878
.dir.be-east.5436.1 [call rgw.bucket_list] 26.7d78fc84
ack+read+known_if_redirected e255) v5 -- ?+0 0x7fcde01a0540 con 0x3a2d870
2015-09-01 17:14:53.010139 7fccb17ca700 10 cls_bucket_list
aws-cmis-prod(@{i=.be-east.rgw.buckets.index}.be-east.rgw.buckets[be-east.5436.1])
start
abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5[]
num_entries 1
2015-09-01 17:14:53.010149 7fccb17ca700 20 reading from
.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1
2015-09-01 17:14:53.010156 7fccb17ca700 20 get_obj_state:
rctx=0x7fccb17c84d0
obj=.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1
state=0x7fcde01a4060 s->prefetch_data=0
2015-09-01 17:14:53.010161 7fccb17ca700 10 cache get:
name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
2015-09-01 17:14:53.010166 7fccb17ca700 20 get_obj_state: s->obj_tag was
set empty
2015-09-01 17:14:53.010168 7fccb17ca700 10 cache get:
name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
2015-09-01 17:14:53.010192 7fccb17ca700  1 -- 10.11.4.105:0/1109243 -->
10.11.4.105:6801/39085 -- osd_op(client.55506.0:435880
.dir.be-east.5436.1 [call rgw.bucket_list] 26.7d78fc84
ack+read+known_if_redirected e255) v5 -- ?+0 0x7fcde01a0540 con 0x3a2d870

On 01-09-15 17:11, Yehuda Sadeh-Weinraub wrote:
> Can you bump up debug (debug rgw = 20, debug ms = 1), and see if the
> operations (bucket listing and bucket check) go into some kind of
> infinite loop?
>
> Yehuda
>
> On Tue, Sep 1, 2015 at 1:16 AM, Sam Wouters <s...@ericom.be> wrote:
>> Hi, I've started the bucket --check --fix on friday evening and it's
>

Re: [ceph-users] Troubleshooting rgw bucket list

2015-09-01 Thread Sam Wouters
Hi, I've started the bucket --check --fix on friday evening and it's
still running. 'ceph -s' shows the cluster health as OK; I don't know if
there is anything else I could check. Is there a way of finding out if
it's actually doing something?

We only have this issue on the one bucket with versioning enabled, I
can't get rid of the feeling it has something to do with that. The
"underscore bug" is also still present on that bucket
(http://tracker.ceph.com/issues/12819). Not sure if that's related in any
way.
Are there any alternatives, for example copying all the objects into a
new bucket without versioning? The simple way would be to list the objects
and copy them to a new bucket, but bucket listing is not working, so...

-Sam


On 31-08-15 10:47, Gregory Farnum wrote:
> This generally shouldn't be a problem at your bucket sizes. Have you
> checked that the cluster is actually in a healthy state? The sleeping
> locks are normal but should be getting woken up; if they aren't it
> means the object access isn't working for some reason. A down PG or
> something would be the simplest explanation.
> -Greg
>
> On Fri, Aug 28, 2015 at 6:52 PM, Sam Wouters <s...@ericom.be> wrote:
>> Ok, maybe I'm too impatient. It would be great if there were some verbose
>> or progress logging of the radosgw-admin tool.
>> I will start a check and let it run over the weekend.
>>
>> tnx,
>> Sam
>>
>> On 28-08-15 18:16, Sam Wouters wrote:
>>> Hi,
>>>
>>> this bucket only has 13389 objects, so the index size shouldn't be a
>>> problem. Also, on the same cluster we have another bucket with 1200543
>>> objects (but no versioning configured), which has no issues.
>>>
>>> when we run a radosgw-admin bucket --check (--fix), nothing seems to be
>>> happening. Putting an strace on the process shows a lot of lines like these:
>>> [pid 99372] futex(0x2d730d4, FUTEX_WAIT_PRIVATE, 156619, NULL <unfinished ...>
>>> [pid 99385] futex(0x2da9410, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>>> [pid 99371] futex(0x2da9410, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>>> [pid 99385] <... futex resumed> )   = -1 EAGAIN (Resource
>>> temporarily unavailable)
>>> [pid 99371] <... futex resumed> )   = 0
>>>
>>> but no errors in the ceph logs or health warnings.
>>>
>>> r,
>>> Sam
>>>
>>> On 28-08-15 17:49, Ben Hines wrote:
>>>> How many objects in the bucket?
>>>>
>>>> RGW has problems with index size once number of objects gets into the
>>>> 90+ level. The buckets need to be recreated with 'sharded bucket
>>>> indexes' on:
>>>>
>>>> rgw override bucket index max shards = 23
>>>>
>>>> You could also try repairing the index with:
>>>>
>>>>  radosgw-admin bucket check --fix --bucket=
>>>>
>>>> -Ben
>>>>
>>>> On Fri, Aug 28, 2015 at 8:38 AM, Sam Wouters <s...@ericom.be> wrote:
>>>>> Hi,
>>>>>
>>>>> we have a rgw bucket (with versioning) where PUT and GET operations for
>>>>> specific objects succeed,  but retrieving an object list fails.
>>>>> Using python-boto, after a timeout just gives us an 500 internal error;
>>>>> radosgw-admin just hangs.
>>>>> Also a radosgw-admin bucket check just seems to hang...
>>>>>
>>>>> ceph version is 0.94.3 but this also was happening with 0.94.2, we
>>>>> quietly hoped upgrading would fix but it didn't...
>>>>>
>>>>> r,
>>>>> Sam
>>>>> ___
>>>>> ceph-users mailing list
>>>>> ceph-users@lists.ceph.com
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rgw 0.94.3: objects starting with underscore in bucket with versioning enabled are not retrievable

2015-08-28 Thread Sam Wouters

Hi,

we had an issue in our ceph clusters and are able to reproduce it in 
our lab cluster, just upgraded to hammer 0.94.3.


Steps to reproduce:
1) create bucket test2
2) put _test object - lists and retrieves ok
3) enable versioning on test2
4) put _test2 object - lists, but get fails with ERROR: 
ErrorNoSuchKey; object _test is still retrievable

5) disable versioning (bucket.configure_versioning(False))
6) put _test3 object - lists ok, retrieves OK (still errorNoSuchKey on 
object _test2)


- Does anyone know if this is a known bug or should I open a tracker?
- we're fine with disabling versioning for now, but we should find a way to 
retrieve our _-prefixed objects uploaded while versioning support was enabled, 
or be able to rename/delete them... Any help or pointers would be much 
appreciated.


Running the fix-tool doesn't show any errors, or doesn't fix anything:
radosgw-admin -n client.radosgw.be-south-1 bucket check 
--check-head-obj-locator --bucket=test2

{
bucket: test2,
check_objects: [
{
key: {
type: head,
name: _test,
instance: 
},
oid: be03-south.7213293.1___test,
locator: be03-south.7213293.1__test,
needs_fixing: false,
status: ok
},
{
key: {
type: tail,
name: _test,
instance: 
},
needs_fixing: false,
status: ok
},
{
key: {
type: head,
name: _test2,
instance: NiKP46KSCHJAVQbnkoGv.RLfuYobP7B
},
oid: 
be03-south.7213293.1__:NiKP46KSCHJAVQbnkoGv.RLfuYobP7B__test2,

locator: be03-south.7213293.1__test2,
needs_fixing: false,
status: ok
},
{
key: {
type: tail,
name: _test2,
instance: NiKP46KSCHJAVQbnkoGv.RLfuYobP7B
},
needs_fixing: false,
status: ok
},
{
key: {
type: head,
name: _test3,
instance: 
},
oid: be03-south.7213293.1___test3,
locator: be03-south.7213293.1__test3,
needs_fixing: false,
status: ok
},
{
key: {
type: tail,
name: _test3,
instance: 
},
needs_fixing: false,
status: ok
}

]
}

regards,
Sam





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Troubleshooting rgw bucket list

2015-08-28 Thread Sam Wouters
Hi,

we have an rgw bucket (with versioning) where PUT and GET operations for
specific objects succeed, but retrieving an object list fails.
Using python-boto, we just get a 500 internal error after a timeout;
radosgw-admin just hangs.
Also a radosgw-admin bucket check just seems to hang...

ceph version is 0.94.3 but this also was happening with 0.94.2, we
quietly hoped upgrading would fix but it didn't...

r,
Sam
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubleshooting rgw bucket list

2015-08-28 Thread Sam Wouters
Hi,

this bucket only has 13389 objects, so the index size shouldn't be a
problem. Also, on the same cluster we have another bucket with 1200543
objects (but no versioning configured), which has no issues.

when we run a radosgw-admin bucket --check (--fix), nothing seems to be
happening. Putting an strace on the process shows a lot of lines like these:
[pid 99372] futex(0x2d730d4, FUTEX_WAIT_PRIVATE, 156619, NULL <unfinished ...>
[pid 99385] futex(0x2da9410, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 99371] futex(0x2da9410, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 99385] <... futex resumed> )   = -1 EAGAIN (Resource
temporarily unavailable)
[pid 99371] <... futex resumed> )   = 0

but no errors in the ceph logs or health warnings.

r,
Sam

On 28-08-15 17:49, Ben Hines wrote:
 How many objects in the bucket?

 RGW has problems with index size once number of objects gets into the
 90+ level. The buckets need to be recreated with 'sharded bucket
 indexes' on:

 rgw override bucket index max shards = 23

 You could also try repairing the index with:

  radosgw-admin bucket check --fix --bucket=bucketname

 -Ben

 On Fri, Aug 28, 2015 at 8:38 AM, Sam Wouters s...@ericom.be wrote:
 Hi,

 we have a rgw bucket (with versioning) where PUT and GET operations for
 specific objects succeed,  but retrieving an object list fails.
 Using python-boto, after a timeout just gives us an 500 internal error;
 radosgw-admin just hangs.
 Also a radosgw-admin bucket check just seems to hang...

 ceph version is 0.94.3 but this also was happening with 0.94.2, we
 quietly hoped upgrading would fix but it didn't...

 r,
 Sam
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubleshooting rgw bucket list

2015-08-28 Thread Sam Wouters
Ok, maybe I'm too impatient. It would be great if there were some verbose
or progress logging of the radosgw-admin tool.
I will start a check and let it run over the weekend.

tnx,
Sam

On 28-08-15 18:16, Sam Wouters wrote:
 Hi,

 this bucket only has 13389 objects, so the index size shouldn't be a
 problem. Also, on the same cluster we have another bucket with 1200543
 objects (but no versioning configured), which has no issues.

 when we run a radosgw-admin bucket --check (--fix), nothing seems to be
 happening. Putting an strace on the process shows a lot of lines like these:
 [pid 99372] futex(0x2d730d4, FUTEX_WAIT_PRIVATE, 156619, NULL <unfinished ...>
 [pid 99385] futex(0x2da9410, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
 [pid 99371] futex(0x2da9410, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
 [pid 99385] <... futex resumed> )   = -1 EAGAIN (Resource
 temporarily unavailable)
 [pid 99371] <... futex resumed> )   = 0

 but no errors in the ceph logs or health warnings.

 r,
 Sam

 On 28-08-15 17:49, Ben Hines wrote:
 How many objects in the bucket?

 RGW has problems with index size once number of objects gets into the
 90+ level. The buckets need to be recreated with 'sharded bucket
 indexes' on:

 rgw override bucket index max shards = 23

 You could also try repairing the index with:

  radosgw-admin bucket check --fix --bucket=bucketname

 -Ben

 On Fri, Aug 28, 2015 at 8:38 AM, Sam Wouters s...@ericom.be wrote:
 Hi,

 we have a rgw bucket (with versioning) where PUT and GET operations for
 specific objects succeed,  but retrieving an object list fails.
 Using python-boto, after a timeout just gives us an 500 internal error;
 radosgw-admin just hangs.
 Also a radosgw-admin bucket check just seems to hang...

 ceph version is 0.94.3 but this also was happening with 0.94.2, we
 quietly hoped upgrading would fix but it didn't...

 r,
 Sam
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw hanging - blocking rgw.bucket_list ops

2015-08-21 Thread Sam Wouters
tried removing, but no luck:

rados -p .be-east.rgw.buckets rm
be-east.5436.1__:2bpm.1OR-cqyOLUHek8m2RdPVRZ.pDT__sanity
error removing
.be-east.rgw.bucketsbe-east.5436.1__:2bpm.1OR-cqyOLUHek8m2RdPVRZ.pDT__sanity:
(2)

anyone?

On 21-08-15 13:06, Sam Wouters wrote:
 I suspect these to be the cause:

 rados ls -p .be-east.rgw.buckets | grep sanity
 be-east.5436.1__:2bpm.1OR-cqyOLUHek8m2RdPVRZ.pDT__sanity   
 be-east.5436.1__sanity
 be-east.5436.1__:2vBijaGnVQF4Q0IjZPeyZSKeUmBGn9X__sanity   
 be-east.5436.1__sanity
 be-east.5436.1__:4JTCVFxB1qoDWPu1nhuMDuZ3QNPaq5n__sanity   
 be-east.5436.1__sanity
 be-east.5436.1__:9jFwd8xvqJMdrqZuM8Au4mi9M62ikyo__sanity   
 be-east.5436.1__sanity
 be-east.5436.1__:BlfbGYGvLi92QPSiabT2mP7OeuETz0P__sanity   
 be-east.5436.1__sanity
 be-east.5436.1__:MigpcpJKkan7Po6vBsQsSD.hEIRWuim__sanity   
 be-east.5436.1__sanity
 be-east.5436.1__:QDTxD5p0AmVlPW4v8OPU3vtDLzenj4y__sanity   
 be-east.5436.1__sanity
 be-east.5436.1__:S43EiNAk5hOkzgfbOynbOZOuLtUv0SB__sanity   
 be-east.5436.1__sanity
 be-east.5436.1__:UKlOVMQBQnlK20BHJPyvnG6m.2ogBRW__sanity   
 be-east.5436.1__sanity
 be-east.5436.1__:kkb6muzJgREie6XftdEJdFHxR2MaFeB__sanity   
 be-east.5436.1__sanity
 be-east.5436.1__:oqPhWzFDSQ-sNPtppsl1tPjoryaHNZY__sanity   
 be-east.5436.1__sanity
 be-east.5436.1__:pLhygPGKf3uw7C7OxSJNCw8rQEMOw5l__sanity   
 be-east.5436.1__sanity
 be-east.5436.1__:tO1Nf3S2WOfmcnKVPv0tMeXbwa5JR36__sanity   
 be-east.5436.1__sanity
 be-east.5436.1__:ye4oRwDDh1cGckbMbIo56nQvM7OEyPM__sanity   
 be-east.5436.1__sanity
 be-east.5436.1___sanitybe-east.5436.1__sanity

 would it be safe and/or would it help to remove those with rados rm, and try a
 bucket check --fix --check-objects?

 On 21-08-15 11:28, Sam Wouters wrote:
 Hi,

 We are running hammer 0.94.2 and have an increasing amount of
 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f38c77e6700' had
 timed out after 600 messages in our radosgw logs, with radosgw
 eventually stalling. A restart of the radosgw helps for a few minutes,
 but after that it hangs again.

 ceph daemon /var/run/ceph/ceph-client.*.asok objecter_requests shows
 call rgw.bucket_list ops. No new bucket lists are requested, so those
 ops seem to stay there. Anyone any idea how to get rid of those. Restart
 of the affecting osd didn't help neither.

 I'm not sure if its related, but we have an object called _sanity in
 the bucket where the listing was performed on. I know there is some bug
 with objects starting with _.

 Any help would be much appreciated.

 r,
 Sam
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw hanging - blocking rgw.bucket_list ops

2015-08-21 Thread Sam Wouters
I suspect these to be the cause:

rados ls -p .be-east.rgw.buckets | grep sanity
be-east.5436.1__:2bpm.1OR-cqyOLUHek8m2RdPVRZ.pDT__sanity   
be-east.5436.1__sanity
be-east.5436.1__:2vBijaGnVQF4Q0IjZPeyZSKeUmBGn9X__sanity   
be-east.5436.1__sanity
be-east.5436.1__:4JTCVFxB1qoDWPu1nhuMDuZ3QNPaq5n__sanity   
be-east.5436.1__sanity
be-east.5436.1__:9jFwd8xvqJMdrqZuM8Au4mi9M62ikyo__sanity   
be-east.5436.1__sanity
be-east.5436.1__:BlfbGYGvLi92QPSiabT2mP7OeuETz0P__sanity   
be-east.5436.1__sanity
be-east.5436.1__:MigpcpJKkan7Po6vBsQsSD.hEIRWuim__sanity   
be-east.5436.1__sanity
be-east.5436.1__:QDTxD5p0AmVlPW4v8OPU3vtDLzenj4y__sanity   
be-east.5436.1__sanity
be-east.5436.1__:S43EiNAk5hOkzgfbOynbOZOuLtUv0SB__sanity   
be-east.5436.1__sanity
be-east.5436.1__:UKlOVMQBQnlK20BHJPyvnG6m.2ogBRW__sanity   
be-east.5436.1__sanity
be-east.5436.1__:kkb6muzJgREie6XftdEJdFHxR2MaFeB__sanity   
be-east.5436.1__sanity
be-east.5436.1__:oqPhWzFDSQ-sNPtppsl1tPjoryaHNZY__sanity   
be-east.5436.1__sanity
be-east.5436.1__:pLhygPGKf3uw7C7OxSJNCw8rQEMOw5l__sanity   
be-east.5436.1__sanity
be-east.5436.1__:tO1Nf3S2WOfmcnKVPv0tMeXbwa5JR36__sanity   
be-east.5436.1__sanity
be-east.5436.1__:ye4oRwDDh1cGckbMbIo56nQvM7OEyPM__sanity   
be-east.5436.1__sanity
be-east.5436.1___sanitybe-east.5436.1__sanity

would it be safe and/or would it help to remove those with rados rm, and try a
bucket check --fix --check-objects?

On 21-08-15 11:28, Sam Wouters wrote:
 Hi,

 We are running hammer 0.94.2 and have an increasing amount of
 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f38c77e6700' had
 timed out after 600 messages in our radosgw logs, with radosgw
 eventually stalling. A restart of the radosgw helps for a few minutes,
 but after that it hangs again.

 ceph daemon /var/run/ceph/ceph-client.*.asok objecter_requests shows
 call rgw.bucket_list ops. No new bucket lists are requested, so those
 ops seem to stay there. Anyone any idea how to get rid of those. Restart
 of the affecting osd didn't help neither.

 I'm not sure if its related, but we have an object called _sanity in
 the bucket where the listing was performed on. I know there is some bug
 with objects starting with _.

 Any help would be much appreciated.

 r,
 Sam
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw hanging - blocking rgw.bucket_list ops

2015-08-21 Thread Sam Wouters
Hi,

We are running hammer 0.94.2 and have an increasing number of
heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f38c77e6700' had
timed out after 600 messages in our radosgw logs, with radosgw
eventually stalling. A restart of the radosgw helps for a few minutes,
but after that it hangs again.

ceph daemon /var/run/ceph/ceph-client.*.asok objecter_requests shows
call rgw.bucket_list ops. No new bucket lists are requested, so those
ops seem to stay there. Does anyone have any idea how to get rid of those?
Restarting the affected osd didn't help either.

I'm not sure if it's related, but we have an object called _sanity in
the bucket where the listing was performed on. I know there is some bug
with objects starting with _.

Any help would be much appreciated.

r,
Sam
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw-agent keeps syncing most active bucket - ignoring others

2015-08-18 Thread Sam Wouters
Hi,

from the doc of radosgw-agent and some items on this list, I understood
that the max-entries argument was there to prevent a very active bucket
from keeping the other buckets from being synced. In our agent logs, however,
we saw a lot of "bucket instance bla has 1000 entries after bla" messages, and
the agent kept on syncing that active bucket.

Looking at the code, in class DataWorkerIncremental, it looks like the
agent loops fetching log entries from the bucket until it receives
fewer entries than max_entries. Is this intended behaviour? I would
expect it to just pass max_entries log entries for processing and
advance the marker.

Is there any other way to make sure less active buckets are synced
frequently? We've tried increasing num-workers, but this only affects the
first pass.
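
For reference, we currently run the agent roughly like this (the config
path is a placeholder for our setup):

radosgw-agent -c /etc/ceph/radosgw-agent/default.conf --num-workers 4 --max-entries 1000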

Thanks,
Sam
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw-agent keeps syncing most active bucket - ignoring others

2015-08-18 Thread Sam Wouters
Hmm,

looks like intended behaviour:

SNIP
CommitDate: Mon Mar 3 06:08:42 2014 -0800

   worker: process all bucket instance log entries at once

 Currently if there are more than max_entries in a single bucket
   instance's log, only max_entries of those will be processed, and the
   bucket instance will not be examined again until it is modified again.

   To keep it simple, get the entire log of entries to be updated and
   process them all at once. This means one busy shard may block others
   from syncing, but multiple instances of radosgw-agent can be run to
   circumvent that issue. With only one instance, users can be sure
   everything is synced when an incremental sync completes with no
   errors.
/SNIP

However, this brings us to a new issue. After starting a second agent,
one of the agents is busy syncing the busy shard and the other agent
correctly synced all of the other buckets. So far, so good. But since a
few of the buckets are almost static, it looks like it started syncing
those again from the beginning in a second run.
As versioning was enabled on those buckets after they were created, with
objects already present and removed, it seems like the agent is copying
those unversioned objects to versioned ones, creating a lot of delete
markers and multiple versions in the secondary zone.

Does anyone have any idea how to handle this correctly? I already did a
cleanup some weeks ago, but if the agent is going to keep restarting the
sync from the beginning, I'll have to clean up every time.

regards,
Sam

On 18-08-15 09:36, Sam Wouters wrote:
 Hi,

 from the doc of radosgw-agent and some items in this list, I understood
 that the max-entries argument was there to prevent a very active bucket
 to keep the other buckets from keeping synced. In our agent logs however
 we saw a lot of bucket instance bla has 1000 entries after bla, and
 the agent kept on syncing that active bucket.

 Looking at the code, in class DataWorkerIncremental, it looks like the
 agent loops in fetching log entries from the bucket until it receives
 less entries then the max_entries. Is this intended behaviour? I would
 suspect it to just pass the max_entries log entries for processing and
 increase the marker.

 Is there any other way to make sure less active buckets are frequently
 synced? We've tried increasing num-workers, but this only has affect the
 first pass.

 Thanks,
 Sam
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Radosgw-agent with version enabled bucket - duplicate objects

2015-07-03 Thread Sam Wouters

Hi,

I've upgraded from hammer 0.94.1 to 0.94.2 and from radosgw-agent 1.2.1 to 
1.2.2-0.el7.centos.noarch, and after a restart of 
the agent (with versioned set to true), I noticed duplicate objects in a 
versioning-enabled bucket on the replication site.


For example:
on source side:
object: Key: metadatab/e58438be260f48dd8d7b7855
version_id: null
(old object before versioning was enabled on the bucket)

on replication side:
object: Key: metadatab/e58438be260f48dd8d7b7855
version_id 1: rZ1f4LtbeDSx6O8Nsz.m28MNamraPFd
version_id 2: null

When restarting the agent without the --versioned param, it seems like 
it does a full sync again, and now I'm getting three objects for every 
source object.


I have no idea how to get the zones back into sync (without duplicate 
objects) and how to prevent this from happening again, so any help would 
be much appreciated.


regards,
Sam


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Firefly - Giant : CentOS 7 : install failed ceph-deploy

2015-04-08 Thread Sam Wouters
]# cat ceph.repo
 [Ceph]
 name=Ceph packages for $basearch
 baseurl=http://ceph.com/rpm-giant/el7/$basearch
 enabled=1
 gpgcheck=1
 type=rpm-md

gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
 priority=1

 [Ceph-noarch]
 name=Ceph noarch packages
 baseurl=http://ceph.com/rpm-giant/el7/noarch
 enabled=1
 gpgcheck=1
 type=rpm-md

gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
 priority=1

 [ceph-source]
 name=Ceph source packages
 baseurl=http://ceph.com/rpm-giant/el7/SRPMS
 enabled=1
 gpgcheck=1
 type=rpm-md

gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
 priority=1


 When i visit this directory http://ceph.com/rpm-giant/el7 , i
can see
 multiple versions of python-ceph i.e.
 python-ceph-0.86-0.el7.centos.x86_64
 python-ceph-0.87-0.el7.centos.x86_64
 python-ceph-0.87-1.el7.centos.x86_64

 This is the reason yum is getting confused about installing the latest
 available version, python-ceph-0.87-1.el7.centos.x86_64. This issue looks
 like a yum priority plugin and RPM obsoletes problem.

 http://tracker.ceph.com/issues/10476

 [root@rgw-node1 yum.repos.d]# cat
/etc/yum/pluginconf.d/priorities.conf
 [main]
 enabled = 1
 check_obsoletes = 1

 [root@rgw-node1 yum.repos.d]#

 [root@rgw-node1 yum.repos.d]#
 [root@rgw-node1 yum.repos.d]# uname -r
 3.10.0-229.1.2.el7.x86_64
 [root@rgw-node1 yum.repos.d]# cat /etc/redhat-release
 CentOS Linux release 7.1.1503 (Core)
 [root@rgw-node1 yum.repos.d]#


 However it worked fine one week back on CentOS 7.0

 [root@ceph-node1 ceph]# uname -r
 3.10.0-123.20.1.el7.x86_64
 [root@ceph-node1 ceph]# cat /etc/redhat-release
 CentOS Linux release 7.0.1406 (Core)
 [root@ceph-node1 ceph]#


 Any fix to this is highly appreciated.

 Regards
 VS



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Sam Wouters
Ericom Computers
--
*Ericom Computers*
Tiensestraat 178
3000 Leuven

Tel : +32 (0) 16 23 77 55
Fax : +32 (0) 16 23 48 05
Ericom Website http://www.ericom.be




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com