[ceph-users] How to fix mon scrub errors?

2017-12-12 Thread Burkhard Linke

Hi,


since the upgrade to luminous 12.2.2 the mons are complaining about 
scrub errors:



2017-12-13 08:49:27.169184 mon.ceph-storage-03 [ERR] scrub mismatch
2017-12-13 08:49:27.169203 mon.ceph-storage-03 [ERR]  mon.0 
ScrubResult(keys {logm=87,mds_health=13} crc 
{logm=4080463437,mds_health=2210310418})
2017-12-13 08:49:27.169216 mon.ceph-storage-03 [ERR]  mon.1 
ScrubResult(keys {logm=87,mds_health=13} crc 
{logm=4080463437,mds_health=1599893324})

2017-12-13 08:49:27.169229 mon.ceph-storage-03 [ERR] scrub mismatch
2017-12-13 08:49:27.169243 mon.ceph-storage-03 [ERR]  mon.0 
ScrubResult(keys {logm=87,mds_health=13} crc 
{logm=4080463437,mds_health=2210310418})
2017-12-13 08:49:27.169260 mon.ceph-storage-03 [ERR]  mon.2 
ScrubResult(keys {logm=87,mds_health=13} crc 
{logm=4080463437,mds_health=3057347215})

2017-12-13 08:49:27.176435 mon.ceph-storage-03 [ERR] scrub mismatch
2017-12-13 08:49:27.176454 mon.ceph-storage-03 [ERR]  mon.0 
ScrubResult(keys {mgrstat=10,monmap=26,osd_metadata=64} crc 
{mgrstat=3940483607,monmap=3662510285,osd_metadata=45209833})
2017-12-13 08:49:27.176472 mon.ceph-storage-03 [ERR]  mon.1 
ScrubResult(keys {mgrstat=10,monmap=26,osd_metadata=64} crc 
{mgrstat=3940483607,monmap=3662510285,osd_metadata=289852700})



These errors might have been caused by problems setting up multi-MDS 
after the Luminous upgrade.


OSD scrub errors are a well-known problem with many available 
solutions, but how do I fix mon scrub errors?
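
For reference, two things that are commonly suggested in this situation (generic advice only, not confirmed for this particular cluster): re-trigger a scrub of the mon stores once things have settled, and, if a single mon keeps producing a divergent CRC, rebuild that mon so it resyncs from the quorum.

# re-run a scrub across the monitor stores
ceph mon scrub

# if one mon keeps diverging, remove it and redeploy it with a fresh store
# (mon name below is hypothetical)
ceph mon remove ceph-storage-02
# ...then re-add the mon so it syncs from the remaining quorum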


Best regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Health Error : Request Stuck

2017-12-12 Thread Karun Josy
Cluster is unusable because of inactive PGs. How can we correct it?

=
ceph pg dump_stuck inactive
ok
PG_STAT STATE                UP            UP_PRIMARY ACTING        ACTING_PRIMARY
1.4b    activating+remapped  [5,2,0,13,1]   5         [5,2,13,1,4]   5
1.35    activating+remapped  [2,7,0,1,12]   2         [2,7,1,12,9]   2
1.12    activating+remapped  [1,3,5,0,7]    1         [1,3,5,7,2]    1
1.4e    activating+remapped  [1,3,0,9,2]    1         [1,3,0,9,5]    1
2.3b    activating+remapped  [13,1,0]      13         [13,1,2]      13
1.19    activating+remapped  [2,13,8,9,0]   2         [2,13,8,9,1]   2
1.1e    activating+remapped  [2,3,1,10,0]   2         [2,3,1,10,5]   2
2.29    activating+remapped  [1,0,13]       1         [1,8,11]       1
1.6f    activating+remapped  [8,2,0,4,13]   8         [8,2,4,13,1]   8
1.74    activating+remapped  [7,13,2,0,4]   7         [7,13,2,4,1]   7
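
A few generic places to look for PGs stuck in activating+remapped (general pointers only, not a diagnosis of this cluster):

ceph health detail          # which OSDs the slow/stuck requests point at
ceph pg 1.4b query          # check recovery_state and any blocked_by entries
ceph osd df tree            # look for full or heavily imbalanced OSDs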


Karun Josy

On Wed, Dec 13, 2017 at 8:27 AM, Karun Josy  wrote:

> Hello,
>
> We added a new disk to the cluster and while rebalancing we are getting
> error warnings.
>
> =
> Overall status: HEALTH_ERR
> REQUEST_SLOW: 1824 slow requests are blocked > 32 sec
> REQUEST_STUCK: 1022 stuck requests are blocked > 4096 sec
> ==
>
> The load in the servers seems to be very low.
>
> How can I correct it?
>
>
> Karun
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which version of ceph is better for cephfs in production

2017-12-12 Thread Yan, Zheng
On Wed, Dec 13, 2017 at 9:27 AM, 13605702...@163.com
<13605702...@163.com> wrote:
> hi
>
> since Jewel, cephfs is considered as production ready.
> but can anybody tell me which version of ceph is better? Jewel? Kraken? or
> Luminous?
>

luminous, version 12.2.2

> thanks
>
> 
> 13605702...@163.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] inconsistent pg issue with ceph version 10.2.3

2017-12-12 Thread Thanh Tran
I fixed this inconsistency error. It seems Ceph didn't delete the mismatched
object belonging to the deleted snapshot. This caused the "unexpected
clone" error, which produced the inconsistent status.

Log:
2017-12-12 20:14:06.651942 7fc7eff7e700 -1 log_channel(cluster) log [ERR] :
deep-scrub 4.1b42
4:42db3cde:::rbd_data.aa5af9238e1f29.470a:2ccac is an
unexpected clone

Steps to fix:
1. Move the mismatched object on both osd.159 and osd.179 to another location
2. ceph pg deep-scrub 4.1b42
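
Spelled out as commands, those two steps were roughly the following (a sketch only; the stray clone's filename is the one shown in the quoted directory listings below, the service manager may differ per distro, and the OSD should be stopped while its file is moved):

systemctl stop ceph-osd@159
mv /var/lib/ceph/osd/QTC01-159/current/4.1b42_head/DIR_2/DIR_4/DIR_B/<stray clone file> /root/pg-4.1b42-backup/
systemctl start ceph-osd@159
# repeat on the host carrying osd.179, then:
ceph --cluster QTC01 pg deep-scrub 4.1b42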

On Tue, Dec 12, 2017 at 9:34 PM, Thanh Tran  wrote:

> Hi,
>
> My ceph cluster has an inconsistent pg. I tried to deep-scrub and repair the
> pg, but that did not fix the problem.
> I found that the object that made the pg inconsistent belongs to a snapshot
> (snap id 2ccac = 183468) of an image. I deleted this snapshot, then queried the
> inconsistent pg and it showed empty, but my ceph cluster still shows the
> inconsistent pg error. It's strange that the mismatched object belonging to the
> deleted snapshot still exists on disk.
>
> After deleting the snapshot that had the mismatched object, I re-tried the deep
> scrub and repair of the pg, but it didn't help.
>
> Command output is below. I don't know how to clear this inconsistent status.
> Please help me to fix this.
>
> *ceph --cluster QTC01 health detail*
> HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
> pg 4.1b42 is active+clean+inconsistent, acting [159,179]
> 1 scrub errors
>
> *rados list-inconsistent-obj 4.1b42 --format=json-pretty --cluster QTC01*
> {
> "epoch": 494830,
> "inconsistents": []
> }
>
> *ll /var/lib/ceph/osd/QTC01-159/current/4.1b42_head/DIR_2/DIR_4/DIR_B/ |
> grep aa5af9238e1f29*
> -rw-r--r-- 1 ceph ceph 4194304 Dec 12 15:29 rbd\udata.aa5af9238e1f29.
> 470a__2ccac_7B3CDB42__4
> -rw-r--r-- 1 ceph ceph 1048576 Dec 12 15:29 rbd\udata.aa5af9238e1f29.
> 470a__head_7B3CDB42__4
>
> *ll /var/lib/ceph/osd/QTC01-179/current/4.1b42_head/DIR_2/DIR_4/DIR_B/ |
> grep aa5af9238e1f29*
> -rw-r--r-- 1 ceph ceph 4194304 Dec 12 15:17 rbd\udata.aa5af9238e1f29.
> 470a__2ccac_7B3CDB42__4
> -rw-r--r-- 1 ceph ceph 1048576 Dec 12 15:17 rbd\udata.aa5af9238e1f29.
> 470a__head_7B3CDB42__4
>
> Best regards,
> Thanh Tran
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Health Error : Request Stuck

2017-12-12 Thread Karun Josy
Hello,

We added a new disk to the cluster and while rebalancing we are getting
error warnings.

=
Overall status: HEALTH_ERR
REQUEST_SLOW: 1824 slow requests are blocked > 32 sec
REQUEST_STUCK: 1022 stuck requests are blocked > 4096 sec
==

The load in the servers seems to be very low.

How can I correct it?


Karun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] which version of ceph is better for cephfs in production

2017-12-12 Thread 13605702...@163.com
hi

since Jewel, cephfs is considered as production ready.
but can anybody tell me which version of ceph is better? Jewel? Kraken? or 
Luminous?

thanks



13605702...@163.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Sudden omap growth on some OSDs

2017-12-12 Thread Gregory Farnum
On Tue, Dec 12, 2017 at 3:36 PM  wrote:

> From: Gregory Farnum 
> Date: Tuesday, 12 December 2017 at 19:24
> To: "Vasilakakos, George (STFC,RAL,SC)" 
> Cc: "ceph-users@lists.ceph.com" 
> Subject: Re: [ceph-users] Sudden omap growth on some OSDs
>
> On Tue, Dec 12, 2017 at 3:16 AM george.vasilaka...@stfc.ac.uk wrote:
>
> On 11 Dec 2017, at 18:24, Gregory Farnum gfar...@redhat.com wrote:
>
> Hmm, this does all sound odd. Have you tried just restarting the primary
> OSD yet? That frequently resolves transient oddities like this.
> If not, I'll go poke at the kraken source and one of the developers more
> familiar with the recovery processes we're seeing here.
> -Greg
>
>
> Hi Greg,
>
> I’ve tried this, no effect. Also, on Friday, we tried removing an OSD (not
> the primary), and the OSD that was chosen to replace it has had its LevelDB grow
> to 7GiB by now. Yesterday it was 5.3 GiB.
> We’re not seeing any errors logged by the OSDs with the default logging
> level either.
>
> Do you have any comments on the fact that the primary sees the PG’s state
> as being different to what the peers think?
>
> Yes. It's super weird. :p
>
> Now, with a new primary I’m seeing the last peer in the set reporting it’s
> ‘active+clean’, as is the primary; all others are saying it’s
> ‘active+clean+degraded’ (according to PG query output).
>
> Has the last OSD in the list shrunk down its LevelDB instance?
>
> No, the last peer has the largest one currently part of the PG at 14GiB.
>
> If so (or even if not), I'd try restarting all the OSDs in the PG and see
> if that changes things.
>
> Will try that and report back.
>
> If it doesn't...well, it's about to be Christmas and Luminous saw quite a
> bit of change in this space, so it's unlikely to get a lot of attention. :/
>
> Yeah, this being Kraken I doubt it will get looked into deeply.
>
> But the next step would be to gather high-level debug logs from the OSDs
> in question, especially as a peering action takes place.
>
> I’ll be re-introducing the old primary this week so maybe I’ll bump the
> logging levels (to what?) on these OSDs and see what they come up with.
>

debug osd = 20
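
(If editing ceph.conf and restarting is inconvenient, the level can also be bumped at runtime and reverted afterwards, e.g.:)

ceph tell osd.<id> injectargs '--debug-osd 20'
# ...capture the peering, then revert
ceph tell osd.<id> injectargs '--debug-osd 1/5'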


>
> Oh!
> I didn't notice you previously mentioned "custom gateways using the
> libradosstriper". Are those backing onto this pool? What operations are
> they doing?
> Something like repeated overwrites of EC data could definitely have
> symptoms similar to this (apart from the odd peering bit.)
> -Greg
>
> Think of these as using the cluster as an object store. Most of the time
> we’re writing something in, reading it out anywhere from zero to thousands
> of times (each time running stat as well) and eventually may be deleting
> it. Once written, there’s no reason to be overwritten. They’re backing onto
> the EC pools (one per “tenant”) but the particular pool that this PG is a
> part of has barely seen any use. The most used one is storing petabytes and
> this one was barely reaching 100TiB when this came up.
>

Yeah, it would be about overwrites specifically, not just using the data.
Congratulations, you've exceeded the range of even my WAGs. :/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Sudden omap growth on some OSDs

2017-12-12 Thread george.vasilakakos
From: Gregory Farnum 
Date: Tuesday, 12 December 2017 at 19:24
To: "Vasilakakos, George (STFC,RAL,SC)" 
Cc: "ceph-users@lists.ceph.com" 
Subject: Re: [ceph-users] Sudden omap growth on some OSDs

On Tue, Dec 12, 2017 at 3:16 AM george.vasilaka...@stfc.ac.uk wrote:

On 11 Dec 2017, at 18:24, Gregory Farnum gfar...@redhat.com wrote:

Hmm, this does all sound odd. Have you tried just restarting the primary OSD 
yet? That frequently resolves transient oddities like this.
If not, I'll go poke at the kraken source and one of the developers more 
familiar with the recovery processes we're seeing here.
-Greg


Hi Greg,

I’ve tried this, no effect. Also, on Friday, we tried removing an OSD (not the 
primary), and the OSD that was chosen to replace it has had its LevelDB grow to 7GiB 
by now. Yesterday it was 5.3 GiB.
We’re not seeing any errors logged by the OSDs with the default logging level 
either.

Do you have any comments on the fact that the primary sees the PG’s state as 
being different to what the peers think?

Yes. It's super weird. :p

Now, with a new primary I’m seeing the last peer in the set reporting it’s 
‘active+clean’, as is the primary; all others are saying it’s 
‘active+clean+degraded’ (according to PG query output).

Has the last OSD in the list shrunk down its LevelDB instance?

No, the last peer has the largest one currently part of the PG at 14GiB.

If so (or even if not), I'd try restarting all the OSDs in the PG and see if 
that changes things.

Will try that and report back.

If it doesn't...well, it's about to be Christmas and Luminous saw quite a bit 
of change in this space, so it's unlikely to get a lot of attention. :/

Yeah, this being Kraken I doubt it will get looked into deeply.

But the next step would be to gather high-level debug logs from the OSDs in 
question, especially as a peering action takes place.

I’ll be re-introducing the old primary this week so maybe I’ll bump the logging 
levels (to what?) on these OSDs and see what they come up with.

Oh!
I didn't notice you previously mentioned "custom gateways using the 
libradosstriper". Are those backing onto this pool? What operations are they 
doing?
Something like repeated overwrites of EC data could definitely have symptoms 
similar to this (apart from the odd peering bit.)
-Greg

Think of these as using the cluster as an object store. Most of the time we’re 
writing something in, reading it out anywhere from zero to thousands of times 
(each time running stat as well) and eventually may be deleting it. Once 
written, there’s no reason to be overwritten. They’re backing onto the EC pools 
(one per “tenant”) but the particular pool that this PG is a part of has barely 
seen any use. The most used one is storing petabytes and this one was barely 
reaching 100TiB when this came up.



This problem is quite weird I think. I copied a LevelDB and dumped a key list; 
the largest in GiB had 66% the number of keys that the average LevelDB has. The 
main difference with the ones that have been around for a while is that they 
have a lot more files that were last touched on the days when the problem 
started but most other LevelDBs have compacted those away and only have about 7 
days old files (as opposed to 3 week old ones that the big ones keep around). 
The big ones do seem to do compactions, they just don’t seem to get rid of that 
stuff.



Thanks,

George


On Fri, Dec 8, 2017 at 7:30 AM george.vasilaka...@stfc.ac.uk wrote:


From: Gregory Farnum [gfar...@redhat.com]
Sent: 07 December 2017 21:57
To: Vasilakakos, George (STFC,RAL,SC)
Cc: drakonst...@gmail.com; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Sudden omap growth on some OSDs


On Thu, Dec 7, 2017 at 4:41 AM george.vasilaka...@stfc.ac.uk wrote:

From: Gregory Farnum [gfar...@redhat.com]
Sent: 06 December 2017 22:50
To: David Turner
Cc: Vasilakakos, George (STFC,RAL,SC); ceph-users@lists.ceph.com

Re: [ceph-users] Fwd: Lock doesn't want to be given up

2017-12-12 Thread Florian Margaine
Hi,

As a follow-up, this PR for librbd seems to be what needs to be applied
to krbd too. As said in the PR, the bug is very much reproducible after
Jason Dillaman's suggestion.

Regards,
Florian

Florian Margaine  writes:

> Hi,
>
> We're hitting an odd issue on our ceph cluster:
>
> - We have machine1 mapping an exclusive-lock RBD.
> - Machine2 wants to take a snapshot of the RBD, but fails to take the lock.
>
> Stracing the rbd snap process on machine2 shows it looping on sending
> "lockget" commands, without ever moving forward.
>
> In rbd status, we see that machine1 is a watcher on the image, which is
> expected. What is not expected is that the rbd snap process can't get the
> lock.
>
> This commit deployed in 10.2.10, which we are using, sounds related:
> https://github.com/ceph/ceph/commit/475dda114a7e25b43dc9066b9808a64fc0c6dc89
>
> But there isn't the equivalent in ceph-client's code, which we would expect
> too. That said, I don't have a full understanding, so I might be off-base
> there.
>
> Am I wrong in expecting the equivalent in ceph-client's code? (aka Linux
> kernel) Am I completely off-base as to what is wrong there? Can I provide
> any additional information to help debugging?
>
> Regards,
> Florian


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Odd object blocking IO on PG

2017-12-12 Thread Gregory Farnum
On Tue, Dec 12, 2017 at 12:33 PM Nick Fisk  wrote:

>
> > That doesn't look like an RBD object -- any idea who is
> > "client.34720596.1:212637720"?
>
> So I think these might be proxy ops from the cache tier, as there are also
> block ops on one of the cache tier OSD's, but this time it actually lists
> the object name. Block op on cache tier.
>
>"description": "osd_op(client.34720596.1:212637720 17.ae78c1cf
> 17:f3831e75:::rbd_data.15a5e20238e1f29.000388ad:head
> [set-alloc-hint
> object_size 4194304 write_size 4194304,write 2584576~16384] snapc 0=[]
> RETRY=2 ondisk+retry+write+known_if_redirected e104841)",
> "initiated_at": "2017-12-12 16:25:32.435718",
> "age": 13996.681147,
> "duration": 13996.681203,
> "type_data": {
> "flag_point": "reached pg",
> "client_info": {
> "client": "client.34720596",
> "client_addr": "10.3.31.41:0/2600619462",
> "tid": 212637720
>
> I'm a bit baffled at the moment about what's going on. The pg query (attached) is
> not
> showing in the main status that it has been blocked from peering or that
> there are any missing objects. I've tried restarting all OSD's I can see
> relating to the PG in case they needed a bit of a nudge.
>

Did that fix anything? I don't see anything immediately obvious but I'm not
practiced in quickly reading that pg state output.

What's the output of "ceph -s"?


>
> >
> > On Tue, Dec 12, 2017 at 12:36 PM, Nick Fisk  wrote:
> > > Does anyone know what this object (0.ae78c1cf) might be, it's not your
> > > normal run of the mill RBD object and I can't seem to find it in the
> > > pool using rados --all ls . It seems to be leaving the 0.1cf PG stuck
> > > in an
> > > activating+remapped state and blocking IO. Pool 0 is just a pure RBD
> > > activating+pool
> > > with a cache tier above it. There is no current mention of unfound
> > > objects or any other obvious issues.
> > >
> > > There is some backfilling going on, on another OSD which was upgraded
> > > to bluestore, which was when the issue started. But I can't see any
> > > link in the PG dump with upgraded OSD. My only thought so far is to
> > > wait for this backfilling to finish and then deep-scrub this PG and
> > > see if that reveals anything?
> > >
> > > Thanks,
> > > Nick
> > >
> > >  "description": "osd_op(client.34720596.1:212637720 0.1cf 0.ae78c1cf
> > > (undecoded)
> > > ondisk+retry+write+ignore_cache+ignore_overlay+known_if_redirected
> > > e105014)",
> > > "initiated_at": "2017-12-12 17:10:50.030660",
> > > "age": 335.948290,
> > > "duration": 335.948383,
> > > "type_data": {
> > > "flag_point": "delayed",
> > > "events": [
> > > {
> > > "time": "2017-12-12 17:10:50.030660",
> > > "event": "initiated"
> > > },
> > > {
> > > "time": "2017-12-12 17:10:50.030692",
> > > "event": "queued_for_pg"
> > > },
> > > {
> > > "time": "2017-12-12 17:10:50.030719",
> > > "event": "reached_pg"
> > > },
> > > {
> > > "time": "2017-12-12 17:10:50.030727",
> > > "event": "waiting for peered"
> > > },
> > > {
> > > "time": "2017-12-12 17:10:50.197353",
> > > "event": "reached_pg"
> > > },
> > > {
> > > "time": "2017-12-12 17:10:50.197355",
> > > "event": "waiting for peered"
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> > --
> > Jason
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore Compression not inheriting pool option

2017-12-12 Thread Stefan Kooman
Quoting Nick Fisk (n...@fisk.me.uk):
> Hi All,
> 
> Has anyone been testing the bluestore pool compression option?
> 
> I have set compression=snappy on a RBD pool. When I add a new bluestore OSD,
> data is not being compressed when backfilling, confirmed by looking at the
> perf dump results. If I then set again the compression type on the pool to
> snappy, then immediately data starts getting compressed. It seems like when
> a new OSD joins the cluster, it doesn't pick up the existing compression
> setting on the pool.
> 
> Anyone seeing anything similar? I will raise a bug if anyone can confirm.

Yes. I tried to reproduce your issue and I'm seeing the same thing. The
things I did to reproduce:

- check for compressed objects beforehand on osd (no compressed objects
  where there)

- remove one of the osds in the cluster

- ceph osd pool set CEPH-TEST-ONE compression_algorithm snappy
- ceph osd pool set CEPH-TEST-ONE compression_mode force
-  rbd clone a rbd image
- let cluster heal again
- check for compressed bluestore objects on "new" osd (ceph daemon osd.0 perf
dump | grep blue): 

"bluestore_compressed": 0,
"bluestore_compressed_allocated": 0,
"bluestore_compressed_original": 0

- check for compressed bluestore objects on already existing osd (ceph daemon
osd.1 perf dump | grep blue):

"bluestore_compressed": 2991873,
"bluestore_compressed_allocated": 3637248,
"bluestore_compressed_original": 10895360,

Gr. Stefan

-- 
| BIT BV  http://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Odd object blocking IO on PG

2017-12-12 Thread Nick Fisk
 
> That doesn't look like an RBD object -- any idea who is
> "client.34720596.1:212637720"?

So I think these might be proxy ops from the cache tier, as there are also
block ops on one of the cache tier OSD's, but this time it actually lists
the object name. Block op on cache tier.

   "description": "osd_op(client.34720596.1:212637720 17.ae78c1cf
17:f3831e75:::rbd_data.15a5e20238e1f29.000388ad:head [set-alloc-hint
object_size 4194304 write_size 4194304,write 2584576~16384] snapc 0=[]
RETRY=2 ondisk+retry+write+known_if_redirected e104841)",
"initiated_at": "2017-12-12 16:25:32.435718",
"age": 13996.681147,
"duration": 13996.681203,
"type_data": {
"flag_point": "reached pg",
"client_info": {
"client": "client.34720596",
"client_addr": "10.3.31.41:0/2600619462",
"tid": 212637720

I'm a bit baffled at the moment about what's going on. The pg query (attached) is not
showing in the main status that it has been blocked from peering or that
there are any missing objects. I've tried restarting all OSD's I can see
relating to the PG in case they needed a bit of a nudge.

> 
> On Tue, Dec 12, 2017 at 12:36 PM, Nick Fisk  wrote:
> > Does anyone know what this object (0.ae78c1cf) might be, it's not your
> > normal run of the mill RBD object and I can't seem to find it in the
> > pool using rados --all ls . It seems to be leaving the 0.1cf PG stuck
> > in an
> > activating+remapped state and blocking IO. Pool 0 is just a pure RBD
> > activating+pool
> > with a cache tier above it. There is no current mention of unfound
> > objects or any other obvious issues.
> >
> > There is some backfilling going on, on another OSD which was upgraded
> > to bluestore, which was when the issue started. But I can't see any
> > link in the PG dump with upgraded OSD. My only thought so far is to
> > wait for this backfilling to finish and then deep-scrub this PG and
> > see if that reveals anything?
> >
> > Thanks,
> > Nick
> >
> >  "description": "osd_op(client.34720596.1:212637720 0.1cf 0.ae78c1cf
> > (undecoded)
> > ondisk+retry+write+ignore_cache+ignore_overlay+known_if_redirected
> > e105014)",
> > "initiated_at": "2017-12-12 17:10:50.030660",
> > "age": 335.948290,
> > "duration": 335.948383,
> > "type_data": {
> > "flag_point": "delayed",
> > "events": [
> > {
> > "time": "2017-12-12 17:10:50.030660",
> > "event": "initiated"
> > },
> > {
> > "time": "2017-12-12 17:10:50.030692",
> > "event": "queued_for_pg"
> > },
> > {
> > "time": "2017-12-12 17:10:50.030719",
> > "event": "reached_pg"
> > },
> > {
> > "time": "2017-12-12 17:10:50.030727",
> > "event": "waiting for peered"
> > },
> > {
> > "time": "2017-12-12 17:10:50.197353",
> > "event": "reached_pg"
> > },
> > {
> > "time": "2017-12-12 17:10:50.197355",
> > "event": "waiting for peered"
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> --
> Jason
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
{
"state": "activating+remapped",
"snap_trimq": "[]",
"epoch": 105385,
"up": [
34,
68,
8
],
"acting": [
34,
8,
53
],
"backfill_targets": [
"68"
],
"actingbackfill": [
"8",
"34",
"53",
"68"
],
"info": {
"pgid": "0.1cf",
"last_update": "104744'7509486",
"last_complete": "104744'7509486",
"log_tail": "104532'7507938",
"last_user_version": 132304619,
"last_backfill": "MAX",
"last_backfill_bitwise": 0,
"purged_snaps": [
{
"start": "1",
"length": "d"
}
],
"history": {
"epoch_created": 1,
"epoch_pool_created": 1,
"last_epoch_started": 104744,
"last_interval_started": 104743,
"last_epoch_clean": 104744,
"last_interval_clean": 104743,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 105371,
"same_interval_since": 105371,
 

Re: [ceph-users] Error in osd_client.c, request_reinit

2017-12-12 Thread Ilya Dryomov
On Tue, Dec 12, 2017 at 8:18 PM, fcid  wrote:
> Hello everyone,
>
> We had an incident regarding a client which rebooted after experiencing some
> issues with a ceph cluster.
>
> The other clients that consume RBD images from the same ceph cluster showed
> an error related to libceph in their logs at the time of the reboot.
>
> The error looks like this:
>
> Dec 10 21:29:52  kernel: [5830277.680860] WARNING: CPU: 15 PID: 8113 at
> net/ceph/osd_client.c:490 request_reinit+0x141/0x180 [libceph]
> Dec 10 21:29:52  kernel: [5830277.691032] Modules linked in:
> nfnetlink_queue bluetooth ocfs2 quota_tree binfmt_misc tcp_diag inet_diag
> veth ip_set ip6table_filter ip6_tables xt_nat xt_tcpudp xt_multiport
> xt_conntrack xt_addrtype iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
> nf_nat_ipv4 nf_nat nf_conntrack rbd libceph ocfs2_dlmfs ocfs2_stack_o2cb
> ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue bonding softdog iptable_filter
> nfnetlink_log nfnetlink intel_rapl sb_edac edac_core x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul
> crc32_pclmul ipmi_ssif ghash_clmulni_intel pcbc ast ttm aesni_intel
> aes_x86_64 drm_kms_helper crypto_simd glue_helper snd_pcm cryptd drm
> snd_timer snd fb_sys_fops intel_cstate syscopyarea input_leds soundcore
> joydev sysfillrect intel_rapl_perf sysimgblt mei_me pcspkr mei ioatdma
> Dec 10 21:29:52  kernel: [5830277.765547]  lpc_ich shpchp wmi ipmi_si
> ipmi_devintf ipmi_msghandler nfit acpi_pad acpi_power_meter mac_hid
> vhost_net vhost macvtap macvlan ib_iser rdma_cm iw_cm ib_cm ib_core configfs
> iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables
> x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq
> async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear raid1
> hid_generic usbkbd usbmouse usbhid hid igb ixgbe i2c_algo_bit dca ptp ahci
> pps_core i2c_i801 libahci mdio fjes [last unloaded: quota_tree]
> Dec 10 21:29:52  kernel: [5830277.816275] CPU: 15 PID: 8113 Comm:
> kworker/15:0 Tainted: GW 4.10.17-1-pve #1
> Dec 10 21:29:52  kernel: [5830277.825564] Hardware name: Supermicro
> SYS-1028U-TR4T+/X10DRU-i+, BIOS 2.0c 04/21/2017
> Dec 10 21:29:52  kernel: [5830277.834272] Workqueue: events
> handle_timeout [libceph]
> Dec 10 21:29:52  kernel: [5830277.840307] Call Trace:
> Dec 10 21:29:52  kernel: [5830277.843620] dump_stack+0x63/0x81
> Dec 10 21:29:52  kernel: [5830277.847846] __warn+0xcb/0xf0
> Dec 10 21:29:52  kernel: [5830277.851758] warn_slowpath_null+0x1d/0x20
> Dec 10 21:29:52  kernel: [5830277.856798] request_reinit+0x141/0x180
> [libceph]
> Dec 10 21:29:52  kernel: [5830277.862403] handle_timeout+0x307/0x5b0
> [libceph]
> Dec 10 21:29:52  kernel: [5830277.868116] process_one_work+0x1fc/0x4b0
> Dec 10 21:29:52  kernel: [5830277.873069] worker_thread+0x4b/0x500
> Dec 10 21:29:52  kernel: [5830277.877561] kthread+0x109/0x140
> Dec 10 21:29:52  kernel: [5830277.881720]  ?
> process_one_work+0x4b0/0x4b0
> Dec 10 21:29:52  kernel: [5830277.886851]  ?
> kthread_create_on_node+0x60/0x60
> Dec 10 21:29:52  kernel: [5830277.892323] ret_from_fork+0x2c/0x40
> Dec 10 21:29:52  kernel: [5830277.896939] ---[ end trace
> afd30825d5ecd451 ]---
>
> I wonder if this is a bug in KRBD.

This warning indicates a fairly minor issue in the internals of the
kernel client that should be safe to ignore.  request_reinit function
isn't used anywhere in the data or other critical paths.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Sudden omap growth on some OSDs

2017-12-12 Thread Gregory Farnum
On Tue, Dec 12, 2017 at 3:16 AM  wrote:

>
> On 11 Dec 2017, at 18:24, Gregory Farnum gfar...@redhat.com wrote:
>
> Hmm, this does all sound odd. Have you tried just restarting the primary
> OSD yet? That frequently resolves transient oddities like this.
> If not, I'll go poke at the kraken source and one of the developers more
> familiar with the recovery processes we're seeing here.
> -Greg
>
>
> Hi Greg,
>
> I’ve tried this, no effect. Also, on Friday, we tried removing an OSD (not
> the primary), and the OSD that was chosen to replace it has had its LevelDB grow
> to 7GiB by now. Yesterday it was 5.3 GiB.
> We’re not seeing any errors logged by the OSDs with the default logging
> level either.
>
> Do you have any comments on the fact that the primary sees the PG’s state
> as being different to what the peers think?
>

Yes. It's super weird. :p


> Now, with a new primary I’m seeing the last peer in the set reporting it’s
> ‘active+clean’, as is the primary; all others are saying it’s
> ‘active+clean+degraded’ (according to PG query output).
>

Has the last OSD in the list shrunk down its LevelDB instance?

If so (or even if not), I'd try restarting all the OSDs in the PG and see
if that changes things.

If it doesn't...well, it's about to be Christmas and Luminous saw quite a
bit of change in this space, so it's unlikely to get a lot of attention. :/
But the next step would be to gather high-level debug logs from the OSDs in
question, especially as a peering action takes place.

Oh!
I didn't notice you previously mentioned "custom gateways using the
libradosstriper". Are those backing onto this pool? What operations are
they doing?
Something like repeated overwrites of EC data could definitely have
symptoms similar to this (apart from the odd peering bit.)
-Greg



>
> This problem is quite weird I think. I copied a LevelDB and dumped a key
> list; the largest in GiB had 66% the number of keys that the average
> LevelDB has. The main difference with the ones that have been around for a
> while is that they have a lot more files that were last touched on the days
> when the problem started but most other LevelDBs have compacted those away
> and only have about 7 days old files (as opposed to 3 week old ones that
> the big ones keep around). The big ones do seem to do compactions, they
> just don’t seem to get rid of that stuff.
>
>
>
> Thanks,
>
> George
>
>
> On Fri, Dec 8, 2017 at 7:30 AM george.vasilaka...@stfc.ac.uk wrote:
>
> 
> From: Gregory Farnum [gfar...@redhat.com]
> Sent: 07 December 2017 21:57
> To: Vasilakakos, George (STFC,RAL,SC)
> Cc: drakonst...@gmail.com;
> ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Sudden omap growth on some OSDs
>
>
>
> On Thu, Dec 7, 2017 at 4:41 AM george.vasilaka...@stfc.ac.uk wrote:
>
> 
> From: Gregory Farnum [gfar...@redhat.com]
> Sent: 06 December 2017 22:50
> To: David Turner
> Cc: Vasilakakos, George (STFC,RAL,SC); ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Sudden omap growth on some OSDs
>
> On Wed, Dec 6, 2017 at 2:35 PM David Turner drakonst...@gmail.com wrote: I have no proof or anything other than a hunch, but OSDs don't trim omaps
> unless all PGs are healthy.  If this PG is actually not healthy, but the
> cluster doesn't realize it while these 11 involved OSDs do realize that the
> PG is unhealthy... You would see this exact problem.  The OSDs think a PG
> is unhealthy so they aren't trimming their omaps while the cluster doesn't
> seem to be aware of it and everything else is trimming their omaps properly.
>
> I think you're confusing omaps and OSDMaps here. OSDMaps, like omap, are
> stored in leveldb, but they have different trimming rules.
>
>
> I don't know what to do about it, but I hope it helps get you (or someone
> else on the ML) towards a resolution.
>
> On Wed, Dec 6, 2017 at 1:59 PM george.vasilaka...@stfc.ac.uk wrote: Hi ceph-users,
>
> We have a Ceph cluster (running Kraken) that is exhibiting some odd
> behaviour.
> A couple weeks ago, the LevelDBs on some our OSDs started growing large
> (now at around 20G size).
>
> The one thing they have in common is the 11 disks with inflating LevelDBs
> are all in the set for one PG in one of our po

[ceph-users] Error in osd_client.c, request_reinit

2017-12-12 Thread fcid

Hello everyone,

We had an incident regarding a client which rebooted after experiencing 
some issues with a ceph cluster.


The other clients that consume RBD images from the same ceph cluster 
showed an error related to libceph in their logs at the time of the reboot.


The error looks like this:

Dec 10 21:29:52  kernel: [5830277.680860] WARNING: CPU: 15 PID: 8113 
at net/ceph/osd_client.c:490 request_reinit+0x141/0x180 [libceph]
Dec 10 21:29:52  kernel: [5830277.691032] Modules linked in: 
nfnetlink_queue bluetooth ocfs2 quota_tree binfmt_misc tcp_diag 
inet_diag veth ip_set ip6table_filter ip6_tables xt_nat xt_tcpudp 
xt_multiport xt_conntrack xt_addrtype iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack rbd libceph ocfs2_dlmfs 
ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue bonding 
softdog iptable_filter nfnetlink_log nfnetlink intel_rapl sb_edac 
edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm 
irqbypass crct10dif_pclmul crc32_pclmul ipmi_ssif ghash_clmulni_intel 
pcbc ast ttm aesni_intel aes_x86_64 drm_kms_helper crypto_simd 
glue_helper snd_pcm cryptd drm snd_timer snd fb_sys_fops intel_cstate 
syscopyarea input_leds soundcore joydev sysfillrect intel_rapl_perf 
sysimgblt mei_me pcspkr mei ioatdma
Dec 10 21:29:52  kernel: [5830277.765547]  lpc_ich shpchp wmi 
ipmi_si ipmi_devintf ipmi_msghandler nfit acpi_pad acpi_power_meter 
mac_hid vhost_net vhost macvtap macvlan ib_iser rdma_cm iw_cm ib_cm 
ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 
sunrpc ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov 
async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 
multipath linear raid1 hid_generic usbkbd usbmouse usbhid hid igb ixgbe 
i2c_algo_bit dca ptp ahci pps_core i2c_i801 libahci mdio fjes [last 
unloaded: quota_tree]
Dec 10 21:29:52  kernel: [5830277.816275] CPU: 15 PID: 8113 Comm: 
kworker/15:0 Tainted: G    W 4.10.17-1-pve #1
Dec 10 21:29:52  kernel: [5830277.825564] Hardware name: Supermicro 
SYS-1028U-TR4T+/X10DRU-i+, BIOS 2.0c 04/21/2017
Dec 10 21:29:52  kernel: [5830277.834272] Workqueue: events 
handle_timeout [libceph]

Dec 10 21:29:52  kernel: [5830277.840307] Call Trace:
Dec 10 21:29:52  kernel: [5830277.843620] dump_stack+0x63/0x81
Dec 10 21:29:52  kernel: [5830277.847846] __warn+0xcb/0xf0
Dec 10 21:29:52  kernel: [5830277.851758] warn_slowpath_null+0x1d/0x20
Dec 10 21:29:52  kernel: [5830277.856798] request_reinit+0x141/0x180 
[libceph]
Dec 10 21:29:52  kernel: [5830277.862403] handle_timeout+0x307/0x5b0 
[libceph]

Dec 10 21:29:52  kernel: [5830277.868116] process_one_work+0x1fc/0x4b0
Dec 10 21:29:52  kernel: [5830277.873069] worker_thread+0x4b/0x500
Dec 10 21:29:52  kernel: [5830277.877561] kthread+0x109/0x140
Dec 10 21:29:52  kernel: [5830277.881720]  ? 
process_one_work+0x4b0/0x4b0
Dec 10 21:29:52  kernel: [5830277.886851]  ? 
kthread_create_on_node+0x60/0x60

Dec 10 21:29:52  kernel: [5830277.892323] ret_from_fork+0x2c/0x40
Dec 10 21:29:52  kernel: [5830277.896939] ---[ end trace 
afd30825d5ecd451 ]---


I wonder if this is a bug in KRBD.

We are using ceph 10.2.5 in the ceph clients, but our ceph cluster is 
10.2.9.


Please let me know if you need more information about our environment,

Kind regards,

--

Fernando Cid O.
Ingeniero de Operaciones
AltaVoz S.A.
 http://www.altavoz.net
Viña del Mar, Valparaiso:
 2 Poniente 355 of 53
 +56 32 276 8060
Santiago:
 San Pío X 2460, oficina 304, Providencia
 +56 2 2585 4264

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Odd object blocking IO on PG

2017-12-12 Thread Gregory Farnum
Jason was more diligent than me and dug enough to realize that we print out
the "raw pg", which we are printing out because we haven't gotten far
enough in the pipeline to decode the actual object name. You'll note that
it ends with the same characters as the PG does, and unlike a pgid, the raw
pg is a full-length hash and resistant to operations like pg split. :)
-Greg
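
As a concrete illustration (assuming pg_num = 512 on pool 0, which isn't stated in the thread): for a power-of-two pg_num the placement reduces to masking the raw hash, which lands exactly on the PG seen here:

printf '0.%x\n' $(( 0xae78c1cf & (512 - 1) ))    # prints 0.1cf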

On Tue, Dec 12, 2017 at 11:12 AM Jason Dillaman  wrote:

> That doesn't look like an RBD object -- any idea who is
> "client.34720596.1:212637720"?
>
> On Tue, Dec 12, 2017 at 12:36 PM, Nick Fisk  wrote:
> > Does anyone know what this object (0.ae78c1cf) might be, it's not your
> > normal run of the mill RBD object and I can't seem to find it in the pool
> > using rados --all ls . It seems to be leaving the 0.1cf PG stuck in an
> > activating+remapped state and blocking IO. Pool 0 is just a pure RBD pool
> > with a cache tier above it. There is no current mention of unfound
> objects
> > or any other obvious issues.
> >
> > There is some backfilling going on, on another OSD which was upgraded to
> > bluestore, which was when the issue started. But I can't see any link in
> the
> > PG dump with upgraded OSD. My only thought so far is to wait for this
> > backfilling to finish and then deep-scrub this PG and see if that reveals
> > anything?
> >
> > Thanks,
> > Nick
> >
> >  "description": "osd_op(client.34720596.1:212637720 0.1cf 0.ae78c1cf
> > (undecoded)
> > ondisk+retry+write+ignore_cache+ignore_overlay+known_if_redirected
> > e105014)",
> > "initiated_at": "2017-12-12 17:10:50.030660",
> > "age": 335.948290,
> > "duration": 335.948383,
> > "type_data": {
> > "flag_point": "delayed",
> > "events": [
> > {
> > "time": "2017-12-12 17:10:50.030660",
> > "event": "initiated"
> > },
> > {
> > "time": "2017-12-12 17:10:50.030692",
> > "event": "queued_for_pg"
> > },
> > {
> > "time": "2017-12-12 17:10:50.030719",
> > "event": "reached_pg"
> > },
> > {
> > "time": "2017-12-12 17:10:50.030727",
> > "event": "waiting for peered"
> > },
> > {
> > "time": "2017-12-12 17:10:50.197353",
> > "event": "reached_pg"
> > },
> > {
> > "time": "2017-12-12 17:10:50.197355",
> > "event": "waiting for peered"
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Jason
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Odd object blocking IO on PG

2017-12-12 Thread Jason Dillaman
That doesn't look like an RBD object -- any idea who is
"client.34720596.1:212637720"?

On Tue, Dec 12, 2017 at 12:36 PM, Nick Fisk  wrote:
> Does anyone know what this object (0.ae78c1cf) might be, it's not your
> normal run of the mill RBD object and I can't seem to find it in the pool
> using rados --all ls . It seems to be leaving the 0.1cf PG stuck in an
> activating+remapped state and blocking IO. Pool 0 is just a pure RBD pool
> with a cache tier above it. There is no current mention of unfound objects
> or any other obvious issues.
>
> There is some backfilling going on, on another OSD which was upgraded to
> bluestore, which was when the issue started. But I can't see any link in the
> PG dump with upgraded OSD. My only thought so far is to wait for this
> backfilling to finish and then deep-scrub this PG and see if that reveals
> anything?
>
> Thanks,
> Nick
>
>  "description": "osd_op(client.34720596.1:212637720 0.1cf 0.ae78c1cf
> (undecoded)
> ondisk+retry+write+ignore_cache+ignore_overlay+known_if_redirected
> e105014)",
> "initiated_at": "2017-12-12 17:10:50.030660",
> "age": 335.948290,
> "duration": 335.948383,
> "type_data": {
> "flag_point": "delayed",
> "events": [
> {
> "time": "2017-12-12 17:10:50.030660",
> "event": "initiated"
> },
> {
> "time": "2017-12-12 17:10:50.030692",
> "event": "queued_for_pg"
> },
> {
> "time": "2017-12-12 17:10:50.030719",
> "event": "reached_pg"
> },
> {
> "time": "2017-12-12 17:10:50.030727",
> "event": "waiting for peered"
> },
> {
> "time": "2017-12-12 17:10:50.197353",
> "event": "reached_pg"
> },
> {
> "time": "2017-12-12 17:10:50.197355",
> "event": "waiting for peered"
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Odd object blocking IO on PG

2017-12-12 Thread Gregory Farnum
On Tue, Dec 12, 2017 at 9:37 AM Nick Fisk  wrote:

> Does anyone know what this object (0.ae78c1cf) might be, it's not your
> normal run of the mill RBD object and I can't seem to find it in the pool
> using rados --all ls . It seems to be leaving the 0.1cf PG stuck in an
> activating+remapped state and blocking IO. Pool 0 is just a pure RBD pool
> with a cache tier above it. There is no current mention of unfound objects
> or any other obvious issues.
>
> There is some backfilling going on, on another OSD which was upgraded to
> bluestore, which was when the issue started. But I can't see any link in
> the
> PG dump with upgraded OSD. My only thought so far is to wait for this
> backfilling to finish and then deep-scrub this PG and see if that reveals
> anything?
>
> Thanks,
> Nick
>
>  "description": "osd_op(client.34720596.1:212637720 0.1cf 0.ae78c1cf
> (undecoded)
> ondisk+retry+write+ignore_cache+ignore_overlay+known_if_redirected
> e105014)",
> "initiated_at": "2017-12-12 17:10:50.030660",
> "age": 335.948290,
> "duration": 335.948383,
> "type_data": {
> "flag_point": "delayed",
> "events": [
> {
> "time": "2017-12-12 17:10:50.030660",
> "event": "initiated"
> },
> {
> "time": "2017-12-12 17:10:50.030692",
> "event": "queued_for_pg"
> },
> {
> "time": "2017-12-12 17:10:50.030719",
> "event": "reached_pg"
> },
> {
> "time": "2017-12-12 17:10:50.030727",
> "event": "waiting for peered"
> },
> {
> "time": "2017-12-12 17:10:50.197353",
> "event": "reached_pg"
> },
> {
> "time": "2017-12-12 17:10:50.197355",
> "event": "waiting for peered"
>
>
Is there some other evidence this object is the one causing the PG to be
stuck? This trace is just what you get when a PG isn't peering and has
nothing to do with the object involved. You'll need to figure out what is
keeping the PG from peering.

(PG listing operations also require an active PG, so I expect "rados ls" is
just skipping that PG — though I'm surprised it doesn't throw a warning or
error.)
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Using CephFS in LXD containers

2017-12-12 Thread David Turner
We have a project using cephfs (ceph-fuse) in kubernetes containers.  For
us the throughput was limited by the mount point and not the cluster.
Having a single mount point for each container would cap at the
throughput of a single mount point.  We ended up mounting cephfs inside of
the containers.  The initial reason we used kubernetes for cephfs was
multi-tenancy benchmarking and we found that a single mount point vs 20
mount points all had the same throughput for our infrastructure (so 20
mounts points was 20x more throughput than 1 mount point).  It wasn't until
we got up to about 100 concurrent mount points that we capped our
throughput, but our total throughput just kept going up the more mount
points we had of ceph-fuse for cephfs.
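
In case it helps, a minimal ceph-fuse or kernel mount of the kind discussed here looks roughly like this (monitor address, client name and mount path are placeholders):

ceph-fuse --id cephfs-client -m 192.0.2.10:6789 /mnt/cephfs
# or with the kernel client:
mount -t ceph 192.0.2.10:6789:/ /mnt/cephfs -o name=cephfs-client,secretfile=/etc/ceph/cephfs.secret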

On Tue, Dec 12, 2017 at 12:06 PM Bogdan SOLGA 
wrote:

> Hello, everyone!
>
> We have recently started to use CephFS (Jewel, v12.2.1) from a few LXD
> containers. We have mounted it on the host servers and then exposed it in
> the LXD containers.
>
> Do you have any recommendations (dos and don'ts) on this way of using
> CephFS?
>
> Thank you, in advance!
>
> Kind regards,
> Bogdan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Bluestore Compression not inheriting pool option

2017-12-12 Thread Nick Fisk
Hi All,

Has anyone been testing the bluestore pool compression option?

I have set compression=snappy on a RBD pool. When I add a new bluestore OSD,
data is not being compressed when backfilling, confirmed by looking at the
perf dump results. If I then set again the compression type on the pool to
snappy, then immediately data starts getting compressed. It seems like when
a new OSD joins the cluster, it doesn't pick up the existing compression
setting on the pool.

Anyone seeing anything similar? I will raise a bug if anyone can confirm.
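
For anyone trying to reproduce: comparing the pool option with the counters on the freshly added OSD makes the behaviour visible (pool name and OSD id are examples):

ceph osd pool get rbd compression_algorithm
ceph daemon osd.0 perf dump | grep bluestore_compressed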

Nick

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Odd object blocking IO on PG

2017-12-12 Thread Nick Fisk
Does anyone know what this object (0.ae78c1cf) might be, it's not your
normal run of the mill RBD object and I can't seem to find it in the pool
using rados --all ls . It seems to be leaving the 0.1cf PG stuck in an
activating+remapped state and blocking IO. Pool 0 is just a pure RBD pool
with a cache tier above it. There is no current mention of unfound objects
or any other obvious issues.  

There is some backfilling going on, on another OSD which was upgraded to
bluestore, which was when the issue started. But I can't see any link in the
PG dump with upgraded OSD. My only thought so far is to wait for this
backfilling to finish and then deep-scrub this PG and see if that reveals
anything?

Thanks,
Nick

 "description": "osd_op(client.34720596.1:212637720 0.1cf 0.ae78c1cf
(undecoded)
ondisk+retry+write+ignore_cache+ignore_overlay+known_if_redirected
e105014)",
"initiated_at": "2017-12-12 17:10:50.030660",
"age": 335.948290,
"duration": 335.948383,
"type_data": {
"flag_point": "delayed",
"events": [
{
"time": "2017-12-12 17:10:50.030660",
"event": "initiated"
},
{
"time": "2017-12-12 17:10:50.030692",
"event": "queued_for_pg"
},
{
"time": "2017-12-12 17:10:50.030719",
"event": "reached_pg"
},
{
"time": "2017-12-12 17:10:50.030727",
"event": "waiting for peered"
},
{
"time": "2017-12-12 17:10:50.197353",
"event": "reached_pg"
},
{
"time": "2017-12-12 17:10:50.197355",
"event": "waiting for peered"

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Using CephFS in LXD containers

2017-12-12 Thread Bogdan SOLGA
Hello, everyone!

We have recently started to use CephFS (Jewel, v12.2.1) from a few LXD
containers. We have mounted it on the host servers and then exposed it in
the LXD containers.

Do you have any recommendations (dos and don'ts) on this way of using
CephFS?

Thank you, in advance!

Kind regards,
Bogdan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: Lock doesn't want to be given up

2017-12-12 Thread Florian Margaine
Hi,

We're hitting an odd issue on our ceph cluster:

- We have machine1 mapping an exclusive-lock RBD.
- Machine2 wants to take a snapshot of the RBD, but fails to take the lock.

Stracing the rbd snap process on machine2 shows it looping on sending
"lockget" commands, without ever moving forward.

In rbd status, we see that machine1 is a watcher on the image, which is
expected. What is not expected is that the rbd snap process can't get the
lock.

This commit deployed in 10.2.10, which we are using, sounds related:
https://github.com/ceph/ceph/commit/475dda114a7e25b43dc9066b9808a64fc0c6dc89

But there isn't the equivalent in ceph-client's code, which we would expect
too. That said, I don't have a full understanding, so I might be off-base
there.

Am I wrong in expecting the equivalent in ceph-client's code? (aka Linux
kernel) Am I completely off-base as to what is wrong there? Can I provide
any additional information to help debugging?

Regards,
Florian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] inconsistent pg issue with ceph version 10.2.3

2017-12-12 Thread Thanh Tran
Hi,

My ceph cluster has an inconsistent pg. I tried to deep-scrub and repair the
pg, but that did not fix the problem.
I found that the object that made the pg inconsistent belongs to a snapshot (snap
id 2ccac = 183468) of an image. I deleted this snapshot, then queried the
inconsistent pg and it showed empty, but my ceph cluster still shows the
inconsistent pg error. It's strange that the mismatched object belonging to the
deleted snapshot still exists on disk.

After deleting the snapshot that had the mismatched object, I re-tried the deep
scrub and repair of the pg, but it didn't help.

Command output is below. I don't know how to clear this inconsistent status.
Please help me to fix this.

*ceph --cluster QTC01 health detail*
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 4.1b42 is active+clean+inconsistent, acting [159,179]
1 scrub errors

*rados list-inconsistent-obj 4.1b42 --format=json-pretty --cluster QTC01*
{
"epoch": 494830,
"inconsistents": []
}

*ll /var/lib/ceph/osd/QTC01-159/current/4.1b42_head/DIR_2/DIR_4/DIR_B/ |
grep aa5af9238e1f29*
-rw-r--r-- 1 ceph ceph 4194304 Dec 12 15:29
rbd\udata.aa5af9238e1f29.470a__2ccac_7B3CDB42__4
-rw-r--r-- 1 ceph ceph 1048576 Dec 12 15:29
rbd\udata.aa5af9238e1f29.470a__head_7B3CDB42__4

*ll /var/lib/ceph/osd/QTC01-179/current/4.1b42_head/DIR_2/DIR_4/DIR_B/ |
grep aa5af9238e1f29*
-rw-r--r-- 1 ceph ceph 4194304 Dec 12 15:17
rbd\udata.aa5af9238e1f29.470a__2ccac_7B3CDB42__4
-rw-r--r-- 1 ceph ceph 1048576 Dec 12 15:17
rbd\udata.aa5af9238e1f29.470a__head_7B3CDB42__4

Best regards,
Thanh Tran
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph configuration backup - what is vital?

2017-12-12 Thread Wido den Hollander



On 12/12/2017 02:18 PM, David Turner wrote:
I always back up my crush map. Someone making a mistake with the crush map 
will happen, and being able to restore last night's crush map has been 
wonderful. That's all I really back up.




Yes, that's what I would suggest as well. Just have a daily CRON or so 
dumping the CRUSHMap and backing it up somewhere.
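
Something as simple as this in the cron job does it (paths are just examples):

ceph osd getcrushmap -o /var/backups/crushmap.$(date +%F)
crushtool -d /var/backups/crushmap.$(date +%F) -o /var/backups/crushmap.$(date +%F).txt
# restore later with: ceph osd setcrushmap -i /var/backups/crushmap.<date>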


Should some tool or person screw it all up, you can just revert to an 
older map.


Yes, you can fetch the older map from the Monitors as well, but I just 
like to keep it externally.


Wido



On Tue, Dec 12, 2017, 5:53 AM Wolfgang Lendl 
> wrote:


hello,

I'm looking for a recommendation about what parts/configuration/etc to
backup from a ceph cluster in case of a disaster.
I know this depends heavily on the type of disaster and I'm not talking
about backup of payload stored on osds.

currently I have my admin key stored somewhere outside the cluster -
maybe there are some best practices out there?


wolfgang

___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph configuration backup - what is vital?

2017-12-12 Thread David Turner
I always back up my crush map. Someone making a mistake with the crush map
will happen, and being able to restore last night's crush map has been
wonderful. That's all I really back up.

On Tue, Dec 12, 2017, 5:53 AM Wolfgang Lendl <
wolfgang.le...@meduniwien.ac.at> wrote:

> hello,
>
> I'm looking for a recommendation about what parts/configuration/etc to
> backup from a ceph cluster in case of a disaster.
> I know this depends heavily on the type of disaster and I'm not talking
> about backup of payload stored on osds.
>
> currently I have my admin key stored somewhere outside the cluster -
> maybe there are some best practices out there?
>
>
> wolfgang
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow objects deletion

2017-12-12 Thread David Turner
To delete objects quickly, I set up a multi-threaded python script, but
then I learned about --bypass-gc, so I've been trying to use that
instead of putting all of the objects into the GC to be deleted. Deleting
using radosgw-admin is not multi-threaded.
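
For reference, the bypass-gc route is just an extra flag on the same command, and a crude way to parallelise deletes without writing a script is to fan them out with xargs (a sketch only; check the listing output format on your radosgw-admin version, and object names with unusual characters need more care):

radosgw-admin bucket rm --bucket=bucket-3 --purge-objects --bypass-gc

radosgw-admin bucket list --bucket=bucket-3 | jq -r '.[].name' | \
  xargs -P 16 -I{} radosgw-admin object rm --bucket=bucket-3 --object={}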

On Tue, Dec 12, 2017, 5:43 AM Rafał Wądołowski 
wrote:

> Hi,
>
> Is there any known fast procedure to delete objects in large buckets? I
> have about 40 million objects. I used:
>
> radosgw-admin bucket rm --bucket=bucket-3 --purge-objects
>
> but it is very slow. I am using ceph luminous (12.2.1).
>
> Is it working in parallel?
>
> --
> BR,
>
> Rafał Wądołowski
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Production 12.2.2 CephFS Cluster still broken, new Details

2017-12-12 Thread Tobias Prousa

Thank you very much! I feel optimistic that now I got what I need to get that 
thing back working again.

I'll report back...

Best regards,
Tobi



On 12/12/2017 02:08 PM, Yan, Zheng wrote:

On Tue, Dec 12, 2017 at 8:29 PM, Tobias Prousa  wrote:

Hi Zheng,

the more you tell me, the more what I see begins to make sense to me. Thank 
you very much.

Could you please be a little more verbose about how to use rados rmomapkey? 
What to use for  and what to use for <>. Here is what my dir_frag 
looks like:

 {
 "damage_type": "dir_frag",
 "id": 1418581248,
 "ino": 1099733590290,
 "frag": "*",
 "path":
"/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing-backup"
 }


Find the inode number of the parent directory
(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/ in this
case) and print it in hex. You will get something like 1000xxx.

run 'rados -p cephfs_metadatapool listomapkeys 1000xxx.'

the output should include one entry named safebrowsing-backup_head

run 'rados -p cephfs_metadatapool rmomapkey 1000xxx.
safebrowsing-backup_head'

Before doing rmomapkey, run 'ceph daemon mds.x flush journal' and
stop the MDS. You'd better do this after the scrub.


I cannot simply remove that dir through the filesystem, as it refuses to delete
that folder.

Then you say it's easy to fix backtraces, yet here it looks like some
backtraces get fixed with the online MDS scrub while most of them fail to be
fixed and stay in damage_type "backtrace".

Once again, thank you so much for your help!

Best regards,
Tobi




On 12/12/2017 01:10 PM, Yan, Zheng wrote:

On Tue, Dec 12, 2017 at 4:22 PM, Tobias Prousa 
wrote:

Hi there,

regarding my ML post from yesterday (Upgrade from 12.2.1 to 12.2.2 broke
my
CephFs) I was able to get a little further with the suggested
"cephfs-table-tool take_inos ". This made the whole issue with
loads of "falsely free-marked inodes" go away.

I then restarted MDS, kept all clients down so no client has mounted FS.
Then I started an online MDS scrub

ceph daemon mds.a scrub_path / recursive repair

This again ran for about 3 hours, then the MDS again marked the FS damaged and
changed its own state to standby (at least that is what I interpret from what I
see). This happened exactly at the moment when the scrub hit a missing object.
See the end of the logfile (default log level):

2017-12-11 22:29:05.725484 7fc2342bc700  0 log_channel(cluster) log [WRN]
:
bad backtrace on inode

0x1000d3aede3(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore),
rewriting it
2017-12-11 22:29:05.725507 7fc2342bc700  0 log_channel(cluster) log [WRN]
:
Scrub error on inode 0x1000d3aede3

(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore)
see mds.b log and `damage ls` output for details
2017-12-11 22:29:05.725569 7fc2342bc700 -1 mds.0.scrubstack
_validate_inode_done scrub error on inode [inode 0x1000d3aede3 [2,head]

/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore
auth v382 dirtyparent s=232 n(v0 b232 1=1+0) (iversion lock) |
dirtyparent=1
scrubqueue=0 0x55ef37c83200]:

{"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(0)0x1000d3aede3:[<0x1000d3aeda7/test-unwanted-simple.sbstore
v382>,<0x10002de79e8/safebrowsing
v7142119>,<0x10002de79df/dsjf5siv.default
v4089757>,<0x10002de79de/firefox v3998050>,<0x10002de79dd/mozilla
v4933047>,<0x100018bd837/.cache v115551644>,<0x100/some_username
v444724510>,<0x1/home v228039388>]//","error_str":"failed to read off
disk;
see

retval"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-61}
2017-12-11 22:29:05.729992 7fc2342bc700  0 log_channel(cluster) log [WRN]
:
bad backtrace on inode

0x1000d3aedf1(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore),
rewriting it
2017-12-11 22:29:05.730022 7fc2342bc700  0 log_channel(cluster) log [WRN]
:
Scrub error on inode 0x1000d3aedf1

(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore)
see mds.b log and `damage ls` output for details
2017-12-11 22:29:05.730077 7fc2342bc700 -1 mds.0.scrubstack
_validate_inode_done scrub error on inode [inode 0x1000d3aedf1 [2,head]

/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore
auth v384 dirtyparent s=232 n(v0 b232 1=1+0) (iversion lock) |
dirtyparent=1
scrubqueue=0 0x55ef3aa38a00]:

{"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(0)0x100

Re: [ceph-users] Production 12.2.2 CephFS Cluster still broken, new Details

2017-12-12 Thread Yan, Zheng
On Tue, Dec 12, 2017 at 8:29 PM, Tobias Prousa  wrote:
> Hi Zheng,
>
> the more you tell me the more what I see begins to makes sens to me. Thank
> you very much.
>
> Could you please be a little more verbose about how to use rados rmomapky?
> What to use for  and what to use for <>. Here is what my dir_frag
> looks like:
>
> {
> "damage_type": "dir_frag",
> "id": 1418581248,
> "ino": 1099733590290,
> "frag": "*",
> "path":
> "/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing-backup"
> }


Find the inode number of the parent directory
(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/ in this
case) and print it in hex. You will get something like 1000xxx.

run 'rados -p cephfs_metadatapool listomapkeys 1000xxx.'

the output should include one entry named safebrowsing-backup_head

run 'rados -p cephfs_metadatapool rmomapkey 1000xxx.
safebrowsing-backup_head'

Before doing rmomapkey, run 'ceph daemon mds.x flush journal' and
stop the MDS. You'd better do this after the scrub.
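
Putting those steps together, a minimal sketch (the metadata pool name and the
".00000000" frag suffix are assumptions here; the parent inode 0x10002de79df for
dsjf5siv.default is taken from the backtraces quoted below):

    ceph daemon mds.a flush journal      # flush the MDS journal first
    systemctl stop ceph-mds@a            # stop the MDS (repeat for any additional MDS daemons)

    # dirfrag objects are named <parent ino in hex>.<frag>, usually .00000000
    rados -p cephfs_metadatapool listomapkeys 10002de79df.00000000
    # the key list should contain "safebrowsing-backup_head"; remove that key
    rados -p cephfs_metadatapool rmomapkey 10002de79df.00000000 safebrowsing-backup_head

    systemctl start ceph-mds@a           # restart the MDS afterwards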

>
> I cannot simply remove that dir through filesystem as it refuses to delete
> that folder.
>
> Then you say its easy to fix backtrace, yet here it looks like some
> backtraces get fixed with online MDS scrub while most of them fail to be
> fixed and stay in damage_type "backtrace".
>
> Once again, thank you so much for your help!
>
> Best regards,
> Tobi
>
>
>
>
> On 12/12/2017 01:10 PM, Yan, Zheng wrote:
>>
>> On Tue, Dec 12, 2017 at 4:22 PM, Tobias Prousa 
>> wrote:
>>>
>>> Hi there,
>>>
>>> regarding my ML post from yesterday (Upgrade from 12.2.1 to 12.2.2 broke
>>> my
>>> CephFs) I was able to get a little further with the suggested
>>> "cephfs-table-tool take_inos ". This made the whole issue with
>>> loads of "falsely free-marked inodes" go away.
>>>
>>> I then restarted MDS, kept all clients down so no client has mounted FS.
>>> Then I started an online MDS scrub
>>>
>>> ceph daemon mds.a scrub_path / recursive repair
>>>
>>> This again ran for about 3 hours, then MDS again marked FS damaged and
>>> changes its own state to standby (at least that is what I interpret from
>>> what I see. This happened exactly at the moment when the scrub hit a
>>> missing
>>> object. See end of logfile (default log level):
>>>
>>> 2017-12-11 22:29:05.725484 7fc2342bc700  0 log_channel(cluster) log [WRN]
>>> :
>>> bad backtrace on inode
>>>
>>> 0x1000d3aede3(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore),
>>> rewriting it
>>> 2017-12-11 22:29:05.725507 7fc2342bc700  0 log_channel(cluster) log [WRN]
>>> :
>>> Scrub error on inode 0x1000d3aede3
>>>
>>> (/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore)
>>> see mds.b log and `damage ls` output for details
>>> 2017-12-11 22:29:05.725569 7fc2342bc700 -1 mds.0.scrubstack
>>> _validate_inode_done scrub error on inode [inode 0x1000d3aede3 [2,head]
>>>
>>> /home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore
>>> auth v382 dirtyparent s=232 n(v0 b232 1=1+0) (iversion lock) |
>>> dirtyparent=1
>>> scrubqueue=0 0x55ef37c83200]:
>>>
>>> {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(0)0x1000d3aede3:[<0x1000d3aeda7/test-unwanted-simple.sbstore
>>> v382>,<0x10002de79e8/safebrowsing
>>> v7142119>,<0x10002de79df/dsjf5siv.default
>>> v4089757>,<0x10002de79de/firefox v3998050>,<0x10002de79dd/mozilla
>>> v4933047>,<0x100018bd837/.cache v115551644>,<0x100/some_username
>>> v444724510>,<0x1/home v228039388>]//","error_str":"failed to read off
>>> disk;
>>> see
>>>
>>> retval"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-61}
>>> 2017-12-11 22:29:05.729992 7fc2342bc700  0 log_channel(cluster) log [WRN]
>>> :
>>> bad backtrace on inode
>>>
>>> 0x1000d3aedf1(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore),
>>> rewriting it
>>> 2017-12-11 22:29:05.730022 7fc2342bc700  0 log_channel(cluster) log [WRN]
>>> :
>>> Scrub error on inode 0x1000d3aedf1
>>>
>>> (/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore)
>>> see mds.b log and `damage ls` output for details
>>> 2017-12-11 22:29:05.730077 7fc2342bc700 -1 mds.0.scrubstack
>>> _validate_inode_done scrub error on inode [inode 0x1000d3aedf1 [2,head]
>>>
>>> /home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore
>>> auth v384 dirtyparent s=232 n(v0 b232 1=1+0) (iversion lock) |
>>> dirtyparent=1
>>> scrubqueue=0 0x55ef3aa38a00]:
>>>
>>> {"performed_validation":true,"passed_

Re: [ceph-users] Resharding issues / How long does it take?

2017-12-12 Thread Martin Emrich

Hi!

(By the way, a second bucket now has this problem; it apparently occurs 
when automatic resharding commences while data is being written to 
the bucket.)


On 12.12.17 at 09:53, Orit Wasserman wrote:

On Mon, Dec 11, 2017 at 11:45 AM, Martin Emrich
 wrote:




This is after resharding the bucket?


Yes.


Which logs would be helpful?



rgw logs , if you can increase the debug level debug_rgw=20 and
debug_ms=1 that will be great.


As a second bucket has now gone down, I suspect I can reproduce it.
When I do, I'll collect log files.


When resharding completes it prints the old bucket instance id, you
will need to remove it.
I think this is the warning in the start of resharding I believe in
your case resharding hasn't completed.
you can get the old bucket instance id from the resharding log or the
bucket info.
Than you will need to delete it using rados command.


Indeed. But in the .data pool, there are objects from all buckets, and I 
have no idea how to identify the objects belonging to the faulty bucket.
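
The best idea I have so far (just a sketch, assuming the default pool names): RGW 
prefixes data objects with the bucket marker, so listing by that prefix should 
isolate them:

    # get the bucket marker, e.g. "default.1234567.89" (placeholder value)
    radosgw-admin bucket stats --bucket=mybucket | grep -E '"(id|marker)"'

    # list only the RADOS objects carrying that prefix in the data pool
    rados -p default.rgw.buckets.data ls | grep '^default.1234567.89_'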



My primary goal is now to completely remove the damaged bucket without a trace, 
but I'd also love to find out what went wrong in the first place.
Could I have a "multisite setup" without knowing it? I did not knowingly set up 
anything in this regard; it's just one cluster, with three identically configured radosgw 
instances behind a load balancer...


Probably not you need to setup multisite ...


Good to know ;)


As to remove the damaged bucket, if the bucket index is no consistent
you will need to manually remove it:
first unlink the bucket from the user: radosgw-admin bucket unlink

Than you will need manually remove the bucket:
1. if this is you only bucket I would go for deleting the bucket pools
and using new one
2. Try to fix the bucket using bucket check --fix
3. try this procedure
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-August/020012.html
4. Another options is to remove the deleted objects entries from the
bucket index and try to delete it


As a second bucket has failed, I am now in the process of evacuating all 
buckets to another store; then I'll feel more confident about trying this.


Thanks,

Martin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Production 12.2.2 CephFS Cluster still broken, new Details

2017-12-12 Thread Tobias Prousa

Hi Zheng,

the more you tell me, the more what I see begins to make sense to me. Thank you 
very much.

Could you please be a little more verbose about how to use rados rmomapkey? What should I use for the object name and what for the key? Here is what my 
dir_frag looks like:


    {
    "damage_type": "dir_frag",
    "id": 1418581248,
    "ino": 1099733590290,
    "frag": "*",
    "path": 
"/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing-backup"
    }

I cannot simply remove that dir through the filesystem, as it refuses to delete that 
folder.

Then you say it's easy to fix backtraces, yet here it looks like some backtraces get fixed by the online MDS scrub while most of them fail to be 
fixed and stay in damage_type "backtrace".


Once again, thank you so much for your help!

Best regards,
Tobi



On 12/12/2017 01:10 PM, Yan, Zheng wrote:

On Tue, Dec 12, 2017 at 4:22 PM, Tobias Prousa  wrote:

Hi there,

regarding my ML post from yesterday (Upgrade from 12.2.1 to 12.2.2 broke my
CephFs) I was able to get a little further with the suggested
"cephfs-table-tool take_inos ". This made the whole issue with
loads of "falsely free-marked inodes" go away.

I then restarted MDS, kept all clients down so no client has mounted FS.
Then I started an online MDS scrub

ceph daemon mds.a scrub_path / recursive repair

This again ran for about 3 hours, then MDS again marked FS damaged and
changes its own state to standby (at least that is what I interpret from
what I see. This happened exactly at the moment when the scrub hit a missing
object. See end of logfile (default log level):

2017-12-11 22:29:05.725484 7fc2342bc700  0 log_channel(cluster) log [WRN] :
bad backtrace on inode
0x1000d3aede3(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore),
rewriting it
2017-12-11 22:29:05.725507 7fc2342bc700  0 log_channel(cluster) log [WRN] :
Scrub error on inode 0x1000d3aede3
(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore)
see mds.b log and `damage ls` output for details
2017-12-11 22:29:05.725569 7fc2342bc700 -1 mds.0.scrubstack
_validate_inode_done scrub error on inode [inode 0x1000d3aede3 [2,head]
/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore
auth v382 dirtyparent s=232 n(v0 b232 1=1+0) (iversion lock) | dirtyparent=1
scrubqueue=0 0x55ef37c83200]:
{"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(0)0x1000d3aede3:[<0x1000d3aeda7/test-unwanted-simple.sbstore
v382>,<0x10002de79e8/safebrowsing v7142119>,<0x10002de79df/dsjf5siv.default
v4089757>,<0x10002de79de/firefox v3998050>,<0x10002de79dd/mozilla
v4933047>,<0x100018bd837/.cache v115551644>,<0x100/some_username
v444724510>,<0x1/home v228039388>]//","error_str":"failed to read off disk;
see
retval"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-61}
2017-12-11 22:29:05.729992 7fc2342bc700  0 log_channel(cluster) log [WRN] :
bad backtrace on inode
0x1000d3aedf1(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore),
rewriting it
2017-12-11 22:29:05.730022 7fc2342bc700  0 log_channel(cluster) log [WRN] :
Scrub error on inode 0x1000d3aedf1
(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore)
see mds.b log and `damage ls` output for details
2017-12-11 22:29:05.730077 7fc2342bc700 -1 mds.0.scrubstack
_validate_inode_done scrub error on inode [inode 0x1000d3aedf1 [2,head]
/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore
auth v384 dirtyparent s=232 n(v0 b232 1=1+0) (iversion lock) | dirtyparent=1
scrubqueue=0 0x55ef3aa38a00]:
{"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(0)0x1000d3aedf1:[<0x1000d3aeda7/testexcept-flashsubdoc-simple.sbstore
v384>,<0x10002de79e8/safebrowsing v7142119>,<0x10002de79df/dsjf5siv.default
v4089757>,<0x10002de79de/firefox v3998050>,<0x10002de79dd/mozilla
v4933047>,<0x100018bd837/.cache v115551644>,<0x100/some_username
v444724510>,<0x1/home v228039388>]//","error_str":"failed to read off disk;
see
retval"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-61}
2017-12-11 22:29:05.733389 7fc2342bc700  0 log_channel(cluster) log [WRN] :
bad backtrace on inode
0x1000d3aedb6(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-malware-simp

Re: [ceph-users] Production 12.2.2 CephFS Cluster still broken, new Details

2017-12-12 Thread Yan, Zheng
On Tue, Dec 12, 2017 at 4:22 PM, Tobias Prousa  wrote:
> Hi there,
>
> regarding my ML post from yesterday (Upgrade from 12.2.1 to 12.2.2 broke my
> CephFs) I was able to get a little further with the suggested
> "cephfs-table-tool take_inos ". This made the whole issue with
> loads of "falsely free-marked inodes" go away.
>
> I then restarted MDS, kept all clients down so no client has mounted FS.
> Then I started an online MDS scrub
>
> ceph daemon mds.a scrub_path / recursive repair
>
> This again ran for about 3 hours, then MDS again marked FS damaged and
> changes its own state to standby (at least that is what I interpret from
> what I see. This happened exactly at the moment when the scrub hit a missing
> object. See end of logfile (default log level):
>
> 2017-12-11 22:29:05.725484 7fc2342bc700  0 log_channel(cluster) log [WRN] :
> bad backtrace on inode
> 0x1000d3aede3(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore),
> rewriting it
> 2017-12-11 22:29:05.725507 7fc2342bc700  0 log_channel(cluster) log [WRN] :
> Scrub error on inode 0x1000d3aede3
> (/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore)
> see mds.b log and `damage ls` output for details
> 2017-12-11 22:29:05.725569 7fc2342bc700 -1 mds.0.scrubstack
> _validate_inode_done scrub error on inode [inode 0x1000d3aede3 [2,head]
> /home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore
> auth v382 dirtyparent s=232 n(v0 b232 1=1+0) (iversion lock) | dirtyparent=1
> scrubqueue=0 0x55ef37c83200]:
> {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(0)0x1000d3aede3:[<0x1000d3aeda7/test-unwanted-simple.sbstore
> v382>,<0x10002de79e8/safebrowsing v7142119>,<0x10002de79df/dsjf5siv.default
> v4089757>,<0x10002de79de/firefox v3998050>,<0x10002de79dd/mozilla
> v4933047>,<0x100018bd837/.cache v115551644>,<0x100/some_username
> v444724510>,<0x1/home v228039388>]//","error_str":"failed to read off disk;
> see
> retval"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-61}
> 2017-12-11 22:29:05.729992 7fc2342bc700  0 log_channel(cluster) log [WRN] :
> bad backtrace on inode
> 0x1000d3aedf1(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore),
> rewriting it
> 2017-12-11 22:29:05.730022 7fc2342bc700  0 log_channel(cluster) log [WRN] :
> Scrub error on inode 0x1000d3aedf1
> (/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore)
> see mds.b log and `damage ls` output for details
> 2017-12-11 22:29:05.730077 7fc2342bc700 -1 mds.0.scrubstack
> _validate_inode_done scrub error on inode [inode 0x1000d3aedf1 [2,head]
> /home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore
> auth v384 dirtyparent s=232 n(v0 b232 1=1+0) (iversion lock) | dirtyparent=1
> scrubqueue=0 0x55ef3aa38a00]:
> {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(0)0x1000d3aedf1:[<0x1000d3aeda7/testexcept-flashsubdoc-simple.sbstore
> v384>,<0x10002de79e8/safebrowsing v7142119>,<0x10002de79df/dsjf5siv.default
> v4089757>,<0x10002de79de/firefox v3998050>,<0x10002de79dd/mozilla
> v4933047>,<0x100018bd837/.cache v115551644>,<0x100/some_username
> v444724510>,<0x1/home v228039388>]//","error_str":"failed to read off disk;
> see
> retval"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-61}
> 2017-12-11 22:29:05.733389 7fc2342bc700  0 log_channel(cluster) log [WRN] :
> bad backtrace on inode
> 0x1000d3aedb6(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-malware-simple.cache),
> rewriting it
> 2017-12-11 22:29:05.733420 7fc2342bc700  0 log_channel(cluster) log [WRN] :
> Scrub error on inode 0x1000d3aedb6
> (/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-malware-simple.cache)
> see mds.b log and `damage ls` output for details
> 2017-12-11 22:29:05.733475 7fc2342bc700 -1 mds.0.scrubstack
> _validate_inode_done scrub error on inode [inode 0x1000d3aedb6 [2,head]
> /home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-malware-simple.cache
> auth v366 dirtyparent s=44 n(v0 b44 1=1+0) (iversion lock) | dirtyparent=1
> scrubqueue=0 0x55ef37c78a00]:
> {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-6

Re: [ceph-users] Sudden omap growth on some OSDs

2017-12-12 Thread george.vasilakakos

On 11 Dec 2017, at 18:24, Gregory Farnum <gfar...@redhat.com> wrote:

Hmm, this does all sound odd. Have you tried just restarting the primary OSD 
yet? That frequently resolves transient oddities like this.
If not, I'll go poke at the kraken source and one of the developers more 
familiar with the recovery processes we're seeing here.
-Greg


Hi Greg,

I’ve tried this, with no effect. Also, on Friday we tried removing an OSD (not the 
primary); the OSD that was chosen to replace it has had its LevelDB grow to 7 GiB 
by now. Yesterday it was 5.3 GiB.
We’re not seeing any errors logged by the OSDs with the default logging level 
either.

Do you have any comments on the fact that the primary sees the PG’s state as 
being different from what the peers think?
Now, with a new primary, I’m seeing the last peer in the set reporting 
‘active+clean’, as does the primary, while all the others are saying it’s 
‘active+clean+degraded’ (according to pg query output).

This problem is quite weird, I think. I copied a LevelDB and dumped a key list; 
the largest one (in GiB) had 66% of the number of keys that the average LevelDB has. The 
main difference from the ones that have been around for a while is that they 
keep a lot more files that were last touched on the days when the problem 
started, whereas most other LevelDBs have compacted those away and only have files about 7 
days old (as opposed to the 3-week-old ones that the big ones keep around). 
The big ones do seem to do compactions; they just don’t seem to get rid of that 
stuff.
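
For reference, such a dump can be taken roughly like this (a sketch, assuming a 
FileStore OSD with the omap LevelDB under current/omap; work on a copy taken while 
the OSD is stopped):

    systemctl stop ceph-osd@123                                  # hypothetical OSD id
    cp -a /var/lib/ceph/osd/ceph-123/current/omap /tmp/omap-copy
    systemctl start ceph-osd@123

    # list all keys in the copied store and count them
    ceph-kvstore-tool leveldb /tmp/omap-copy list > /tmp/omap-keys.txt
    wc -l /tmp/omap-keys.txt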



Thanks,

George


On Fri, Dec 8, 2017 at 7:30 AM, <george.vasilaka...@stfc.ac.uk> wrote:


From: Gregory Farnum [gfar...@redhat.com]
Sent: 07 December 2017 21:57
To: Vasilakakos, George (STFC,RAL,SC)
Cc: drakonst...@gmail.com; 
ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Sudden omap growth on some OSDs



On Thu, Dec 7, 2017 at 4:41 AM, <george.vasilaka...@stfc.ac.uk> wrote:


From: Gregory Farnum [gfar...@redhat.com]
Sent: 06 December 2017 22:50
To: David Turner
Cc: Vasilakakos, George (STFC,RAL,SC); 
ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Sudden omap growth on some OSDs

On Wed, Dec 6, 2017 at 2:35 PM, David Turner <drakonst...@gmail.com>

Re: [ceph-users] Production 12.2.2 CephFS Cluster still broken, new Details

2017-12-12 Thread Yan, Zheng
On Tue, Dec 12, 2017 at 4:22 PM, Tobias Prousa  wrote:
> Hi there,
>
> regarding my ML post from yesterday (Upgrade from 12.2.1 to 12.2.2 broke my
> CephFs) I was able to get a little further with the suggested
> "cephfs-table-tool take_inos ". This made the whole issue with
> loads of "falsely free-marked inodes" go away.
>
> I then restarted MDS, kept all clients down so no client has mounted FS.
> Then I started an online MDS scrub
>
> ceph daemon mds.a scrub_path / recursive repair
>
> This again ran for about 3 hours, then MDS again marked FS damaged and
> changes its own state to standby (at least that is what I interpret from
> what I see. This happened exactly at the moment when the scrub hit a missing
> object. See end of logfile (default log level):
>
> 2017-12-11 22:29:05.725484 7fc2342bc700  0 log_channel(cluster) log [WRN] :
> bad backtrace on inode
> 0x1000d3aede3(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore),
> rewriting it
> 2017-12-11 22:29:05.725507 7fc2342bc700  0 log_channel(cluster) log [WRN] :
> Scrub error on inode 0x1000d3aede3
> (/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore)
> see mds.b log and `damage ls` output for details
> 2017-12-11 22:29:05.725569 7fc2342bc700 -1 mds.0.scrubstack
> _validate_inode_done scrub error on inode [inode 0x1000d3aede3 [2,head]
> /home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore
> auth v382 dirtyparent s=232 n(v0 b232 1=1+0) (iversion lock) | dirtyparent=1
> scrubqueue=0 0x55ef37c83200]:
> {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(0)0x1000d3aede3:[<0x1000d3aeda7/test-unwanted-simple.sbstore
> v382>,<0x10002de79e8/safebrowsing v7142119>,<0x10002de79df/dsjf5siv.default
> v4089757>,<0x10002de79de/firefox v3998050>,<0x10002de79dd/mozilla
> v4933047>,<0x100018bd837/.cache v115551644>,<0x100/some_username
> v444724510>,<0x1/home v228039388>]//","error_str":"failed to read off disk;
> see
> retval"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-61}
> 2017-12-11 22:29:05.729992 7fc2342bc700  0 log_channel(cluster) log [WRN] :
> bad backtrace on inode
> 0x1000d3aedf1(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore),
> rewriting it
> 2017-12-11 22:29:05.730022 7fc2342bc700  0 log_channel(cluster) log [WRN] :
> Scrub error on inode 0x1000d3aedf1
> (/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore)
> see mds.b log and `damage ls` output for details
> 2017-12-11 22:29:05.730077 7fc2342bc700 -1 mds.0.scrubstack
> _validate_inode_done scrub error on inode [inode 0x1000d3aedf1 [2,head]
> /home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore
> auth v384 dirtyparent s=232 n(v0 b232 1=1+0) (iversion lock) | dirtyparent=1
> scrubqueue=0 0x55ef3aa38a00]:
> {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(0)0x1000d3aedf1:[<0x1000d3aeda7/testexcept-flashsubdoc-simple.sbstore
> v384>,<0x10002de79e8/safebrowsing v7142119>,<0x10002de79df/dsjf5siv.default
> v4089757>,<0x10002de79de/firefox v3998050>,<0x10002de79dd/mozilla
> v4933047>,<0x100018bd837/.cache v115551644>,<0x100/some_username
> v444724510>,<0x1/home v228039388>]//","error_str":"failed to read off disk;
> see
> retval"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-61}
> 2017-12-11 22:29:05.733389 7fc2342bc700  0 log_channel(cluster) log [WRN] :
> bad backtrace on inode
> 0x1000d3aedb6(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-malware-simple.cache),
> rewriting it
> 2017-12-11 22:29:05.733420 7fc2342bc700  0 log_channel(cluster) log [WRN] :
> Scrub error on inode 0x1000d3aedb6
> (/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-malware-simple.cache)
> see mds.b log and `damage ls` output for details
> 2017-12-11 22:29:05.733475 7fc2342bc700 -1 mds.0.scrubstack
> _validate_inode_done scrub error on inode [inode 0x1000d3aedb6 [2,head]
> /home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-malware-simple.cache
> auth v366 dirtyparent s=44 n(v0 b44 1=1+0) (iversion lock) | dirtyparent=1
> scrubqueue=0 0x55ef37c78a00]:
> {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-6

[ceph-users] ceph configuration backup - what is vital?

2017-12-12 Thread Wolfgang Lendl
hello,

I'm looking for a recommendation on which parts/configuration/etc. to
back up from a Ceph cluster in case of a disaster.
I know this depends heavily on the type of disaster, and I'm not talking
about backing up the payload stored on the OSDs.

currently I have my admin key stored somewhere outside the cluster -
maybe there are some best practices out there?
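
For example, the kind of thing I have in mind (just a sketch, surely incomplete; 
paths are placeholders):

    # cluster configuration and keys
    cp /etc/ceph/ceph.conf /backup/
    ceph auth export client.admin -o /backup/client.admin.keyring
    ceph auth list > /backup/auth-list.txt       # overview of all cephx entities

    # current cluster maps, handy for reference after a disaster
    ceph mon getmap -o /backup/monmap.bin
    ceph osd getmap -o /backup/osdmap.bin
    ceph osd getcrushmap -o /backup/crushmap.bin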


wolfgang

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-volume lvm activate could not find osd..0

2017-12-12 Thread Dan van der Ster
Doh!
The activate command needs the *osd* fsid, not the cluster fsid.
So this works:

   ceph-volume lvm activate 0 6608c0cf-3827-4967-94fd-5a3336f604c3

Is an "activate-all" equivalent planned?

-- Dan


On Tue, Dec 12, 2017 at 11:35 AM, Dan van der Ster  wrote:
> Hi all,
>
> Did anyone successfully prepare a new OSD with ceph-volume in 12.2.2?
>
> We are trying the simplest thing possible and not succeeding :(
>
> # ceph-volume lvm prepare --bluestore --data /dev/sdb
> # ceph-volume lvm list
>
> == osd.0 ===
>
>   [block]
> /dev/ceph-4da6fd06-b069-49af-901f-c9513baabdbd/osd-block-6608c0cf-3827-4967-94fd-5a3336f604c3
>
>   type  block
>   osd id0
>   cluster fsid  4da6fd06-b069-49af-901f-c9513baabdbd
>   cluster name  ceph
>   osd fsid  6608c0cf-3827-4967-94fd-5a3336f604c3
>   block uuid6HRKMQ-5O7f-SXMl-pwMr-xTwL-J103-uWCcWh
>   block device
> /dev/ceph-4da6fd06-b069-49af-901f-c9513baabdbd/osd-block-6608c0cf-3827-4967-94fd-5a3336f604c3
>
> # ceph-volume lvm activate 0 4da6fd06-b069-49af-901f-c9513baabdbd
> -->  RuntimeError: could not find osd.0 with fsid
> 4da6fd06-b069-49af-901f-c9513baabdbd
>
> Cheers, Dan
>
> P.S: ceph-disk --bluestore used to create separate partitions for
> block/db/wal, but ceph-volume seems to create one big data pv. Is that
> the recommended way forward?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Slow objects deletion

2017-12-12 Thread Rafał Wądołowski

Hi,

Is there any known fast procedure to delete objects in large buckets? I 
have about 40 million objects. I used:


radosgw-admin bucket rm --bucket=bucket-3 --purge-objects

but it is very slow. I am using ceph luminous (12.2.1).

Is it working in parallel?
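
Would a crude parallel approach from the outside be reasonable? Just a sketch 
(requires jq; --max-entries and object names with unusual characters would need 
double-checking):

    # list object names and delete them with several workers in parallel
    radosgw-admin bucket list --bucket=bucket-3 --max-entries=1000000 \
      | jq -r '.[].name' \
      | xargs -d '\n' -n 1 -P 8 -I{} \
          radosgw-admin object rm --bucket=bucket-3 --object={}

    # once the bucket is empty, drop the bucket itself
    radosgw-admin bucket rm --bucket=bucket-3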

--
BR,

Rafał Wądołowski

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-volume lvm activate could not find osd..0

2017-12-12 Thread Dan van der Ster
Hi all,

Did anyone successfully prepare a new OSD with ceph-volume in 12.2.2?

We are trying the simplest thing possible and not succeeding :(

# ceph-volume lvm prepare --bluestore --data /dev/sdb
# ceph-volume lvm list

== osd.0 ===

  [block]
/dev/ceph-4da6fd06-b069-49af-901f-c9513baabdbd/osd-block-6608c0cf-3827-4967-94fd-5a3336f604c3

  type  block
  osd id0
  cluster fsid  4da6fd06-b069-49af-901f-c9513baabdbd
  cluster name  ceph
  osd fsid  6608c0cf-3827-4967-94fd-5a3336f604c3
  block uuid6HRKMQ-5O7f-SXMl-pwMr-xTwL-J103-uWCcWh
  block device
/dev/ceph-4da6fd06-b069-49af-901f-c9513baabdbd/osd-block-6608c0cf-3827-4967-94fd-5a3336f604c3

# ceph-volume lvm activate 0 4da6fd06-b069-49af-901f-c9513baabdbd
-->  RuntimeError: could not find osd.0 with fsid
4da6fd06-b069-49af-901f-c9513baabdbd

Cheers, Dan

P.S: ceph-disk --bluestore used to create separate partitions for
block/db/wal, but ceph-volume seems to create one big data pv. Is that
the recommended way forward?
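
(For what it's worth, separate devices still seem to be possible when requested
explicitly; a sketch, untested here:)

    # data on one device, RocksDB DB and WAL on pre-created fast partitions
    ceph-volume lvm prepare --bluestore --data /dev/sdb \
        --block.db /dev/nvme0n1p1 --block.wal /dev/nvme0n1p2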
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Resharding issues / How long does it take?

2017-12-12 Thread Orit Wasserman
Hi,

On Mon, Dec 11, 2017 at 11:45 AM, Martin Emrich
 wrote:
> Hi!
>
> On 10.12.17, 11:54, "Orit Wasserman"  wrote:
>
> Hi Martin,
>
> On Thu, Dec 7, 2017 at 5:05 PM, Martin Emrich  
> wrote:
>
> It could be issue: http://tracker.ceph.com/issues/21619
> The workaround is running radosgw-admin bucket check --fix , it will
> reset the resharding flag.
> If you can update the tracker with your logs it will be very helpful.
>
> I already tried that two times, and after working hard for a few minutes it 
> hangs. Each time it seems to have created a new "set" of objects in the pool 
> (the bucket had ca. 11 objects, "radosgw-admin bucket limit check" 
> reports ca. 33 "num_objects" now).
>

This is after resharding the bucket?

> Which logs would be helpful?
>

The rgw logs; if you can increase the debug level to debug_rgw=20 and
debug_ms=1, that would be great.
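
A sketch (the admin socket name is an assumption, adjust it to your rgw instance):

    # on the fly, via the rgw admin socket
    ceph daemon client.rgw.gateway1 config set debug_rgw 20/20
    ceph daemon client.rgw.gateway1 config set debug_ms 1/1

    # or persistently in ceph.conf, then restart the gateway:
    # [client.rgw.gateway1]
    #     debug rgw = 20/20
    #     debug ms = 1/1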

> > I have a feeling that the bucket index is still 
> damaged/incomplete/inconsistent. What does the message
> >
> > *** NOTICE: operation will not remove old bucket index objects ***
> > *** these will need to be removed manually ***
> >
> > mean? How can I clean up manually?
> >
>
Resharding creates a new bucket index with the new number of shards.
It doesn't remove the old bucket index; you will need to do that manually.
>
> How do I do that? Does it just involve identifying the right objects in the 
> RADOS pools to delete? Or is there more to it?
>

When resharding completes it prints the old bucket instance id; you
will need to remove it.
I think that is what the warning at the start of resharding refers to; in
your case I believe resharding hasn't completed.
You can get the old bucket instance id from the resharding log or from the
bucket info.
Then you will need to delete it using the rados command, roughly as sketched below.
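
A sketch of that cleanup with placeholder names (pool names assume the defaults; 
double-check the old instance id on your cluster before removing anything):

    # find the current and old bucket instance ids
    radosgw-admin metadata get bucket:mybucket      # shows the current bucket_id
    radosgw-admin bucket stats --bucket=mybucket    # "id" / "marker" fields

    # old index shards live as .dir.<old_instance_id>[.<shard>] in the index pool
    rados -p default.rgw.buckets.index ls | grep '^\.dir\.default\.1234567\.89'
    rados -p default.rgw.buckets.index rm .dir.default.1234567.89.0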

> My primary goal is now to completely remove the damaged bucket without a 
> trace, but I'd also love to find out what went wrong in the first place.
> Could I have a "multisite setup" without knowing it? I did not knowingly set 
> up anything in this regard, It's just one cluster, with three identically 
> configured radosgw behind a load balancer...
>
Probably not; you would need to explicitly set up multisite ...

As for removing the damaged bucket: if the bucket index is not consistent
you will need to remove it manually.
First unlink the bucket from the user: radosgw-admin bucket unlink

Then you will need to manually remove the bucket (see the sketch after this list):
1. If this is your only bucket I would go for deleting the bucket pools
and using new ones.
2. Try to fix the bucket using bucket check --fix.
3. Try this procedure:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-August/020012.html
4. Another option is to remove the deleted objects' entries from the
bucket index and try to delete it.
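
A rough sketch of that sequence with placeholder bucket/user names (verify every 
id before removing anything):

    radosgw-admin bucket unlink --bucket=mybucket --uid=someuser
    radosgw-admin bucket check --fix --bucket=mybucket

    # if the index is still broken, inspect it and drop leftover entries by hand
    radosgw-admin bi list --bucket=mybucket > bi.json
    radosgw-admin object rm --bucket=mybucket --object=somekey
    radosgw-admin bucket rm --bucket=mybucket --purge-objects   # then retry removal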

Good Luck,
Orit

> Thanks,
>
> Martin
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Production 12.2.2 CephFS Cluster still broken, new Details

2017-12-12 Thread Tobias Prousa

Hi there,

regarding my ML post from yesterday (Upgrade from 12.2.1 to 12.2.2 broke 
my CephFs) I was able to get a little further with the suggested 
"cephfs-table-tool take_inos ". This made the whole issue with 
loads of "falsely free-marked inodes" go away.


I then restarted the MDS and kept all clients down, so no client had the FS mounted. 
Then I started an online MDS scrub:


ceph daemon mds.a scrub_path / recursive repair

This again ran for about 3 hours; then the MDS again marked the FS damaged and 
changed its own state to standby (at least that is what I interpret from 
what I see). This happened exactly at the moment when the scrub hit a 
missing object. See the end of the logfile (default log level):


2017-12-11 22:29:05.725484 7fc2342bc700  0 log_channel(cluster) log 
[WRN] : bad backtrace on inode 
0x1000d3aede3(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore), 
rewriting it
2017-12-11 22:29:05.725507 7fc2342bc700  0 log_channel(cluster) log 
[WRN] : Scrub error on inode 0x1000d3aede3 
(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore) 
see mds.b log and `damage ls` output for details
2017-12-11 22:29:05.725569 7fc2342bc700 -1 mds.0.scrubstack 
_validate_inode_done scrub error on inode [inode 0x1000d3aede3 [2,head] 
/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore 
auth v382 dirtyparent s=232 n(v0 b232 1=1+0) (iversion lock) | 
dirtyparent=1 scrubqueue=0 0x55ef37c83200]: 
{"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(0)0x1000d3aede3:[<0x1000d3aeda7/test-unwanted-simple.sbstore 
v382>,<0x10002de79e8/safebrowsing 
v7142119>,<0x10002de79df/dsjf5siv.default 
v4089757>,<0x10002de79de/firefox v3998050>,<0x10002de79dd/mozilla 
v4933047>,<0x100018bd837/.cache v115551644>,<0x100/some_username 
v444724510>,<0x1/home v228039388>]//","error_str":"failed to read off 
disk; see 
retval"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-61}
2017-12-11 22:29:05.729992 7fc2342bc700  0 log_channel(cluster) log 
[WRN] : bad backtrace on inode 
0x1000d3aedf1(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore), 
rewriting it
2017-12-11 22:29:05.730022 7fc2342bc700  0 log_channel(cluster) log 
[WRN] : Scrub error on inode 0x1000d3aedf1 
(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore) 
see mds.b log and `damage ls` output for details
2017-12-11 22:29:05.730077 7fc2342bc700 -1 mds.0.scrubstack 
_validate_inode_done scrub error on inode [inode 0x1000d3aedf1 [2,head] 
/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore 
auth v384 dirtyparent s=232 n(v0 b232 1=1+0) (iversion lock) | 
dirtyparent=1 scrubqueue=0 0x55ef3aa38a00]: 
{"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(0)0x1000d3aedf1:[<0x1000d3aeda7/testexcept-flashsubdoc-simple.sbstore 
v384>,<0x10002de79e8/safebrowsing 
v7142119>,<0x10002de79df/dsjf5siv.default 
v4089757>,<0x10002de79de/firefox v3998050>,<0x10002de79dd/mozilla 
v4933047>,<0x100018bd837/.cache v115551644>,<0x100/some_username 
v444724510>,<0x1/home v228039388>]//","error_str":"failed to read off 
disk; see 
retval"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-61}
2017-12-11 22:29:05.733389 7fc2342bc700  0 log_channel(cluster) log 
[WRN] : bad backtrace on inode 
0x1000d3aedb6(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-malware-simple.cache), 
rewriting it
2017-12-11 22:29:05.733420 7fc2342bc700  0 log_channel(cluster) log 
[WRN] : Scrub error on inode 0x1000d3aedb6 
(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-malware-simple.cache) 
see mds.b log and `damage ls` output for details
2017-12-11 22:29:05.733475 7fc2342bc700 -1 mds.0.scrubstack 
_validate_inode_done scrub error on inode [inode 0x1000d3aedb6 [2,head] 
/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-malware-simple.cache 
auth v366 dirtyparent s=44 n(v0 b44 1=1+0) (iversion lock) | 
dirtyparent=1 scrubqueue=0 0x55ef37c78a00]: 
{"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(0)0x1000d3aedb6:[<0x1000d3aeda7/test-malware-simple.cache 
v366>,<0x10002de79e8/safebr

Re: [ceph-users] Luminous, RGW bucket resharding

2017-12-12 Thread Orit Wasserman
On Mon, Dec 11, 2017 at 5:44 PM, Sam Wouters  wrote:
> On 11-12-17 16:23, Orit Wasserman wrote:
>> On Mon, Dec 11, 2017 at 4:58 PM, Sam Wouters  wrote:
>>> Hi Orrit,
>>>
>>>
>>> On 04-12-17 18:57, Orit Wasserman wrote:
 Hi Andreas,

 On Mon, Dec 4, 2017 at 11:26 AM, Andreas Calminder
  wrote:
> Hello,
> With release 12.2.2 dynamic resharding bucket index has been disabled
> when running a multisite environment
> (http://tracker.ceph.com/issues/21725). Does this mean that resharding
> of bucket indexes shouldn't be done at all, manually, while running
> multisite as there's a risk of corruption?
>
 You will need to stop the sync on the bucket before doing the
 resharding and start it again after the resharding completes.
 It will start a full sync on the bucket (it doesn't mean we copy the
 objects but we go over on all of them to check if the need to be
 synced).
 We will automate this as part of the reshard admin command in the next
 Luminous release.
>>> Does this also apply to Jewel? Stop sync and restart after resharding.
>>> (I don't know if there is any way to disable sync for a specific bucket.)
>>>
>> In Jewel we only support offline bucket resharding, you have to stop
>> both zones gateways before resharding.
>> Do:
>> Execute the resharding radosgw-admin command.
>> Run full sync on the bucket using: radosgw-admin bucket sync init on the 
>> bucket.
>> Start the gateways.
>>
>> This should work but I have not tried it ...
>> Regards,
>> Orit
> Is it necessary to really stop the gateways? We tend to block all
> traffic to the bucket being resharded with the use of ACLs in the
> haproxy in front, to avoid downtime for non related buckets.
>

You need to stop the sync process for the bucket, and the simplest
way is to stop the gateways.
In Luminous we added the capability to stop sync per bucket, but
you don't have that in Jewel.
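
For Jewel that boils down to something like this (a sketch following the steps 
quoted above; bucket name, shard count and service names are placeholders):

    # stop the radosgw instances in both zones first
    systemctl stop ceph-radosgw.target

    radosgw-admin bucket reshard --bucket=mybucket --num-shards=128
    radosgw-admin bucket sync init --bucket=mybucket    # schedule a full sync pass

    # then start the gateways again
    systemctl start ceph-radosgw.target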

> Would a:
>
> - restart gws with sync thread disabled
> - block traffic to bucket
> - reshard
> - unblock traffic
> - bucket sync init
> - restart gws with sync enabled
>
> work as well?
>
> r,
> Sam
>
>>> r,
>>> Sam
> Also, as dynamic bucket resharding was/is the main motivator moving to
> Luminous (for me at least) is dynamic reshardning something that is
> planned to be fixed for multisite environments later in the Luminous
> life-cycle or will it be left disabled forever?
>
 We are planning to enable it in Luminous time.

 Regards,
 Orit

> Thanks!
> /andreas
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com