[ceph-users] kernel:rbd:rbd0: encountered watch error: -10

2018-11-09 Thread xiang . dai
Hi ! 

I have run into a confusing case: 

When writing to cephfs and rbd at the same time, after a while the rbd process hangs 
and I see: 

kernel:rbd:rbd0: encountered watch error: -10 

I can reproduce it with the following steps: 

- run 2 dd processes writing to cephfs 
- write files on the rbd device 

I see that many CPUs are in iowait and many kernel processes are in D 
state. 

I guess that: 

- the processes in D state are mainly kswapd and the dirty-page writeback 
threads: when the IO queue of the rbd disk gets very long, any process doing IO 
on the rbd disk has to queue and wait a long time in D state, and the kernel 
automatically prints its call stack after it has been blocked for more than 120s 

- rbd hangs because the rbd client uses watch-notify to communicate, and heavy 
iowait pressure may disrupt it 

- cephfs and rbd share network bandwidth, and we use 40Gb IB for ceph, so the 
network is much faster than the disks 

The only workaround I can think of is flushing the page cache from crond, but 
that may cause performance degradation. 
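The crond workaround I have in mind is just a periodic cache drop, something 
like this in root's crontab (the hourly interval is only a guess): 

0 * * * * sync && echo 1 > /proc/sys/vm/drop_caches 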

Could someone help me? 

Why does rbd hang, and how can I fix it? 

I really want to use cephfs and rbd at the same time, but this issue is a 
show-stopper for a production environment. 

Thanks 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] can not start osd service by systemd

2018-11-09 Thread xiang . dai
Hi! 

I have a confusing question about starting/stopping a ceph cluster with systemd: 

- when the cluster is up, restarting ceph.target restarts all osd services 
- when the cluster is down, starting ceph.target or ceph-osd.target does not 
start the osd services 


I have googled this issue; the workaround seems to be to start 
ceph-osd@n.service by hand, as shown below. 
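Roughly like this (the OSD ids are just examples from my hosts): 

systemctl start ceph-osd@0.service 
systemctl enable ceph-osd@0.service           # adds the unit to ceph-osd.target via its WantedBy 
systemctl list-dependencies ceph-osd.target   # shows which ceph-osd@N units the target pulls in 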

Is it a bug? 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow ops after cephfs snapshot removal

2018-11-09 Thread Chris Taylor




> On Nov 9, 2018, at 1:38 PM, Gregory Farnum  wrote:
> 
>> On Fri, Nov 9, 2018 at 2:24 AM Kenneth Waegeman  
>> wrote:
>> Hi all,
>> 
>> On Mimic 13.2.1, we are seeing blocked ops on cephfs after removing some 
>> snapshots:
>> 
>> [root@osd001 ~]# ceph -s
>>cluster:
>>  id: 92bfcf0a-1d39-43b3-b60f-44f01b630e47
>>  health: HEALTH_WARN
>>  5 slow ops, oldest one blocked for 1162 sec, mon.mds03 has 
>> slow ops
>> 
>>services:
>>  mon: 3 daemons, quorum mds01,mds02,mds03
>>  mgr: mds02(active), standbys: mds03, mds01
>>  mds: ceph_fs-2/2/2 up  {0=mds03=up:active,1=mds01=up:active}, 1 
>> up:standby
>>  osd: 544 osds: 544 up, 544 in
>> 
>>io:
>>  client:   5.4 KiB/s wr, 0 op/s rd, 0 op/s wr
>> 
>> [root@osd001 ~]# ceph health detail
>> HEALTH_WARN 5 slow ops, oldest one blocked for 1327 sec, mon.mds03 has 
>> slow ops
>> SLOW_OPS 5 slow ops, oldest one blocked for 1327 sec, mon.mds03 has slow ops
>> 
>> [root@osd001 ~]# ceph -v
>> ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic 
>> (stable)
>> 
>> Is this a known issue?
> 
> It's not exactly a known issue, but from the output and story you've got here 
> it looks like the OSDs are deleting the snapshot data too fast and the MDS 
> isn't getting quick enough replies? Or maybe you have an overlarge CephFS 
> directory which is taking a long time to clean up somehow; you should get the 
> MDS ops and the MDS' objecter ops in flight and see what specifically is 
> taking so long.
> -Greg

We had a similar issue on ceph 10.2 with RBD images. It was fixed by slowing 
down snapshot removal by adding this to ceph.conf: 

[osd]
osd snap trim sleep = 0.6
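
If I remember right the same value can also be injected at runtime, something 
along the lines of: 

ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.6' 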



>  
>> 
>> Cheers,
>> 
>> Kenneth
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow ops after cephfs snapshot removal

2018-11-09 Thread Gregory Farnum
On Fri, Nov 9, 2018 at 2:24 AM Kenneth Waegeman 
wrote:

> Hi all,
>
> On Mimic 13.2.1, we are seeing blocked ops on cephfs after removing some
> snapshots:
>
> [root@osd001 ~]# ceph -s
>cluster:
>  id: 92bfcf0a-1d39-43b3-b60f-44f01b630e47
>  health: HEALTH_WARN
>  5 slow ops, oldest one blocked for 1162 sec, mon.mds03 has
> slow ops
>
>services:
>  mon: 3 daemons, quorum mds01,mds02,mds03
>  mgr: mds02(active), standbys: mds03, mds01
>  mds: ceph_fs-2/2/2 up  {0=mds03=up:active,1=mds01=up:active}, 1
> up:standby
>  osd: 544 osds: 544 up, 544 in
>
>io:
>  client:   5.4 KiB/s wr, 0 op/s rd, 0 op/s wr
>
> [root@osd001 ~]# ceph health detail
> HEALTH_WARN 5 slow ops, oldest one blocked for 1327 sec, mon.mds03 has
> slow ops
> SLOW_OPS 5 slow ops, oldest one blocked for 1327 sec, mon.mds03 has slow
> ops
>
> [root@osd001 ~]# ceph -v
> ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic
> (stable)
>
> Is this a known issue?
>

It's not exactly a known issue, but from the output and story you've got
here it looks like the OSDs are deleting the snapshot data too fast and the
MDS isn't getting quick enough replies? Or maybe you have an overlarge
CephFS directory which is taking a long time to clean up somehow; you
should get the MDS ops and the MDS' objecter ops in flight and see what
specifically is taking so long.
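Something like this against the MDS admin socket should show both (substitute
your MDS daemon name):

ceph daemon mds.<name> dump_ops_in_flight
ceph daemon mds.<name> objecter_requests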
-Greg


>
> Cheers,
>
> Kenneth
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to repair rstats mismatch

2018-11-09 Thread Gregory Farnum
There's a repair flag you can pass to the scrub_path command which will
cause it to fix those up.
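Roughly like this via the MDS admin socket (the daemon name and path are
placeholders for your setup):

ceph daemon mds.<name> scrub_path /path/to/dir recursive repair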

On Thu, Nov 8, 2018 at 8:32 PM Bryan Henderson 
wrote:

> How does one repair an rstats mismatch detected by 'scrub_path' (caused by
> a
> previous failure to write the journal)?
>
> And how bad is an rstats mismatch?  What are rstats used for?  I see one
> thing
> the mismatch does, apparently, is make it impossible to delete the
> directory,
> as Cephfs says it isn't empty, while also giving an empty list of its
> contents.
>
> --
> Bryan Henderson   San Jose, California
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Effects of restoring a cluster's mon from an older backup

2018-11-09 Thread Gregory Farnum
On Thu, Nov 8, 2018 at 3:41 AM Hector Martin  wrote:

> I'm experimenting with single-host Ceph use cases, where HA is not
> important but data durability is.
>
> How does a Ceph cluster react to its (sole) mon being rolled back to an
> earlier state? The idea here is that the mon storage may not be
> redundant but would be (atomically, e.g. lvm snapshot and dump) backed
> up, say, daily. If the cluster goes down and then is brought back up
> with a mon backup that is several days to hours old, while the OSDs are
> up to date, what are the potential consequences?
>
> Of course I expect maintenance operations to be affected (obviously any
> OSDs added/removed would likely get confused). But what about regular
> operation? Things like snapshots and snapshot ranges. Is this likely to
> cause data loss, or would the OSDs and clients largely not be affected
> as long as the cluster config has not changed?
>
> There's a way of rebuilding the monmap from OSD data:
>
>
> http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds
>
> Would this be preferable to just restoring the mon from a backup?


Yes, do that, don't try and back up your monitor. If you restore a monitor
from backup then the monitor — your authoritative data source — will warp
back in time on what the OSD peering intervals look like, which snapshots
have been deleted and created, etc. It would be a huge disaster and
probably every running daemon or client would have to pause IO until the
monitor generated enough map epochs to "catch up" — and then the rest of
the cluster would start applying those changes and nothing would work right.
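
The core of that documented OSD-based rebuild is roughly the following, run for
every OSD and then against the collected store (check the page above for the
exact flags on your release):

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$i --op update-mon-db --mon-store-path /tmp/mon-store
ceph-monstore-tool /tmp/mon-store rebuild -- --keyring /path/to/admin.keyring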



> What
> about the MDS map?
>

Unlike the OSDMap, the MDSMap doesn't really keep track of any persistent
data so it's much safer to rebuild or reset from scratch.
-Greg


>
> --
> Hector Martin (hec...@marcansoft.com)
> Public Key: https://mrcn.st/pub
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] troubleshooting ceph rdma performance

2018-11-09 Thread Gregory Farnum
On Wed, Nov 7, 2018 at 10:52 PM Raju Rangoju  wrote:

> Hello All,
>
>
>
> I have been collecting performance numbers on our ceph cluster, and I had
> noticed a very poor throughput on ceph async+rdma when compared with tcp. I
> was wondering what tunings/settings should I do to the cluster that would
> improve the *ceph rdma* (async+rdma) performance.
>
>
>
> Currently, from what we see: Ceph rdma throughput is less than half of the
> ceph tcp throughput (ran fio over iscsi mounted disks).
>
> Our ceph cluster has 8 nodes and configured with two networks, cluster and
> client networks.
>
>
>
> Can someone please shed some light.
>

Unfortunately the RDMA implementations are still fairly experimental and
the community doesn't have much experience with them. I think the last I
heard, the people developing that feature were planning to port it over to
a different RDMA library (though that might be wrong/out of date) — it's
not something I would consider a stable implementation. :/
-Greg


>
>
> I’d be glad to provide any further information regarding the setup.
>
>
>
> Thanks in Advance,
>
> Raju
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] read performance, separate client CRUSH maps or limit osd read access from each client

2018-11-09 Thread Martin Verges
Hello Vlad,

you can generate something like this:

rule dc1_primary_dc2_secondary {
id 1
type replicated
min_size 1
max_size 10
step take dc1
step chooseleaf firstn 1 type host
step emit
step take dc2
step chooseleaf firstn 1 type host
step emit
step take dc3
step chooseleaf firstn -2 type host
step emit
}

rule dc2_primary_dc1_secondary {
id 2
type replicated
min_size 1
max_size 10
step take dc2
step chooseleaf firstn 1 type host
step emit
step take dc1
step chooseleaf firstn 1 type host
step emit
step take dc3
step chooseleaf firstn -2 type host
step emit
}

After you added such crush rules, you can configure the pools:

~ $ ceph osd pool set  crush_ruleset 1
~ $ ceph osd pool set  crush_ruleset 2
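
If I recall correctly, on Luminous and newer releases the pool is pointed at
the rule by name instead, for example (pool names are placeholders):

~ $ ceph osd pool set pool_dc1 crush_rule dc1_primary_dc2_secondary
~ $ ceph osd pool set pool_dc2 crush_rule dc2_primary_dc1_secondary
~ $ ceph osd crush rule ls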

Now you place the workload from dc1 on the dc1 pool, and the workload
from dc2 on the dc2 pool. You could also use HDDs with SSD journals (if
your workload isn't that write intensive) and save some money in dc3,
as your clients would always read from an SSD and write to a hybrid setup.

Btw. all this could be done with a few simple clicks through our web
frontend. Even if you want to export it via CephFS / NFS / .. it is
possible to set it on a per folder level. Feel free to take a look at
https://www.youtube.com/watch?v=V33f7ipw9d4 to see how easy it could
be.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


2018-11-09 17:35 GMT+01:00 Vlad Kopylov :
> Please disregard pg status, one of test vms was down for some time it is
> healing.
> Question only how to make it read from proper datacenter
>
> If you have an example.
>
> Thanks
>
>
> On Fri, Nov 9, 2018 at 11:28 AM Vlad Kopylov  wrote:
>>
>> Martin, thank you for the tip.
>> googling ceph crush rule examples doesn't give much on rules, just static
>> placement of buckets.
>> this all seems to be for placing data, not to giving client in specific
>> datacenter proper read osd
>>
>> maybe something wrong with placement groups?
>>
>> I added datacenter dc1 dc2 dc3
>> Current replicated_rule is
>>
>> rule replicated_rule {
>> id 0
>> type replicated
>> min_size 1
>> max_size 10
>> step take default
>> step chooseleaf firstn 0 type host
>> step emit
>> }
>>
>> # buckets
>> host ceph1 {
>> id -3 # do not change unnecessarily
>> id -2 class ssd # do not change unnecessarily
>> # weight 1.000
>> alg straw2
>> hash 0 # rjenkins1
>> item osd.0 weight 1.000
>> }
>> datacenter dc1 {
>> id -9 # do not change unnecessarily
>> id -4 class ssd # do not change unnecessarily
>> # weight 1.000
>> alg straw2
>> hash 0 # rjenkins1
>> item ceph1 weight 1.000
>> }
>> host ceph2 {
>> id -5 # do not change unnecessarily
>> id -6 class ssd # do not change unnecessarily
>> # weight 1.000
>> alg straw2
>> hash 0 # rjenkins1
>> item osd.1 weight 1.000
>> }
>> datacenter dc2 {
>> id -10 # do not change unnecessarily
>> id -8 class ssd # do not change unnecessarily
>> # weight 1.000
>> alg straw2
>> hash 0 # rjenkins1
>> item ceph2 weight 1.000
>> }
>> host ceph3 {
>> id -7 # do not change unnecessarily
>> id -12 class ssd # do not change unnecessarily
>> # weight 1.000
>> alg straw2
>> hash 0 # rjenkins1
>> item osd.2 weight 1.000
>> }
>> datacenter dc3 {
>> id -11 # do not change unnecessarily
>> id -13 class ssd # do not change unnecessarily
>> # weight 1.000
>> alg straw2
>> hash 0 # rjenkins1
>> item ceph3 weight 1.000
>> }
>> root default {
>> id -1 # do not change unnecessarily
>> id -14 class ssd # do not change unnecessarily
>> # weight 3.000
>> alg straw2
>> hash 0 # rjenkins1
>> item dc1 weight 1.000
>> item dc2 weight 1.000
>> item dc3 weight 1.000
>> }
>>
>>
>> #ceph pg dump
>> dumped all
>> version 29433
>> stamp 2018-11-09 11:23:44.510872
>> last_osdmap_epoch 0
>> last_pg_scan 0
>> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTESLOG
>> DISK_LOG STATE  STATE_STAMPVERSION
>> REPORTED UP  UP_PRIMARY ACTING  ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP
>> LAST_DEEP_SCRUB DEEP_SCRUB_STAMP   SNAPTRIMQ_LEN
>> 1.5f  0  00 0   00
>> 00   active+clean 2018-11-09 04:35:32.320607  0'0
>> 544:1317 [0,2,1]  0 [0,2,1]  00'0 2018-11-09
>> 04:35:32.320561 0'0 2018-11-04 11:55:54.756115 0
>> 2.5c143  0  143 0   0 19490267
>> 461  461 active+undersized+degraded 2018-11-08 19:02:03.873218  508'461
>> 544:2100   [2,1]  2   [2,1]  2290'380 2018-

Re: [ceph-users] read performance, separate client CRUSH maps or limit osd read access from each client

2018-11-09 Thread Vlad Kopylov
Please disregard the pg status; one of the test VMs was down for some time and
it is healing.
The only question is how to make it read from the proper datacenter.

If you have an example.

Thanks


On Fri, Nov 9, 2018 at 11:28 AM Vlad Kopylov  wrote:

> Martin, thank you for the tip.
> googling ceph crush rule examples doesn't give much on rules, just static
> placement of buckets.
> this all seems to be for placing data, not to giving client in specific
> datacenter proper read osd
>
> maybe something wrong with placement groups?
>
> I added datacenter dc1 dc2 dc3
> Current replicated_rule is
>
> rule replicated_rule {
> id 0
>   type replicated
> min_size 1
> max_size 10
> step take default
> step chooseleaf firstn 0 type host
> step emit
> }
>
> # buckets
> host ceph1 {
>   id -3   # do not change unnecessarily
>   id -2 class ssd # do not change unnecessarily
>   # weight 1.000
>   alg straw2
>   hash 0  # rjenkins1
>   item osd.0 weight 1.000
> }
> datacenter dc1 {
>   id -9   # do not change unnecessarily
>   id -4 class ssd # do not change unnecessarily
>   # weight 1.000
>   alg straw2
>   hash 0  # rjenkins1
>   item ceph1 weight 1.000
> }
> host ceph2 {
>   id -5   # do not change unnecessarily
>   id -6 class ssd # do not change unnecessarily
>   # weight 1.000
>   alg straw2
>   hash 0  # rjenkins1
>   item osd.1 weight 1.000
> }
> datacenter dc2 {
>   id -10  # do not change unnecessarily
>   id -8 class ssd # do not change unnecessarily
>   # weight 1.000
>   alg straw2
>   hash 0  # rjenkins1
>   item ceph2 weight 1.000
> }
> host ceph3 {
>   id -7   # do not change unnecessarily
>   id -12 class ssd# do not change unnecessarily
>   # weight 1.000
>   alg straw2
>   hash 0  # rjenkins1
>   item osd.2 weight 1.000
> }
> datacenter dc3 {
>   id -11  # do not change unnecessarily
>   id -13 class ssd# do not change unnecessarily
>   # weight 1.000
>   alg straw2
>   hash 0  # rjenkins1
>   item ceph3 weight 1.000
> }
> root default {
>   id -1   # do not change unnecessarily
>   id -14 class ssd# do not change unnecessarily
>   # weight 3.000
>   alg straw2
>   hash 0  # rjenkins1
>   item dc1 weight 1.000
>   item dc2 weight 1.000
>   item dc3 weight 1.000
> }
>
>
> #ceph pg dump
> dumped all
> version 29433
> stamp 2018-11-09 11:23:44.510872
> last_osdmap_epoch 0
> last_pg_scan 0
> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTESLOG  
> DISK_LOG STATE  STATE_STAMPVERSION  
> REPORTED UP  UP_PRIMARY ACTING  ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP 
>LAST_DEEP_SCRUB DEEP_SCRUB_STAMP   SNAPTRIMQ_LEN
> 1.5f  0  00 0   000   
>  0   active+clean 2018-11-09 04:35:32.320607  0'0 
> 544:1317 [0,2,1]  0 [0,2,1]  00'0 2018-11-09 
> 04:35:32.320561 0'0 2018-11-04 11:55:54.756115 0
> 2.5c143  0  143 0   0 19490267  461   
>461 active+undersized+degraded 2018-11-08 19:02:03.873218  508'461 
> 544:2100   [2,1]  2   [2,1]  2290'380 2018-11-07 
> 18:58:43.043719  64'120 2018-11-05 14:21:49.256324 0
> .
> sum 15239 0 2053 2659 0 2157615019 58286 58286
> OSD_STAT USEDAVAIL  TOTAL  HB_PEERS PG_SUM PRIMARY_PG_SUM
> 23.7 GiB 28 GiB 32 GiB[0,1]200 73
> 13.7 GiB 28 GiB 32 GiB[0,2]200 58
> 03.7 GiB 28 GiB 32 GiB[1,2]173 69
> sum   11 GiB 85 GiB 96 GiB
>
> #ceph pg map 2.5c
> osdmap e545 pg 2.5c (2.5c) -> up [2,1] acting [2,1]
>
> #pg map 1.5f
> osdmap e547 pg 1.5f (1.5f) -> up [0,2,1] acting [0,2,1]
>
>
> On Fri, Nov 9, 2018 at 2:21 AM Martin Verges 
> wrote:
>
>> Hello Vlad,
>>
>> Ceph clients connect to the primary OSD of each PG. If you create a
>> crush rule for building1 and one for building2 that takes a OSD from
>> the same building as the first one, your reads to the pool will always
>> be on the same building (if the cluster is healthy) and only write
>> request get replicated to the other building.
>>
>> --
>> Martin Verges
>> Managing director
>>
>> Mobile: +49 174 9335695
>> E-Mail: martin.ver...@croit.io
>> Chat: https://t.me/MartinVerges
>>
>> croit GmbH, Freseniusstr. 31h, 81247 Munich
>> CEO: Martin Verges - VAT-ID: DE310638492
>> Com. register: Amtsgericht Munich HRB 231263
>>
>> Web: https://croit.io
>> YouTube: https://goo.gl/PGE1Bx
>>
>>
>> 2018-11-09 4:54 GMT+01:00 Vlad Kopylov :
>> > I am trying to test replicated ceph with servers in different
>> buildings, and
>> > I hav

Re: [ceph-users] read performance, separate client CRUSH maps or limit osd read access from each client

2018-11-09 Thread Vlad Kopylov
Martin, thank you for the tip.
Googling ceph crush rule examples doesn't turn up much on rules, just static
placement of buckets.
It all seems to be about placing data, not about giving a client in a specific
datacenter the proper OSD to read from.

Maybe something is wrong with the placement groups?

I added the datacenters dc1, dc2 and dc3.
The current replicated_rule is:

rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}

# buckets
host ceph1 {
id -3   # do not change unnecessarily
id -2 class ssd # do not change unnecessarily
# weight 1.000
alg straw2
hash 0  # rjenkins1
item osd.0 weight 1.000
}
datacenter dc1 {
id -9   # do not change unnecessarily
id -4 class ssd # do not change unnecessarily
# weight 1.000
alg straw2
hash 0  # rjenkins1
item ceph1 weight 1.000
}
host ceph2 {
id -5   # do not change unnecessarily
id -6 class ssd # do not change unnecessarily
# weight 1.000
alg straw2
hash 0  # rjenkins1
item osd.1 weight 1.000
}
datacenter dc2 {
id -10  # do not change unnecessarily
id -8 class ssd # do not change unnecessarily
# weight 1.000
alg straw2
hash 0  # rjenkins1
item ceph2 weight 1.000
}
host ceph3 {
id -7   # do not change unnecessarily
id -12 class ssd# do not change unnecessarily
# weight 1.000
alg straw2
hash 0  # rjenkins1
item osd.2 weight 1.000
}
datacenter dc3 {
id -11  # do not change unnecessarily
id -13 class ssd# do not change unnecessarily
# weight 1.000
alg straw2
hash 0  # rjenkins1
item ceph3 weight 1.000
}
root default {
id -1   # do not change unnecessarily
id -14 class ssd# do not change unnecessarily
# weight 3.000
alg straw2
hash 0  # rjenkins1
item dc1 weight 1.000
item dc2 weight 1.000
item dc3 weight 1.000
}


#ceph pg dump
dumped all
version 29433
stamp 2018-11-09 11:23:44.510872
last_osdmap_epoch 0
last_pg_scan 0
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES
LOG  DISK_LOG STATE  STATE_STAMP
VERSION  REPORTED UP  UP_PRIMARY ACTING  ACTING_PRIMARY LAST_SCRUB
SCRUB_STAMPLAST_DEEP_SCRUB DEEP_SCRUB_STAMP
SNAPTRIMQ_LEN
1.5f  0  00 0   00
   00   active+clean 2018-11-09 04:35:32.320607
  0'0 544:1317 [0,2,1]  0 [0,2,1]  00'0
2018-11-09 04:35:32.320561 0'0 2018-11-04 11:55:54.756115
   0
2.5c143  0  143 0   0 19490267
 461  461 active+undersized+degraded 2018-11-08 19:02:03.873218
508'461 544:2100   [2,1]  2   [2,1]  2290'380
2018-11-07 18:58:43.043719  64'120 2018-11-05 14:21:49.256324
   0
.
sum 15239 0 2053 2659 0 2157615019 58286 58286
OSD_STAT USED    AVAIL  TOTAL  HB_PEERS PG_SUM PRIMARY_PG_SUM
2        3.7 GiB 28 GiB 32 GiB [0,1]    200    73
1        3.7 GiB 28 GiB 32 GiB [0,2]    200    58
0        3.7 GiB 28 GiB 32 GiB [1,2]    173    69
sum      11 GiB  85 GiB 96 GiB

#ceph pg map 2.5c
osdmap e545 pg 2.5c (2.5c) -> up [2,1] acting [2,1]

#pg map 1.5f
osdmap e547 pg 1.5f (1.5f) -> up [0,2,1] acting [0,2,1]


On Fri, Nov 9, 2018 at 2:21 AM Martin Verges  wrote:

> Hello Vlad,
>
> Ceph clients connect to the primary OSD of each PG. If you create a
> crush rule for building1 and one for building2 that takes a OSD from
> the same building as the first one, your reads to the pool will always
> be on the same building (if the cluster is healthy) and only write
> request get replicated to the other building.
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
>
> 2018-11-09 4:54 GMT+01:00 Vlad Kopylov :
> > I am trying to test replicated ceph with servers in different buildings,
> and
> > I have a read problem.
> > Reads from one building go to OSDs in another building and vice versa,
> > making reads slower than writes! This makes reads as slow as the slowest node.
> >
> > Is there a way to
> > - disable parallel read (so it reads only from the same osd node where
> mon
> > is);
> > - or give each client read restriction per osd?
> > - or maybe strictly specify read osd on mount;
> > - or have node read delay cap (for example 

Re: [ceph-users] mount rbd read only

2018-11-09 Thread Ashley Merrick
You need to tell it the username and the keyring to use.

I'm on my mobile right now so I don't have access to a server to check, but if
you check the man page of the rbd command it is something like --id/--name.

If your keyring is named in the correct format it will find it automatically; if
not, you can specify the location using --keyring
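
Roughly something like this (untested from my phone, adjust the keyring path to
wherever you saved it):

rbd map 4copy/foo --id acapp1 --keyring /etc/ceph/ceph.client.acapp1.keyring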

On Fri, 9 Nov 2018 at 11:41 PM, ST Wong (ITSC)  wrote:

> Thanks for your help.  Tried to follow steps in CEPH doc:
>
>
>
> On admin host:
>
>
>
> # ceph auth add client.acapp1 mon 'allow r' osd 'allow rw pool=4copy'
>
> # ceph auth export client.acapp1 > keyring
>
>
>
> Copy keyring to rbd client:/etc/ceph/keyring, and got following error:
>
>
>
> # rbd map 4copy/foo
>
> rbd: sysfs write failed
>
> rbd: couldn't connect to the cluster!
>
> In some cases useful info is found in syslog - try "dmesg | tail".
>
> rbd: map failed: (22) Invalid argument
>
>
>
> Also modified the capability as described in doc but gets same error:
>
>
>
> # ceph auth caps client.acapp1 mon 'allow r' osd 'allow class-read
> object_prefix rbd_children, allow pool templates r class-read, allow pool
> 4copy rwx'
>
>
>
> Would you help?Thanks a lot.
>
>
>
> Btw, shal /etc/ceph/ceph.client.admin.keyring be removed in ceph-ansible
> client deployment task?
>
>
>
> Thanks and Best Regards,
>
> /st wong
>
>
>
> *From:* Ashley Merrick 
> *Sent:* Friday, November 9, 2018 10:51 PM
> *To:* ST Wong (ITSC) 
> *Cc:* Wido den Hollander ; ceph-users@lists.ceph.com
>
>
> *Subject:* Re: [ceph-users] mount rbd read only
>
>
>
> You could create a key ring that only has perms to mount the RBD and read
> only to the mon’s.
>
>
>
> Depends if anyone that you wouldn’t trust with ceph commands has access to
> that VM / host.
>
>
>
> On Fri, 9 Nov 2018 at 10:47 PM, ST Wong (ITSC) 
> wrote:
>
> Stupid me.  I was focus on learning CEPH commands and forget something
> basic - haven't done mkfs.  Sorry for the trouble caused.
>
> Btw, is ceph.client.admin.keyring a must on client that mount rbd device?
> Any security concern?
>
> Sorry for the newbie questions.
> Thanks for all responded.
>
> Best Rgds
> /st wong
>
> -Original Message-
> From: ceph-users  On Behalf Of Wido
> den Hollander
> Sent: Thursday, November 8, 2018 8:31 PM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] mount rbd read only
>
>
>
> On 11/8/18 1:05 PM, ST Wong (ITSC) wrote:
> > Hi,
> >
> >
> >
> > We created a testing rbd block device image as following:
> >
> >
> >
> > - cut here ---
> >
> > # rbd create 4copy/foo --size 10G
> >
> > # rbd feature disable 4copy/foo object-map fast-diff deep-flatten
> >
> > # rbd --image 4copy/foo info
> >
> > rbd image 'foo':
> >
> > size 10 GiB in 2560 objects
> >
> > order 22 (4 MiB objects)
> >
> > id: 122f36b8b4567
> >
> > block_name_prefix: rbd_data.122f36b8b4567
> >
> > format: 2
> >
> > features: layering, exclusive-lock
> >
> > op_features:
> >
> > flags:
> >
> > create_timestamp: Thu Nov  8 19:42:25 2018
> >
> >
> >
> > - cut here ---
> >
> >
> >
> > Then try to mount it on client but got error and can't be mounted:
> >
> >
> >
> > - cut here ---
> >
> > # mount  /dev/rbd0 /mnt
> >
> > mount: /dev/rbd0 is write-protected, mounting read-only
> >
> > mount: unknown filesystem type '(null)'
>
> Did you create a filesystem on it with mkfs? Are you sure there is a
> FileSystem on it?
>
> Wido
>
> >
> > - cut here ---
> >
> >
> >
> > Did we do any step incorrect?  We're using mimic.   Thanks.
> >
> >
> >
> >
> >
> >
> >
> > Besides, the rbd client is deployed through ceph-ansible as client
> > role and found that the ceph.client.admin.keyring from admin server
> > was also copied to the client machine.  Is it necessary?   Thanks a lot.
> >
> >
> >
> > Best Regards,
> >
> > /ST Wong
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mount rbd read only

2018-11-09 Thread ST Wong (ITSC)
Thanks for your help. I tried to follow the steps in the Ceph docs:

On admin host:

# ceph auth add client.acapp1 mon 'allow r' osd 'allow rw pool=4copy'
# ceph auth export client.acapp1 > keyring

Copied the keyring to /etc/ceph/keyring on the rbd client, and got the following error:

# rbd map 4copy/foo
rbd: sysfs write failed
rbd: couldn't connect to the cluster!
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (22) Invalid argument

Also modified the capabilities as described in the doc but got the same error:

# ceph auth caps client.acapp1 mon 'allow r' osd 'allow class-read 
object_prefix rbd_children, allow pool templates r class-read, allow pool 4copy 
rwx'

Would you help? Thanks a lot.

Btw, should /etc/ceph/ceph.client.admin.keyring be removed in the ceph-ansible 
client deployment task?

Thanks and Best Regards,
/st wong

From: Ashley Merrick 
Sent: Friday, November 9, 2018 10:51 PM
To: ST Wong (ITSC) 
Cc: Wido den Hollander ; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] mount rbd read only

You could create a key ring that only has perms to mount the RBD and read only 
to the mon’s.

Depends if anyone that you wouldn’t trust with ceph commands has access to that 
VM / host.

On Fri, 9 Nov 2018 at 10:47 PM, ST Wong (ITSC) 
mailto:s...@itsc.cuhk.edu.hk>> wrote:
Stupid me.  I was focus on learning CEPH commands and forget something basic - 
haven't done mkfs.  Sorry for the trouble caused.

Btw, is ceph.client.admin.keyring a must on client that mount rbd device?  Any 
security concern?

Sorry for the newbie questions.
Thanks for all responded.

Best Rgds
/st wong

-Original Message-
From: ceph-users 
mailto:ceph-users-boun...@lists.ceph.com>> 
On Behalf Of Wido den Hollander
Sent: Thursday, November 8, 2018 8:31 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] mount rbd read only



On 11/8/18 1:05 PM, ST Wong (ITSC) wrote:
> Hi,
>
>
>
> We created a testing rbd block device image as following:
>
>
>
> - cut here ---
>
> # rbd create 4copy/foo --size 10G
>
> # rbd feature disable 4copy/foo object-map fast-diff deep-flatten
>
> # rbd --image 4copy/foo info
>
> rbd image 'foo':
>
> size 10 GiB in 2560 objects
>
> order 22 (4 MiB objects)
>
> id: 122f36b8b4567
>
> block_name_prefix: rbd_data.122f36b8b4567
>
> format: 2
>
> features: layering, exclusive-lock
>
> op_features:
>
> flags:
>
> create_timestamp: Thu Nov  8 19:42:25 2018
>
>
>
> - cut here ---
>
>
>
> Then try to mount it on client but got error and can't be mounted:
>
>
>
> - cut here ---
>
> # mount  /dev/rbd0 /mnt
>
> mount: /dev/rbd0 is write-protected, mounting read-only
>
> mount: unknown filesystem type '(null)'

Did you create a filesystem on it with mkfs? Are you sure there is a FileSystem 
on it?

Wido

>
> - cut here ---
>
>
>
> Did we do any step incorrect?  We're using mimic.   Thanks.
>
>
>
>
>
>
>
> Besides, the rbd client is deployed through ceph-ansible as client
> role and found that the ceph.client.admin.keyring from admin server
> was also copied to the client machine.  Is it necessary?   Thanks a lot.
>
>
>
> Best Regards,
>
> /ST Wong
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mount rbd read only

2018-11-09 Thread Ashley Merrick
You could create a keyring that only has perms to mount the RBD and read-only
access to the mons.

Depends if anyone that you wouldn’t trust with ceph commands has access to
that VM / host.

On Fri, 9 Nov 2018 at 10:47 PM, ST Wong (ITSC)  wrote:

> Stupid me.  I was focus on learning CEPH commands and forget something
> basic - haven't done mkfs.  Sorry for the trouble caused.
>
> Btw, is ceph.client.admin.keyring a must on client that mount rbd device?
> Any security concern?
>
> Sorry for the newbie questions.
> Thanks for all responded.
>
> Best Rgds
> /st wong
>
> -Original Message-
> From: ceph-users  On Behalf Of Wido
> den Hollander
> Sent: Thursday, November 8, 2018 8:31 PM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] mount rbd read only
>
>
>
> On 11/8/18 1:05 PM, ST Wong (ITSC) wrote:
> > Hi,
> >
> >
> >
> > We created a testing rbd block device image as following:
> >
> >
> >
> > - cut here ---
> >
> > # rbd create 4copy/foo --size 10G
> >
> > # rbd feature disable 4copy/foo object-map fast-diff deep-flatten
> >
> > # rbd --image 4copy/foo info
> >
> > rbd image 'foo':
> >
> > size 10 GiB in 2560 objects
> >
> > order 22 (4 MiB objects)
> >
> > id: 122f36b8b4567
> >
> > block_name_prefix: rbd_data.122f36b8b4567
> >
> > format: 2
> >
> > features: layering, exclusive-lock
> >
> > op_features:
> >
> > flags:
> >
> > create_timestamp: Thu Nov  8 19:42:25 2018
> >
> >
> >
> > - cut here ---
> >
> >
> >
> > Then try to mount it on client but got error and can't be mounted:
> >
> >
> >
> > - cut here ---
> >
> > # mount  /dev/rbd0 /mnt
> >
> > mount: /dev/rbd0 is write-protected, mounting read-only
> >
> > mount: unknown filesystem type '(null)'
>
> Did you create a filesystem on it with mkfs? Are you sure there is a
> FileSystem on it?
>
> Wido
>
> >
> > - cut here ---
> >
> >
> >
> > Did we do any step incorrect?  We're using mimic.   Thanks.
> >
> >
> >
> >
> >
> >
> >
> > Besides, the rbd client is deployed through ceph-ansible as client
> > role and found that the ceph.client.admin.keyring from admin server
> > was also copied to the client machine.  Is it necessary?   Thanks a lot.
> >
> >
> >
> > Best Regards,
> >
> > /ST Wong
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mount rbd read only

2018-11-09 Thread ST Wong (ITSC)
Stupid me. I was focused on learning Ceph commands and forgot something basic - 
I hadn't done mkfs. Sorry for the trouble caused.
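
For the record, the missing step was just creating a filesystem before
mounting, e.g. (XFS here only as an example):

# mkfs.xfs /dev/rbd0
# mount /dev/rbd0 /mnt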

Btw, is ceph.client.admin.keyring a must on a client that mounts an rbd device? 
Any security concerns?

Sorry for the newbie questions.
Thanks to all who responded.

Best Rgds
/st wong

-Original Message-
From: ceph-users  On Behalf Of Wido den 
Hollander
Sent: Thursday, November 8, 2018 8:31 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] mount rbd read only



On 11/8/18 1:05 PM, ST Wong (ITSC) wrote:
> Hi,
> 
>  
> 
> We created a testing rbd block device image as following:
> 
>  
> 
> - cut here ---
> 
> # rbd create 4copy/foo --size 10G
> 
> # rbd feature disable 4copy/foo object-map fast-diff deep-flatten
> 
> # rbd --image 4copy/foo info
> 
> rbd image 'foo':
> 
>     size 10 GiB in 2560 objects
> 
>     order 22 (4 MiB objects)
> 
>     id: 122f36b8b4567
> 
>     block_name_prefix: rbd_data.122f36b8b4567
> 
>     format: 2
> 
>     features: layering, exclusive-lock
> 
>     op_features:
> 
>     flags:
> 
>     create_timestamp: Thu Nov  8 19:42:25 2018
> 
>  
> 
> - cut here ---
> 
>  
> 
> Then try to mount it on client but got error and can't be mounted:
> 
>  
> 
> - cut here ---
> 
> # mount  /dev/rbd0 /mnt
> 
> mount: /dev/rbd0 is write-protected, mounting read-only
> 
> mount: unknown filesystem type '(null)'

Did you create a filesystem on it with mkfs? Are you sure there is a FileSystem 
on it?

Wido

> 
> - cut here ---
> 
>  
> 
> Did we do any step incorrect?  We're using mimic.   Thanks.
> 
>  
> 
>  
> 
>  
> 
> Besides, the rbd client is deployed through ceph-ansible as client 
> role and found that the ceph.client.admin.keyring from admin server 
> was also copied to the client machine.  Is it necessary?   Thanks a lot.
> 
>  
> 
> Best Regards,
> 
> /ST Wong
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Ceph-community] Pool broke after increase pg_num

2018-11-09 Thread Gesiel Galvão Bernardes
Hi,

The pool is back up and running. These are the actions I took:

- Increased the max PGs per OSD (ceph tell mon.* injectargs
'--mon_max_pg_per_osd=400'), but the pool was still frozen. (I already had OSDs
with 251 PGs, so I am not sure this was my problem; see the ceph.conf note below.)
- Restarted all daemons, including the OSDs. On one specific host, restarting an
OSD daemon took very long, and after that I saw the pool start to rebuild.
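
To keep the higher mon_max_pg_per_osd limit across restarts, I assume it also
needs to go into ceph.conf, something like:

[global]
mon_max_pg_per_osd = 400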

I don't have a firm conclusion about what happened, but at least it's working.
I will go through the logs more carefully to understand exactly what happened.

Thank you all for your help.


Gesiel





Em sex, 9 de nov de 2018 às 03:37, Ashley Merrick 
escreveu:

> Are you sure the down OSD didn't happen to have any data required for the
> re-balance to complete? How long has the down now removed OSD been out?
> Before or after your increased PG count?
>
> If you do "ceph health detail" and then pick a stuck PG what does "ceph pg
> PG query" output?
>
> Has your ceph -s output changed at all since the last paste?
>
> On Fri, Nov 9, 2018 at 12:08 AM Gesiel Galvão Bernardes <
> gesiel.bernar...@gmail.com> wrote:
>
>> Em qui, 8 de nov de 2018 às 10:00, Joao Eduardo Luis 
>> escreveu:
>>
>>> Hello Gesiel,
>>>
>>> Welcome to Ceph!
>>>
>>> In the future, you may want to address the ceph-users list
>>> (`ceph-users@lists.ceph.com`) for this sort of issues.
>>>
>>>
>> Thank you, I will do.
>>
>> On 11/08/2018 11:18 AM, Gesiel Galvão Bernardes wrote:
>>> > Hi everyone,
>>> >
>>> > I am a beginner in Ceph. I made a increase of pg_num in a pool, and
>>> > after  the cluster rebalance I increased pgp_num (a confission: I not
>>> > had read the complete documentation about this operation :-(  ). Then
>>> > after this my cluster broken, and stoped all. The cluster not
>>> rebalance,
>>> > and my impression is that are all stopped.
>>> >
>>> > Below is my "ceph -s". Can anyone help-me?
>>>
>>> You have two osds down. Depending on how your data is mapped, your pgs
>>> may be waiting for those to come back up before they finish being
>>> cleaned up.
>>>
>>>
>>  After removing the down OSDs, it tried to rebalance, but it is "frozen" again,
>> in this status:
>>
>>   cluster:
>> id: ab5dcb0c-480d-419c-bcb8-013cbcce5c4d
>> health: HEALTH_WARN
>> 12840/988707 objects misplaced (1.299%)
>> Reduced data availability: 358 pgs inactive, 325 pgs peering
>>
>>   services:
>> mon: 3 daemons, quorum cmonitor,thanos,cmonitor2
>> mgr: thanos(active), standbys: cmonitor
>> osd: 17 osds: 17 up, 17 in; 221 remapped pgs
>>
>>   data:
>> pools:   1 pools, 1024 pgs
>> objects: 329.6 k objects, 1.3 TiB
>> usage:   3.8 TiB used, 7.4 TiB / 11 TiB avail
>> pgs: 1.660% pgs unknown
>>  33.301% pgs not active
>>  12840/988707 objects misplaced (1.299%)
>>  666 active+clean
>>  188 remapped+peering
>>  137 peering
>>  17  unknown
>>  16  activating+remapped
>>
>> Any other idea?
>>
>>
>> Gesiel
>>
>>>
>>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] slow ops after cephfs snapshot removal

2018-11-09 Thread Kenneth Waegeman

Hi all,

On Mimic 13.2.1, we are seeing blocked ops on cephfs after removing some 
snapshots:


[root@osd001 ~]# ceph -s
  cluster:
    id: 92bfcf0a-1d39-43b3-b60f-44f01b630e47
    health: HEALTH_WARN
    5 slow ops, oldest one blocked for 1162 sec, mon.mds03 has 
slow ops


  services:
    mon: 3 daemons, quorum mds01,mds02,mds03
    mgr: mds02(active), standbys: mds03, mds01
    mds: ceph_fs-2/2/2 up  {0=mds03=up:active,1=mds01=up:active}, 1 
up:standby

    osd: 544 osds: 544 up, 544 in

  io:
    client:   5.4 KiB/s wr, 0 op/s rd, 0 op/s wr

[root@osd001 ~]# ceph health detail
HEALTH_WARN 5 slow ops, oldest one blocked for 1327 sec, mon.mds03 has 
slow ops

SLOW_OPS 5 slow ops, oldest one blocked for 1327 sec, mon.mds03 has slow ops

[root@osd001 ~]# ceph -v
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic 
(stable)


Is this a known issue?

Cheers,

Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com