Re: [ceph-users] All replicas of pg 5.b got placed on the same host - how to correct?

2017-10-11 Thread Konrad Riedel

Thanks a lot - problem fixed.


On 10.10.2017 16:58, Peter Linder wrote:

I think your failure domain within your rules is wrong.

step choose firstn 0 type osd

Should be:

step choose firstn 0 type host
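
For reference, a rough sketch of the usual edit/apply cycle (file names here are arbitrary):

ceph osd getcrushmap -o crushmap.bin         # export the compiled map
crushtool -d crushmap.bin -o crushmap.txt    # decompile to editable text
# edit crushmap.txt: change the failure domain of the ssd/hdd rules to host;
# "step chooseleaf firstn 0 type host" is the common single-step form that
# picks one OSD per distinct host
crushtool -c crushmap.txt -o crushmap.new    # recompile
ceph osd setcrushmap -i crushmap.new         # inject; expect backfill afterwards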


On 10/10/2017 5:05 PM, Konrad Riedel wrote:

Hello Ceph-users,

after switching to Luminous I was excited about the great
crush-device-class feature - we now have 5 servers with one 2TB NVMe-based
OSD each, 3 of them additionally with 4 HDDs per server. (We have
only three 400GB NVMe disks for block.wal and block.db and therefore
can't distribute the HDDs evenly across all servers.)

Output from "ceph pg dump" shows that some PGs end up on HDD OSDs on
the same
Host:

ceph pg map 5.b
osdmap e12912 pg 5.b (5.b) -> up [9,7,8] acting [9,7,8]

(on rebooting this host I had 4 stale PGs)
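
(For reference, stuck PGs can be listed directly with the stock commands; nothing here is specific to this cluster:)

ceph health detail | grep -i stale    # summary lines for stale PGs
ceph pg dump_stuck stale              # list the PGs currently stuck stale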

I've written a small Perl script that appends the hostname after each OSD number,
and it shows many PGs where Ceph placed two replicas on the same host (see the
sketch after the listing for the same lookup with stock tools):

5.1e7: 8 - daniel 9 - daniel 11 - udo
5.1eb: 10 - udo 7 - daniel 9 - daniel
5.1ec: 10 - udo 11 - udo 7 - daniel
5.1ed: 13 - felix 16 - felix 5 - udo
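
For reference, roughly the same lookup can be done with the stock tools; the JSON field names below are assumptions based on the Luminous output of "ceph pg map" and "ceph osd find" and may need adjusting:

# print each OSD in a PG's acting set together with the host it sits on
for osd in $(ceph pg map 5.1e7 -f json | jq -r '.acting[]'); do
    echo -n "osd.$osd -> "
    ceph osd find "$osd" | jq -r '.crush_location.host'
done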


Is there any way I can correct this?


Please see crushmap below. Thanks for any help!

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 device1
device 2 osd.2 class ssd
device 3 device3
device 4 device4
device 5 osd.5 class hdd
device 6 device6
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 device15
device 16 osd.16 class hdd
device 17 device17
device 18 device18
device 19 device19
device 20 device20
device 21 device21
device 22 device22
device 23 device23
device 24 osd.24 class hdd
device 25 device25
device 26 osd.26 class hdd
device 27 osd.27 class hdd
device 28 osd.28 class hdd
device 29 osd.29 class hdd
device 30 osd.30 class ssd
device 31 osd.31 class ssd
device 32 osd.32 class ssd
device 33 osd.33 class ssd

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 root

# buckets
host daniel {
 id -4    # do not change unnecessarily
 id -2 class hdd    # do not change unnecessarily
 id -9 class ssd    # do not change unnecessarily
 # weight 3.459
 alg straw2
 hash 0    # rjenkins1
 item osd.31 weight 1.819
 item osd.7 weight 0.547
 item osd.8 weight 0.547
 item osd.9 weight 0.547
}
host felix {
 id -5    # do not change unnecessarily
 id -3 class hdd    # do not change unnecessarily
 id -10 class ssd    # do not change unnecessarily
 # weight 3.653
 alg straw2
 hash 0    # rjenkins1
 item osd.33 weight 1.819
 item osd.13 weight 0.547
 item osd.14 weight 0.467
 item osd.16 weight 0.547
 item osd.0 weight 0.274
}
host udo {
 id -6    # do not change unnecessarily
 id -7 class hdd    # do not change unnecessarily
 id -11 class ssd    # do not change unnecessarily
 # weight 4.006
 alg straw2
 hash 0    # rjenkins1
 item osd.32 weight 1.819
 item osd.5 weight 0.547
 item osd.10 weight 0.547
 item osd.11 weight 0.547
 item osd.12 weight 0.547
}
host moritz {
 id -13    # do not change unnecessarily
 id -14 class hdd    # do not change unnecessarily
 id -15 class ssd    # do not change unnecessarily
 # weight 1.819
 alg straw2
 hash 0    # rjenkins1
 item osd.30 weight 1.819
}
host bruno {
 id -16    # do not change unnecessarily
 id -17 class hdd    # do not change unnecessarily
 id -18 class ssd    # do not change unnecessarily
 # weight 3.183
 alg straw2
 hash 0    # rjenkins1
 item osd.24 weight 0.273
 item osd.26 weight 0.273
 item osd.27 weight 0.273
 item osd.28 weight 0.273
 item osd.29 weight 0.273
 item osd.2 weight 1.819
}
root default {
 id -1    # do not change unnecessarily
 id -8 class hdd    # do not change unnecessarily
 id -12 class ssd    # do not change unnecessarily
 # weight 16.121
 alg straw2
 hash 0    # rjenkins1
 item daniel weight 3.459
 item felix weight 3.653
 item udo weight 4.006
 item moritz weight 1.819
 item bruno weight 3.183
}

# rules
rule ssd {
 id 0
 type replicated
 min_size 1
 max_size 10
 step take default class ssd
 step choose firstn 0 type osd
 step emit
}
rule hdd {
 id 1
 type replicated
 min_size 1
 max_size 10
 step take default class hdd
 step choose firstn 0 type osd
 step emit
}

# end crush map
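
Once the rules are changed, a dry run with crushtool can confirm the placement before the new map is injected (rule id 1 is the hdd rule above; 3 replicas assumed):

crushtool -c crushmap.txt -o crushmap.new
# print sample mappings for the hdd rule so the host spread can be spot-checked
crushtool -i crushmap.new --test --rule 1 --num-rep 3 --show-mappings | head -20
# report inputs for which CRUSH could not find enough OSDs (e.g. too few hdd hosts)
crushtool -i crushmap.new --test --rule 1 --num-rep 3 --show-bad-mappings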



--

Kind regards

Konrad Riedel


[ceph-users] One Monitor filling the logs

2017-08-08 Thread Konrad Riedel

Hi Ceph users,

my Luminous (ceph version 12.1.1) test cluster is doing fine, except that
one monitor is filling the logs:


 -rw-r--r-- 1 ceph ceph 119M Aug  8 15:27 ceph-mon.1.log

ceph-mon.1.log:

2017-08-08 15:57:49.509176 7ff4573c4700  0 log_channel(cluster) log [DBG] : Standby manager daemon felix started
2017-08-08 15:57:49.646006 7ff4573c4700  0 log_channel(cluster) log [DBG] : Standby manager daemon daniel started
2017-08-08 15:57:49.830046 7ff45d13a700  0 log_channel(cluster) log [DBG] : mgrmap e256330: udo(active)
2017-08-08 15:57:51.509410 7ff4573c4700  0 log_channel(cluster) log [DBG] : Standby manager daemon felix started
2017-08-08 15:57:51.646269 7ff4573c4700  0 log_channel(cluster) log [DBG] : Standby manager daemon daniel started
2017-08-08 15:57:52.054987 7ff45d13a700  0 log_channel(cluster) log [DBG] : mgrmap e256331: udo(active)


I've tried to reduce the debug settings ("debug_mon": "0/1", "debug_monc": "0/1"),
but I still get 3 messages per second. Does anybody know how to mute this?
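
(For reference: these messages carry the log_channel(cluster) prefix, i.e. they are cluster-log entries at DBG priority rather than ordinary mon/monc debug output, which matches the observation that debug_mon/debug_monc make no difference. The cluster-log verbosity is a separate knob; a sketch is below, assuming the Luminous option name, though it is not certain that it also covers the copies echoed into ceph-mon.1.log:)

# at runtime, via the admin socket of the noisy monitor
ceph daemon mon.1 config set mon_cluster_log_file_level info
# or persistently in ceph.conf under [mon]:
#   mon cluster log file level = info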

All log settings (defaults):

{
"name": "mon.1",
"cluster": "ceph",
"debug_none": "0/5",
"debug_lockdep": "0/1",
"debug_context": "0/1",
"debug_crush": "1/1",
"debug_mds": "1/5",
"debug_mds_balancer": "1/5",
"debug_mds_locker": "1/5",
"debug_mds_log": "1/5",
"debug_mds_log_expire": "1/5",
"debug_mds_migrator": "1/5",
"debug_buffer": "0/1",
"debug_timer": "0/1",
"debug_filer": "0/1",
"debug_striper": "0/1",
"debug_objecter": "0/1",
"debug_rados": "0/5",
"debug_rbd": "0/5",
"debug_rbd_mirror": "0/5",
"debug_rbd_replay": "0/5",
"debug_journaler": "0/5",
"debug_objectcacher": "0/5",
"debug_client": "0/5",
"debug_osd": "1/5",
"debug_optracker": "0/5",
"debug_objclass": "0/5",
"debug_filestore": "1/3",
"debug_journal": "1/3",
"debug_ms": "0/5",
"debug_mon": "0/1",
"debug_monc": "0/1",
"debug_paxos": "1/5",
"debug_tp": "0/5",
"debug_auth": "1/5",
    "debug_crypto": "1/5",
"debug_finisher": "1/1",
"debug_heartbeatmap": "1/5",
"debug_perfcounter": "1/5",
"debug_rgw": "1/5",
"debug_civetweb": "1/10",
"debug_javaclient": "1/5",
"debug_asok": "1/5",
"debug_throttle": "1/1",
"debug_refs": "0/0",
"debug_xio": "1/5",
"debug_compressor": "1/5",
"debug_bluestore": "1/5",
"debug_bluefs": "1/5",
"debug_bdev": "1/3",
"debug_kstore": "1/5",
"debug_rocksdb": "4/5",
"debug_leveldb": "4/5",
"debug_memdb": "4/5",
"debug_kinetic": "1/5",
"debug_fuse": "1/5",
"debug_mgr": "1/5",
"debug_mgrc": "1/5",
"debug_dpdk": "1/5",
"debug_eventtrace": "1/5",
"host": "felix",

Thanks & regards

Konrad Riedel
