[ceph-users] gitbuilder.ceph.com service timeout?

2019-12-27 Thread huang jun
Hi all,
Are apt-mirror.sepia.ceph.com and gitbuilder.ceph.com down?
I can't ping them.


Re: [ceph-users] Cluster in ERR status when rebalancing

2019-12-09 Thread huang jun
What is your backfill_full_ratio value? A PG is flagged backfill_toofull when its backfill target OSD exceeds that ratio, which is checked separately from the full/nearfull warnings, so it can fire even when no OSD looks full.
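A quick way to check the relevant ratios and per-OSD utilization (a sketch; run it on any node with an admin keyring):

  # full / backfillfull / nearfull ratios stored in the osdmap
  ceph osd dump | grep ratio
  # per-OSD utilization, to see whether any backfill target is above the backfillfull ratio
  ceph osd df tree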

On Mon, Dec 9, 2019 at 6:38 PM Simone Lazzaris  wrote:
>
> Hi all;
>
> Long story short, I have a cluster of 26 OSDs in 3 nodes (8+9+9). One of the 
> disks is showing some read errors, so I've added an OSD to the faulty node 
> (OSD.26) and set the (re)weight of the faulty OSD (OSD.12) to zero.
>
>
>
> The cluster is now rebalancing, which is fine, but I now have 2 PGs in 
> "backfill_toofull" state, so the cluster health is "ERR":
>
>
>
> cluster:
>   id: 9ec27b0f-acfd-40a3-b35d-db301ac5ce8c
>   health: HEALTH_ERR
>           Degraded data redundancy (low space): 2 pgs backfill_toofull
>
> services:
>   mon: 3 daemons, quorum s1,s2,s3 (age 7d)
>   mgr: s1(active, since 7d), standbys: s2, s3
>   osd: 27 osds: 27 up (since 2h), 26 in (since 2h); 262 remapped pgs
>   rgw: 3 daemons active (s1, s2, s3)
>
> data:
>   pools:   10 pools, 1200 pgs
>   objects: 11.72M objects, 37 TiB
>   usage:   57 TiB used, 42 TiB / 98 TiB avail
>   pgs:     2618510/35167194 objects misplaced (7.446%)
>            938 active+clean
>            216 active+remapped+backfill_wait
>            44  active+remapped+backfilling
>            2   active+remapped+backfill_wait+backfill_toofull
>
> io:
>   recovery: 163 MiB/s, 50 objects/s
>
> progress:
>   Rebalancing after osd.12 marked out
>     [=.]
>
> As you can see, there is plenty of space and none of my OSDs is in a full or 
> near-full state:
>
>
>
> +----+------+-------+-------+--------+---------+--------+---------+-----------+
> | id | host | used  | avail | wr ops | wr data | rd ops | rd data | state     |
> +----+------+-------+-------+--------+---------+--------+---------+-----------+
> | 0  | s1   | 2415G | 1310G | 0      | 0       | 0      | 0       | exists,up |
> | 1  | s2   | 2009G | 1716G | 0      | 0       | 0      | 0       | exists,up |
> | 2  | s3   | 2183G | 1542G | 0      | 0       | 0      | 0       | exists,up |
> | 3  | s1   | 2680G | 1045G | 0      | 0       | 0      | 0       | exists,up |
> | 4  | s2   | 2063G | 1662G | 0      | 0       | 0      | 0       | exists,up |
> | 5  | s3   | 2269G | 1456G | 0      | 0       | 0      | 0       | exists,up |
> | 6  | s1   | 2523G | 1202G | 0      | 0       | 0      | 0       | exists,up |
> | 7  | s2   | 1973G | 1752G | 0      | 0       | 0      | 0       | exists,up |
> | 8  | s3   | 2007G | 1718G | 0      | 0       | 1      | 0       | exists,up |
> | 9  | s1   | 2485G | 1240G | 0      | 0       | 0      | 0       | exists,up |
> | 10 | s2   | 2385G | 1340G | 0      | 0       | 0      | 0       | exists,up |
> | 11 | s3   | 2079G | 1646G | 0      | 0       | 0      | 0       | exists,up |
> | 12 | s1   | 2272G | 1453G | 0      | 0       | 0      | 0       | exists,up |
> | 13 | s2   | 2381G | 1344G | 0      | 0       | 0      | 0       | exists,up |
> | 14 | s3   | 1923G | 1802G | 0      | 0       | 0      | 0       | exists,up |
> | 15 | s1   | 2617G | 1108G | 0      | 0       | 0      | 0       | exists,up |
> | 16 | s2   | 2099G | 1626G | 0      | 0       | 0      | 0       | exists,up |
> | 17 | s3   | 2336G | 1389G | 0      | 0       | 0      | 0       | exists,up |
> | 18 | s1   | 2435G | 1290G | 0      | 0       | 0      | 0       | exists,up |
> | 19 | s2   | 2198G | 1527G | 0      | 0       | 0      | 0       | exists,up |
> | 20 | s3   | 2159G | 1566G | 0      | 0       | 0      | 0       | exists,up |
> | 21 | s1   | 2128G | 1597G | 0      | 0       | 0      | 0       | exists,up |
> | 22 | s3   | 2064G | 1661G | 0      | 0       | 0      | 0       | exists,up |
> | 23 | s2   | 1943G | 1782G | 0      | 0       | 0      | 0       | exists,up |
> | 24 | s3   | 2168G | 1557G | 0      | 0       | 0      | 0       | exists,up |
> | 25 | s2   | 2113G | 1612G | 0      | 0       | 0      | 0       | exists,up |
> | 26 | s1   | 68.9G | 3657G | 0      | 0       | 0      | 0       | exists,up |
> +----+------+-------+-------+--------+---------+--------+---------+-----------+
>
>
> Why is this happening? I thought that maybe the 2 PG marked as toofull 
> involved either the OSD.12 (which is emptying) or the 26 (the new one) but it 
> seems that this is not the case:
>
>
>
> root@s1:~# ceph pg dump|egrep 'toofull|PG_STAT'
>
> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES 
> OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP 
> UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB 
> DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>
> 6.212 0 0 0 0 0 38145321727 0 0 3023 3023 
> active+remapped+backfill_wait+backfill_toofull 2019-12-09 11:11:39.093042 
> 13598'212053 13713:1179718 [6,19,24] 6 [13,0,24] 13 13549'211985 2019-12-08 
> 19:46:10.461113 11644'211779 2019-12-06 07:37:42.864325 0
>
> 6.bc 11057 0 0 22114 0 37733931136 0 0 3032 3032 
> active+remapped+backfill_wait+backfill_toofull 2019-12-09 10:42:25.534277 
> 13549'212110 13713:1229839 [15,25,17] 15 [19,18,17] 19 13549'211983 
> 2019-12-08 11:02:45.846031 11644'211854 2019-12-06 06:22:43.565313 0
>
>
>
> Any hints? I'm not worried, because I think the cluster will heal 
> itself, but this behaviour seems neither clear nor logical to me.
>
>
>
> --
>
> Simone Lazzaris
> Staff R
>
> Qcom S.p.A.
> Via Roggia Vignola, 9 | 24047 Treviglio (BG)
> T +39 0363 47905 | D +39 0363 1970352
> simone.lazza...@qcom.it | www.qcom.it
>
> Qcom Official Pages LinkedIn | Facebook
>
>
>
>
>

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-09 Thread huang jun
The same problem:
2019-11-10 05:26:33.215 7fbfafeef700  7 mon.cn1@0(leader).osd e1819
preprocess_boot from osd.0 v2:10.50.11.41:6814/2022032 clashes with
existing osd: different fsid (ours:
ccfdbd54-fcd2-467f-ab7b-c152b7e422fb ; theirs: a1ea2ea3-984d
-4c91-86cf-29f452f5a952)
The OSD's uuid (fsid) is probably wrong.
What is the output of 'ceph osd metadata 0' and of 'cat
/var/lib/ceph/osd/ceph-0/fsid'?
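For reference, a way to compare the uuid the monitors expect with what is on disk (a sketch; osd.0 / ceph-0 are just the ids from this thread):

  # uuid recorded for osd.0 in the cluster map (last field of the osd line)
  ceph osd dump | grep '^osd.0 '
  # uuid stored in the OSD's data directory on the host
  cat /var/lib/ceph/osd/ceph-0/fsid

If the two differ, the boot message will keep being ignored; the usual way out is to remove the stale osd id and re-create it so the osdmap picks up the new uuid.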

On Sun, Nov 10, 2019 at 2:47 PM nokia ceph  wrote:
>
> Hi,
>
> yes still the cluster unrecovered. Not able to even up the osd.0 yet.
>
> osd logs: https://pastebin.com/4WrpgrH5
>
> Mon logs: https://drive.google.com/open?id=1_HqK2d52Cgaps203WnZ0mCfvxdcjcBoE
>
> # ceph daemon /var/run/ceph/ceph-mon.cn1.asok config show|grep debug_mon
> "debug_mon": "20/20",
> "debug_monc": "0/0",
>
>
> # date; systemctl restart ceph-osd@0.service;date
> Sun Nov 10 05:25:54 UTC 2019
> Sun Nov 10 05:25:55 UTC 2019
>
>
> cn1.chn8be1c1.cdn ~# systemctl status ceph-osd@0.service
> ● ceph-osd@0.service - Ceph object storage daemon osd.0
>Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; 
> enabled-runtime; vendor preset: disabled)
>   Drop-In: /etc/systemd/system/ceph-osd@.service.d
>└─90-ExecStart_NUMA.conf
>Active: active (running) since Sun 2019-11-10 05:25:55 UTC; 8s ago
>   Process: 2022026 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster 
> ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
>  Main PID: 2022032 (ceph-osd)
>CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
>└─2022032 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser 
> ceph --setgroup ceph
>
> Nov 10 05:25:55 cn1.chn8be1c1.cdn systemd[1]: Starting Ceph object storage 
> daemon osd.0...
> Nov 10 05:25:55 cn1.chn8be1c1.cdn systemd[1]: Started Ceph object storage 
> daemon osd.0.
> Nov 10 05:26:03 cn1.chn8be1c1.cdn numactl[2022032]: 2019-11-10 05:26:03.131 
> 7fbef7bb5d80 -1 osd.0 1795 log_to_monitors {default=true}
> Nov 10 05:26:03 cn1.chn8be1c1.cdn numactl[2022032]: 2019-11-10 05:26:03.372 
> 7fbeea1c0700 -1 osd.0 1795 set_numa_affinity unable to identify public 
> interface 'dss-client' numa node: (2) No such file or directory
> Hint: Some lines were ellipsized, use -l to show in full.
>
>
> # ceph tell mon.cn1 injectargs '--debug-mon 1/5'
> injectargs:
>
> cn1.chn8be1c1.cdn ~# ceph daemon /var/run/ceph/ceph-mon.cn1.asok config 
> show|grep debug_mon
> "debug_mon": "1/5",
> "debug_monc": "0/0",
>
>
>
>
> On Sun, Nov 10, 2019 at 11:05 AM huang jun  wrote:
>>
>> good, please send me the mon and osd.0 log.
>> the cluster still un-recovered?
>>
>> On Sun, Nov 10, 2019 at 1:24 PM nokia ceph  wrote:
>> >
>> > Hi Huang,
>> >
>> > Yes the node 10.50.10.45 is the fifth node which is replaced. Yes I have 
>> > set the debug_mon to 20 and still it is running with that value only. If 
>> > you want I will send you the logs of the mon once again by restarting the 
>> > osd.0
>> >
>> > On Sun, Nov 10, 2019 at 10:17 AM huang jun  wrote:
>> >>
>> >> The mon log shows that the all mismatch fsid osds are from node 
>> >> 10.50.11.45,
>> >> maybe that the fith node?
>> >> BTW i don't found the osd.0 boot message in ceph-mon.log
>> >> do you set debug_mon=20 first and then restart osd.0 process, and make
>> >> sure the osd.0 is restarted.
>> >>
>> >>
>> >> On Sun, Nov 10, 2019 at 12:31 PM nokia ceph  wrote:
>> >>
>> >> >
>> >> > Hi,
>> >> >
>> >> > Please find the ceph osd tree output in the pastebin 
>> >> > https://pastebin.com/Gn93rE6w
>> >> >
>> >> > On Fri, Nov 8, 2019 at 7:58 PM huang jun  wrote:
>> >> >>
>> >> >> can you post your 'ceph osd tree' in pastebin?
>> >> >> do you mean the osds report fsid mismatch is from old removed nodes?
>> >> >>
>> >> >> On Fri, Nov 8, 2019 at 10:21 PM nokia ceph  wrote:
>> >> >> >
>> >> >> > Hi,
>> >> >> >
>> >> >> > The fifth node in the cluster was affected by hardware failure and 
>> >> >> > hence the node was replaced in the ceph cluster. But we were not 
>> >> >> > able to replace it properly and hence we uninstalled the ceph in all 
>> >> >> > the nodes, deleted the pools and also zapped the osd's and recreated 
>> >> >> > them as new ceph cluster. But not sure where from the 

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-09 Thread huang jun
Good, please send me the mon and osd.0 logs.
Is the cluster still unrecovered?

On Sun, Nov 10, 2019 at 1:24 PM nokia ceph  wrote:
>
> Hi Huang,
>
> Yes the node 10.50.10.45 is the fifth node which is replaced. Yes I have set 
> the debug_mon to 20 and still it is running with that value only. If you want 
> I will send you the logs of the mon once again by restarting the osd.0
>
> On Sun, Nov 10, 2019 at 10:17 AM huang jun  wrote:
>>
>> The mon log shows that the all mismatch fsid osds are from node 10.50.11.45,
>> maybe that the fith node?
>> BTW i don't found the osd.0 boot message in ceph-mon.log
>> do you set debug_mon=20 first and then restart osd.0 process, and make
>> sure the osd.0 is restarted.
>>
>>
>> On Sun, Nov 10, 2019 at 12:31 PM nokia ceph  wrote:
>>
>> >
>> > Hi,
>> >
>> > Please find the ceph osd tree output in the pastebin 
>> > https://pastebin.com/Gn93rE6w
>> >
>> > On Fri, Nov 8, 2019 at 7:58 PM huang jun  wrote:
>> >>
>> >> can you post your 'ceph osd tree' in pastebin?
>> >> do you mean the osds report fsid mismatch is from old removed nodes?
>> >>
>> >> On Fri, Nov 8, 2019 at 10:21 PM nokia ceph  wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > The fifth node in the cluster was affected by hardware failure and 
>> >> > hence the node was replaced in the ceph cluster. But we were not able 
>> >> > to replace it properly and hence we uninstalled the ceph in all the 
>> >> > nodes, deleted the pools and also zapped the osd's and recreated them 
>> >> > as new ceph cluster. But not sure where from the reference for the old 
>> >> > fifth nodes(failed nodes) osd's fsid's are coming from still. Is this 
>> >> > creating the problem. Because I am seeing that the OSD's in the fifth 
>> >> > node are showing up in the ceph status whereas the other nodes osd's 
>> >> > are showing down.
>> >> >
>> >> > On Fri, Nov 8, 2019 at 7:25 PM huang jun  wrote:
>> >> >>
>> >> >> I saw many lines like that
>> >> >>
>> >> >> mon.cn1@0(leader).osd e1805 preprocess_boot from osd.112
>> >> >> v2:10.50.11.45:6822/158344 clashes with existing osd: different fsid
>> >> >> (ours: 85908622-31bd-4728-9be3-f1f6ca44ed98 ; theirs:
>> >> >> 127fdc44-c17e-42ee-bcd4-d577c0ef4479)
>> >> >> the osd boot will be ignored if the fsid mismatch
>> >> >> what do you do before this happen?
>> >> >>
>> >> >> On Fri, Nov 8, 2019 at 8:29 PM nokia ceph  wrote:
>> >> >> >
>> >> >> > Hi,
>> >> >> >
>> >> >> > Please find the osd.0 which is restarted after the debug_mon is 
>> >> >> > increased to 20.
>> >> >> >
>> >> >> > cn1.chn8be1c1.cdn ~# date;systemctl restart ceph-osd@0.service
>> >> >> > Fri Nov  8 12:25:05 UTC 2019
>> >> >> >
>> >> >> > cn1.chn8be1c1.cdn ~# systemctl status ceph-osd@0.service -l
>> >> >> > ● ceph-osd@0.service - Ceph object storage daemon osd.0
>> >> >> >Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; 
>> >> >> > enabled-runtime; vendor preset: disabled)
>> >> >> >   Drop-In: /etc/systemd/system/ceph-osd@.service.d
>> >> >> >└─90-ExecStart_NUMA.conf
>> >> >> >Active: active (running) since Fri 2019-11-08 12:25:06 UTC; 29s 
>> >> >> > ago
>> >> >> >   Process: 298505 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh 
>> >> >> > --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
>> >> >> >  Main PID: 298512 (ceph-osd)
>> >> >> >CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
>> >> >> >└─298512 /usr/bin/ceph-osd -f --cluster ceph --id 0 
>> >> >> > --setuser ceph --setgroup ceph
>> >> >> >
>> >> >> > Nov 08 12:25:06 cn1.chn8be1c1.cdn systemd[1]: Starting Ceph object 
>> >> >> > storage daemon osd.0...
>> >> >> > Nov 08 12:25:06 cn1.chn8be1c1.cdn systemd[1]: Started Ceph object 
>> >> >> > storage daemon osd.0.
>> >> >> > Nov 08 12:25:11 cn1.chn8be1c1.cdn numactl[298512]: 2019-11-08 

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-09 Thread huang jun
The mon log shows that all of the mismatched-fsid OSDs are from node 10.50.11.45;
maybe that is the fifth node?
BTW, I didn't find the osd.0 boot message in ceph-mon.log.
Did you set debug_mon=20 first and then restart the osd.0 process? Please make
sure osd.0 was actually restarted.
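For reference, a way to raise the mon debug level at runtime and confirm it took effect (a sketch; cn1 is the leader mon from this thread):

  ceph tell mon.cn1 injectargs '--debug-mon 20/20'
  ceph daemon /var/run/ceph/ceph-mon.cn1.asok config show | grep debug_mon
  # ...restart osd.0, wait for it to attempt booting, then collect ceph-mon.log...
  ceph tell mon.cn1 injectargs '--debug-mon 1/5'   # turn the verbosity back down afterwards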


On Sun, Nov 10, 2019 at 12:31 PM nokia ceph  wrote:

>
> Hi,
>
> Please find the ceph osd tree output in the pastebin 
> https://pastebin.com/Gn93rE6w
>
> On Fri, Nov 8, 2019 at 7:58 PM huang jun  wrote:
>>
>> can you post your 'ceph osd tree' in pastebin?
>> do you mean the osds report fsid mismatch is from old removed nodes?
>>
>> On Fri, Nov 8, 2019 at 10:21 PM nokia ceph  wrote:
>> >
>> > Hi,
>> >
>> > The fifth node in the cluster was affected by hardware failure and hence 
>> > the node was replaced in the ceph cluster. But we were not able to replace 
>> > it properly and hence we uninstalled the ceph in all the nodes, deleted 
>> > the pools and also zapped the osd's and recreated them as new ceph 
>> > cluster. But not sure where from the reference for the old fifth 
>> > nodes(failed nodes) osd's fsid's are coming from still. Is this creating 
>> > the problem. Because I am seeing that the OSD's in the fifth node are 
>> > showing up in the ceph status whereas the other nodes osd's are showing 
>> > down.
>> >
>> > On Fri, Nov 8, 2019 at 7:25 PM huang jun  wrote:
>> >>
>> >> I saw many lines like that
>> >>
>> >> mon.cn1@0(leader).osd e1805 preprocess_boot from osd.112
>> >> v2:10.50.11.45:6822/158344 clashes with existing osd: different fsid
>> >> (ours: 85908622-31bd-4728-9be3-f1f6ca44ed98 ; theirs:
>> >> 127fdc44-c17e-42ee-bcd4-d577c0ef4479)
>> >> the osd boot will be ignored if the fsid mismatch
>> >> what do you do before this happen?
>> >>
>> >> On Fri, Nov 8, 2019 at 8:29 PM nokia ceph  wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > Please find the osd.0 which is restarted after the debug_mon is 
>> >> > increased to 20.
>> >> >
>> >> > cn1.chn8be1c1.cdn ~# date;systemctl restart ceph-osd@0.service
>> >> > Fri Nov  8 12:25:05 UTC 2019
>> >> >
>> >> > cn1.chn8be1c1.cdn ~# systemctl status ceph-osd@0.service -l
>> >> > ● ceph-osd@0.service - Ceph object storage daemon osd.0
>> >> >Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; 
>> >> > enabled-runtime; vendor preset: disabled)
>> >> >   Drop-In: /etc/systemd/system/ceph-osd@.service.d
>> >> >└─90-ExecStart_NUMA.conf
>> >> >Active: active (running) since Fri 2019-11-08 12:25:06 UTC; 29s ago
>> >> >   Process: 298505 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh 
>> >> > --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
>> >> >  Main PID: 298512 (ceph-osd)
>> >> >CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
>> >> >└─298512 /usr/bin/ceph-osd -f --cluster ceph --id 0 
>> >> > --setuser ceph --setgroup ceph
>> >> >
>> >> > Nov 08 12:25:06 cn1.chn8be1c1.cdn systemd[1]: Starting Ceph object 
>> >> > storage daemon osd.0...
>> >> > Nov 08 12:25:06 cn1.chn8be1c1.cdn systemd[1]: Started Ceph object 
>> >> > storage daemon osd.0.
>> >> > Nov 08 12:25:11 cn1.chn8be1c1.cdn numactl[298512]: 2019-11-08 
>> >> > 12:25:11.538 7f8515323d80 -1 osd.0 1795 log_to_monitors {default=true}
>> >> > Nov 08 12:25:11 cn1.chn8be1c1.cdn numactl[298512]: 2019-11-08 
>> >> > 12:25:11.689 7f850792e700 -1 osd.0 1795 set_numa_affinity unable to 
>> >> > identify public interface 'dss-client' numa node: (2) No such file or 
>> >> > directory
>> >> >
>> >> > On Fri, Nov 8, 2019 at 4:48 PM huang jun  wrote:
>> >> >>
>> >> >> the osd.0 is still in down state after restart? if so, maybe the
>> >> >> problem is in mon,
>> >> >> can you set the leader mon's debug_mon=20 and restart one of the down
>> >> >> state osd.
>> >> >> and then attach the mon log file.
>> >> >>
>> >> >> On Fri, Nov 8, 2019 at 6:38 PM nokia ceph  wrote:
>> >> >> >
>> >> >> > Hi,
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > Below is th

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-08 Thread huang jun
Can you post your 'ceph osd tree' output in a pastebin?
Do you mean the OSDs reporting the fsid mismatch are from the old, removed node?

On Fri, Nov 8, 2019 at 10:21 PM nokia ceph  wrote:
>
> Hi,
>
> The fifth node in the cluster was affected by hardware failure and hence the 
> node was replaced in the ceph cluster. But we were not able to replace it 
> properly and hence we uninstalled the ceph in all the nodes, deleted the 
> pools and also zapped the osd's and recreated them as new ceph cluster. But 
> not sure where from the reference for the old fifth nodes(failed nodes) osd's 
> fsid's are coming from still. Is this creating the problem. Because I am 
> seeing that the OSD's in the fifth node are showing up in the ceph status 
> whereas the other nodes osd's are showing down.
>
> On Fri, Nov 8, 2019 at 7:25 PM huang jun  wrote:
>>
>> I saw many lines like that
>>
>> mon.cn1@0(leader).osd e1805 preprocess_boot from osd.112
>> v2:10.50.11.45:6822/158344 clashes with existing osd: different fsid
>> (ours: 85908622-31bd-4728-9be3-f1f6ca44ed98 ; theirs:
>> 127fdc44-c17e-42ee-bcd4-d577c0ef4479)
>> the osd boot will be ignored if the fsid mismatch
>> what do you do before this happen?
>>
>> On Fri, Nov 8, 2019 at 8:29 PM nokia ceph  wrote:
>> >
>> > Hi,
>> >
>> > Please find the osd.0 which is restarted after the debug_mon is increased 
>> > to 20.
>> >
>> > cn1.chn8be1c1.cdn ~# date;systemctl restart ceph-osd@0.service
>> > Fri Nov  8 12:25:05 UTC 2019
>> >
>> > cn1.chn8be1c1.cdn ~# systemctl status ceph-osd@0.service -l
>> > ● ceph-osd@0.service - Ceph object storage daemon osd.0
>> >Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; 
>> > enabled-runtime; vendor preset: disabled)
>> >   Drop-In: /etc/systemd/system/ceph-osd@.service.d
>> >└─90-ExecStart_NUMA.conf
>> >Active: active (running) since Fri 2019-11-08 12:25:06 UTC; 29s ago
>> >   Process: 298505 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh 
>> > --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
>> >  Main PID: 298512 (ceph-osd)
>> >CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
>> >└─298512 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser 
>> > ceph --setgroup ceph
>> >
>> > Nov 08 12:25:06 cn1.chn8be1c1.cdn systemd[1]: Starting Ceph object storage 
>> > daemon osd.0...
>> > Nov 08 12:25:06 cn1.chn8be1c1.cdn systemd[1]: Started Ceph object storage 
>> > daemon osd.0.
>> > Nov 08 12:25:11 cn1.chn8be1c1.cdn numactl[298512]: 2019-11-08 12:25:11.538 
>> > 7f8515323d80 -1 osd.0 1795 log_to_monitors {default=true}
>> > Nov 08 12:25:11 cn1.chn8be1c1.cdn numactl[298512]: 2019-11-08 12:25:11.689 
>> > 7f850792e700 -1 osd.0 1795 set_numa_affinity unable to identify public 
>> > interface 'dss-client' numa node: (2) No such file or directory
>> >
>> > On Fri, Nov 8, 2019 at 4:48 PM huang jun  wrote:
>> >>
>> >> the osd.0 is still in down state after restart? if so, maybe the
>> >> problem is in mon,
>> >> can you set the leader mon's debug_mon=20 and restart one of the down
>> >> state osd.
>> >> and then attach the mon log file.
>> >>
>> >> On Fri, Nov 8, 2019 at 6:38 PM nokia ceph  wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> >
>> >> >
>> >> > Below is the status of the OSD after restart.
>> >> >
>> >> >
>> >> >
>> >> > # systemctl status ceph-osd@0.service
>> >> >
>> >> > ● ceph-osd@0.service - Ceph object storage daemon osd.0
>> >> >
>> >> >Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; 
>> >> > enabled-runtime; vendor preset: disabled)
>> >> >
>> >> >   Drop-In: /etc/systemd/system/ceph-osd@.service.d
>> >> >
>> >> >└─90-ExecStart_NUMA.conf
>> >> >
>> >> >Active: active (running) since Fri 2019-11-08 10:32:51 UTC; 1min 1s 
>> >> > ago
>> >> >
>> >> >   Process: 219213 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh 
>> >> > --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)  Main PID: 
>> >> > 219218 (ceph-osd)
>> >> >
>> >> >CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
>> >> >
>> >> >└─219218 /usr/bin/ceph-osd 

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-08 Thread huang jun
I saw many lines like this:

mon.cn1@0(leader).osd e1805 preprocess_boot from osd.112
v2:10.50.11.45:6822/158344 clashes with existing osd: different fsid
(ours: 85908622-31bd-4728-9be3-f1f6ca44ed98 ; theirs:
127fdc44-c17e-42ee-bcd4-d577c0ef4479)
The OSD boot is ignored when the fsid mismatches.
What did you do before this happened?

On Fri, Nov 8, 2019 at 8:29 PM nokia ceph  wrote:
>
> Hi,
>
> Please find the osd.0 which is restarted after the debug_mon is increased to 
> 20.
>
> cn1.chn8be1c1.cdn ~# date;systemctl restart ceph-osd@0.service
> Fri Nov  8 12:25:05 UTC 2019
>
> cn1.chn8be1c1.cdn ~# systemctl status ceph-osd@0.service -l
> ● ceph-osd@0.service - Ceph object storage daemon osd.0
>Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; 
> enabled-runtime; vendor preset: disabled)
>   Drop-In: /etc/systemd/system/ceph-osd@.service.d
>└─90-ExecStart_NUMA.conf
>Active: active (running) since Fri 2019-11-08 12:25:06 UTC; 29s ago
>   Process: 298505 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster 
> ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
>  Main PID: 298512 (ceph-osd)
>CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
>└─298512 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph 
> --setgroup ceph
>
> Nov 08 12:25:06 cn1.chn8be1c1.cdn systemd[1]: Starting Ceph object storage 
> daemon osd.0...
> Nov 08 12:25:06 cn1.chn8be1c1.cdn systemd[1]: Started Ceph object storage 
> daemon osd.0.
> Nov 08 12:25:11 cn1.chn8be1c1.cdn numactl[298512]: 2019-11-08 12:25:11.538 
> 7f8515323d80 -1 osd.0 1795 log_to_monitors {default=true}
> Nov 08 12:25:11 cn1.chn8be1c1.cdn numactl[298512]: 2019-11-08 12:25:11.689 
> 7f850792e700 -1 osd.0 1795 set_numa_affinity unable to identify public 
> interface 'dss-client' numa node: (2) No such file or directory
>
> On Fri, Nov 8, 2019 at 4:48 PM huang jun  wrote:
>>
>> the osd.0 is still in down state after restart? if so, maybe the
>> problem is in mon,
>> can you set the leader mon's debug_mon=20 and restart one of the down
>> state osd.
>> and then attach the mon log file.
>>
>> On Fri, Nov 8, 2019 at 6:38 PM nokia ceph  wrote:
>> >
>> > Hi,
>> >
>> >
>> >
>> > Below is the status of the OSD after restart.
>> >
>> >
>> >
>> > # systemctl status ceph-osd@0.service
>> >
>> > ● ceph-osd@0.service - Ceph object storage daemon osd.0
>> >
>> >Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; 
>> > enabled-runtime; vendor preset: disabled)
>> >
>> >   Drop-In: /etc/systemd/system/ceph-osd@.service.d
>> >
>> >└─90-ExecStart_NUMA.conf
>> >
>> >Active: active (running) since Fri 2019-11-08 10:32:51 UTC; 1min 1s ago
>> >
>> >   Process: 219213 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh 
>> > --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)  Main PID: 
>> > 219218 (ceph-osd)
>> >
>> >CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
>> >
>> >└─219218 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser 
>> > ceph --setgroup ceph
>> >
>> >
>> >
>> > Nov 08 10:32:51 cn1.chn8be1c1.cdn systemd[1]: Starting Ceph object storage 
>> > daemon osd.0...
>> >
>> > Nov 08 10:32:51 cn1.chn8be1c1.cdn systemd[1]: Started Ceph object storage 
>> > daemon osd.0.
>> >
>> > Nov 08 10:33:03 cn1.chn8be1c1.cdn numactl[219218]: 2019-11-08 10:33:03.785 
>> > 7f9adeed4d80 -1 osd.0 1795 log_to_monitors {default=true} Nov 08 10:33:05 
>> > cn1.chn8be1c1.cdn numactl[219218]: 2019-11-08 10:33:05.474 7f9ad14df700 -1 
>> > osd.0 1795 set_numa_affinity unable to identify public interface 
>> > 'dss-client' numa n...r directory
>> >
>> > Hint: Some lines were ellipsized, use -l to show in full.
>> >
>> >
>> >
>> >
>> >
>> > And I have attached the logs in the file in this mail while this restart 
>> > was initiated.
>> >
>> >
>> >
>> >
>> > On Fri, Nov 8, 2019 at 3:59 PM huang jun  wrote:
>> >>
>> >> try to restart some of the down osds in 'ceph osd tree', and to see
>> >> what happened?
>> >>
>> >> On Fri, Nov 8, 2019 at 6:24 PM nokia ceph  wrote:
>> >> >
>> >> > Adding my official mail id
>> >> >
>> >> > -- Forwarded message -
>> >> > From: nokia ceph 
>> >> > Date: Fri, Nov 8, 2

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-08 Thread huang jun
Is osd.0 still in the down state after the restart? If so, the problem
may be on the mon side.
Can you set the leader mon's debug_mon=20 and restart one of the down
OSDs,
and then attach the mon log file?

On Fri, Nov 8, 2019 at 6:38 PM nokia ceph  wrote:
>
> Hi,
>
>
>
> Below is the status of the OSD after restart.
>
>
>
> # systemctl status ceph-osd@0.service
>
> ● ceph-osd@0.service - Ceph object storage daemon osd.0
>
>Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; 
> enabled-runtime; vendor preset: disabled)
>
>   Drop-In: /etc/systemd/system/ceph-osd@.service.d
>
>└─90-ExecStart_NUMA.conf
>
>Active: active (running) since Fri 2019-11-08 10:32:51 UTC; 1min 1s ago
>
>   Process: 219213 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster 
> ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)  Main PID: 219218 
> (ceph-osd)
>
>CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
>
>└─219218 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph 
> --setgroup ceph
>
>
>
> Nov 08 10:32:51 cn1.chn8be1c1.cdn systemd[1]: Starting Ceph object storage 
> daemon osd.0...
>
> Nov 08 10:32:51 cn1.chn8be1c1.cdn systemd[1]: Started Ceph object storage 
> daemon osd.0.
>
> Nov 08 10:33:03 cn1.chn8be1c1.cdn numactl[219218]: 2019-11-08 10:33:03.785 
> 7f9adeed4d80 -1 osd.0 1795 log_to_monitors {default=true} Nov 08 10:33:05 
> cn1.chn8be1c1.cdn numactl[219218]: 2019-11-08 10:33:05.474 7f9ad14df700 -1 
> osd.0 1795 set_numa_affinity unable to identify public interface 'dss-client' 
> numa n...r directory
>
> Hint: Some lines were ellipsized, use -l to show in full.
>
>
>
>
>
> And I have attached the logs in the file in this mail while this restart was 
> initiated.
>
>
>
>
> On Fri, Nov 8, 2019 at 3:59 PM huang jun  wrote:
>>
>> try to restart some of the down osds in 'ceph osd tree', and to see
>> what happened?
>>
>> On Fri, Nov 8, 2019 at 6:24 PM nokia ceph  wrote:
>> >
>> > Adding my official mail id
>> >
>> > -- Forwarded message -
>> > From: nokia ceph 
>> > Date: Fri, Nov 8, 2019 at 3:57 PM
>> > Subject: OSD's not coming up in Nautilus
>> > To: Ceph Users 
>> >
>> >
>> > Hi Team,
>> >
>> > There is one 5 node ceph cluster which we have upgraded from Luminous to 
>> > Nautilus and everything was going well until yesterday when we noticed 
>> > that the ceph osd's are marked down and not recognized by the monitors as 
>> > running eventhough the osd processes are running.
>> >
>> > We noticed that the admin.keyring and the mon.keyring are missing in the 
>> > nodes which we have recreated it with the below commands.
>> >
>> > ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring 
>> > --gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap 
>> > mds allow
>> >
>> > ceph-authtool --create_keyring /etc/ceph/ceph.mon.keyring --gen-key -n 
>> > mon. --cap mon 'allow *'
>> >
>> > In logs we find the below lines.
>> >
>> > 2019-11-08 09:01:50.525 7ff61722b700  0 log_channel(audit) log [DBG] : 
>> > from='client.? 10.50.11.44:0/2398064782' entity='client.admin' 
>> > cmd=[{"prefix": "df", "format": "json"}]: dispatch
>> > 2019-11-08 09:02:37.686 7ff61722b700  0 log_channel(cluster) log [INF] : 
>> > mon.cn1 calling monitor election
>> > 2019-11-08 09:02:37.686 7ff61722b700  1 mon.cn1@0(electing).elector(31157) 
>> > init, last seen epoch 31157, mid-election, bumping
>> > 2019-11-08 09:02:37.688 7ff61722b700 -1 mon.cn1@0(electing) e3 failed to 
>> > get devid for : udev_device_new_from_subsystem_sysname failed on ''
>> > 2019-11-08 09:02:37.770 7ff61722b700  0 log_channel(cluster) log [INF] : 
>> > mon.cn1 is new leader, mons cn1,cn2,cn3,cn4,cn5 in quorum (ranks 0,1,2,3,4)
>> > 2019-11-08 09:02:37.857 7ff613a24700  0 log_channel(cluster) log [DBG] : 
>> > monmap e3: 5 mons at 
>> > {cn1=[v2:10.50.11.41:3300/0,v1:10.50.11.41:6789/0],cn2=[v2:10.50.11.42:3300/0,v1:10.50.11.42:6789/0],cn3=[v2:10.50.11.43:3300/0,v1:10.50.11.43:6789/0],cn4=[v2:10.50.11.44:3300/0,v1:10.50.11.44:6789/0],cn5=[v2:10.50.11.45:3300/0,v1:10.50.11.45:6789/0]}
>> >
>> >
>> >
>> > # ceph mon dump
>> > dumped monmap epoch 3
>> > epoch 3
>> > fsid 9dbf207a-561c-48ba-892d-3e79b86be12f
>> > last_changed 2019-09-03 07:53:39.031174
>> > created 2019-08-23 18:30:55.970279
>> 

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-08 Thread huang jun
Try restarting some of the OSDs shown as down in 'ceph osd tree' and see
what happens.

On Fri, Nov 8, 2019 at 6:24 PM nokia ceph  wrote:
>
> Adding my official mail id
>
> -- Forwarded message -
> From: nokia ceph 
> Date: Fri, Nov 8, 2019 at 3:57 PM
> Subject: OSD's not coming up in Nautilus
> To: Ceph Users 
>
>
> Hi Team,
>
> There is one 5 node ceph cluster which we have upgraded from Luminous to 
> Nautilus and everything was going well until yesterday when we noticed that 
> the ceph osd's are marked down and not recognized by the monitors as running 
> eventhough the osd processes are running.
>
> We noticed that the admin.keyring and the mon.keyring are missing in the 
> nodes which we have recreated it with the below commands.
>
> ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key 
> -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds allow
>
> ceph-authtool --create_keyring /etc/ceph/ceph.mon.keyring --gen-key -n mon. 
> --cap mon 'allow *'
>
> In logs we find the below lines.
>
> 2019-11-08 09:01:50.525 7ff61722b700  0 log_channel(audit) log [DBG] : 
> from='client.? 10.50.11.44:0/2398064782' entity='client.admin' 
> cmd=[{"prefix": "df", "format": "json"}]: dispatch
> 2019-11-08 09:02:37.686 7ff61722b700  0 log_channel(cluster) log [INF] : 
> mon.cn1 calling monitor election
> 2019-11-08 09:02:37.686 7ff61722b700  1 mon.cn1@0(electing).elector(31157) 
> init, last seen epoch 31157, mid-election, bumping
> 2019-11-08 09:02:37.688 7ff61722b700 -1 mon.cn1@0(electing) e3 failed to get 
> devid for : udev_device_new_from_subsystem_sysname failed on ''
> 2019-11-08 09:02:37.770 7ff61722b700  0 log_channel(cluster) log [INF] : 
> mon.cn1 is new leader, mons cn1,cn2,cn3,cn4,cn5 in quorum (ranks 0,1,2,3,4)
> 2019-11-08 09:02:37.857 7ff613a24700  0 log_channel(cluster) log [DBG] : 
> monmap e3: 5 mons at 
> {cn1=[v2:10.50.11.41:3300/0,v1:10.50.11.41:6789/0],cn2=[v2:10.50.11.42:3300/0,v1:10.50.11.42:6789/0],cn3=[v2:10.50.11.43:3300/0,v1:10.50.11.43:6789/0],cn4=[v2:10.50.11.44:3300/0,v1:10.50.11.44:6789/0],cn5=[v2:10.50.11.45:3300/0,v1:10.50.11.45:6789/0]}
>
>
>
> # ceph mon dump
> dumped monmap epoch 3
> epoch 3
> fsid 9dbf207a-561c-48ba-892d-3e79b86be12f
> last_changed 2019-09-03 07:53:39.031174
> created 2019-08-23 18:30:55.970279
> min_mon_release 14 (nautilus)
> 0: [v2:10.50.11.41:3300/0,v1:10.50.11.41:6789/0] mon.cn1
> 1: [v2:10.50.11.42:3300/0,v1:10.50.11.42:6789/0] mon.cn2
> 2: [v2:10.50.11.43:3300/0,v1:10.50.11.43:6789/0] mon.cn3
> 3: [v2:10.50.11.44:3300/0,v1:10.50.11.44:6789/0] mon.cn4
> 4: [v2:10.50.11.45:3300/0,v1:10.50.11.45:6789/0] mon.cn5
>
>
> # ceph -s
>   cluster:
> id: 9dbf207a-561c-48ba-892d-3e79b86be12f
> health: HEALTH_WARN
> 85 osds down
> 3 hosts (72 osds) down
> 1 nearfull osd(s)
> 1 pool(s) nearfull
> Reduced data availability: 2048 pgs inactive
> too few PGs per OSD (17 < min 30)
> 1/5 mons down, quorum cn2,cn3,cn4,cn5
>
>   services:
> mon: 5 daemons, quorum cn2,cn3,cn4,cn5 (age 57s), out of quorum: cn1
> mgr: cn1(active, since 73m), standbys: cn2, cn3, cn4, cn5
> osd: 120 osds: 35 up, 120 in; 909 remapped pgs
>
>   data:
> pools:   1 pools, 2048 pgs
> objects: 0 objects, 0 B
> usage:   176 TiB used, 260 TiB / 437 TiB avail
> pgs: 100.000% pgs unknown
>  2048 unknown
>
>
> The osd logs show the below logs.
>
> 2019-11-08 09:05:33.332 7fd1a36eed80  0 _get_class not permitted to load kvs
> 2019-11-08 09:05:33.332 7fd1a36eed80  0 _get_class not permitted to load lua
> 2019-11-08 09:05:33.337 7fd1a36eed80  0 _get_class not permitted to load sdk
> 2019-11-08 09:05:33.337 7fd1a36eed80  0 osd.0 1795 crush map has features 
> 43262930805112, adjusting msgr requires for clients
> 2019-11-08 09:05:33.337 7fd1a36eed80  0 osd.0 1795 crush map has features 
> 43262930805112 was 8705, adjusting msgr requires for mons
> 2019-11-08 09:05:33.337 7fd1a36eed80  0 osd.0 1795 crush map has features 
> 1009090060360105984, adjusting msgr requires for osds
>
> Please let us know what might be the issue. There seems to be no network 
> issues in any of the servers public and private interfaces.
>


Re: [ceph-users] How RBD tcp connection works

2019-08-19 Thread huang jun
How long did you keep monitoring after the reads/writes finished?
There is a config option named 'ms_connection_idle_timeout' whose
default value is 900 seconds; idle messenger connections (and their FDs)
are only torn down after that timeout.
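You can confirm the value on a running daemon through the admin socket (a sketch; substitute your daemon or client socket name):

  ceph daemon osd.0 config get ms_connection_idle_timeout
  # for a librbd client with an admin socket configured, something like:
  # ceph daemon /var/run/ceph/ceph-client.<id>.<pid>.asok config get ms_connection_idle_timeout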

On Mon, Aug 19, 2019 at 4:10 PM fengyd  wrote:
>
> Hi,
>
> I have a question about tcp connection.
> In the test environment, openstack uses ceph RBD as backend storage.
> I created a VM and attached a volume/image to it.
> I monitored how many FDs were used by the QEMU process.
> I used the dd command to fill the whole volume/image.
> I found that the FD count increased and then stabilized at a fixed value after 
> some time.
>
> I think that when reading/writing to the volume/image, TCP connections need to be 
> established, which uses FDs, so the FD count may increase.
> But after the reads/writes finish, why doesn't the FD count decrease?
>
> Thanks in advance.
> BR.
> Yafeng


Re: [ceph-users] Ceph pool EC with overwrite enabled

2019-07-04 Thread huang jun
RBD image metadata lives in omap, which EC pools do not support (hence the
"error adding image to directory" failure), so the image itself must be created
in a replicated pool and only its data objects pointed at the EC pool. Try:
rbd create backup2/teste --size 5T --data-pool ec_pool
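A minimal end-to-end sketch, assuming a replicated metadata pool named 'rbd_meta' and an EC data pool named 'ec_pool' (adjust names, pg counts and EC profile to your cluster):

  ceph osd pool create ec_pool 64 64 erasure
  ceph osd pool set ec_pool allow_ec_overwrites true    # requires BlueStore OSDs
  ceph osd pool create rbd_meta 64 64 replicated
  rbd create rbd_meta/teste --size 5T --data-pool ec_pool
  rbd info rbd_meta/teste                               # should report data_pool: ec_pool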

On Fri, Jul 5, 2019 at 1:49 AM Fabio Abreu  wrote:
>
> Hi Everybody,
>
> I have a question about using RBD on an EC pool. I tried to use this 
> in my CentOS lab, but I just receive errors when I try to create an RBD 
> image inside this pool.
>
> Is this feature supported in a Luminous environment?
>
> http://docs.ceph.com/docs/mimic/rados/operations/erasure-code/#erasure-coding-with-overwrites
>
> ceph osd pool set ec_pool allow_ec_overwrites true
>
>
> This error bellow happened when I try to create the RBD image :
>
>
> [root@mon1 ceph-key]# rbd create backup2/teste --size 5T --data-pool backup2
>
> ...
>
> warning: line 9: 'osd_pool_default_crush_rule' in section 'global' redefined
>
> 2019-07-03 17:27:33.721593 7f12c3fff700 -1 librbd::image::CreateRequest: 
> 0x560f2f0db0a0 handle_add_image_to_directory: error adding image to 
> directory: (95) Operation not supported
>
> rbd: create error: (95) Operation not supported
>
>
> Regards,
> Fabio Abreu Reis
> http://fajlinux.com.br
> Tel : +55 21 98244-0161
> Skype : fabioabreureis


Re: [ceph-users] How to see the ldout log?

2019-06-17 Thread huang jun
you should add this to your ceph.conf
[client]
log file = /var/log/ceph/$name.$pid.log
debug client = 20
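A quick way to check that the client-side log is actually being written (a sketch; it assumes a ceph-fuse mount and that /var/log/ceph exists and is writable by the user running the client):

  sudo install -d -m 777 /var/log/ceph       # or chown it to the user running the client
  ceph-fuse /mnt/cephfs                      # remount so the client re-reads ceph.conf
  ls /var/log/ceph/client.*.log              # ldout() output from Client.cc lands here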

On Tue, Jun 18, 2019 at 11:18 AM ?? ??  wrote:
>
> I am a student new to CephFS. I want to see the ldout log output in 
> /src/client/Client.cc (for example, ldout(cct, 20) << " no cap on " << 
> dn->inode->vino() << dendl;). Can anyone show me how? The /var/log/ceph dir on 
> the client is empty.
>


Re: [ceph-users] strange osd beacon

2019-06-15 Thread huang jun
OSDs send beacons every 300s; the beacon lets the mon know that the
OSD is alive.
In some cases the OSD simply has no peers (e.g. no pools created yet), so an empty pg list in the beacon can be normal.
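You can check the interval and whether the OSD actually maps any PGs (a sketch, using the osd id from the request quoted below):

  ceph daemon osd.1092 config get osd_beacon_report_interval   # default 300
  ceph pg ls-by-osd 1092                                       # empty output means no PGs map to it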

On Fri, Jun 14, 2019 at 12:53 PM Rafał Wądołowski  wrote:
>
> Hi,
>
> Is it normal for an osd beacon to contain no pgs, like below? This
> drive contains data, but I cannot get it to run.
>
> Ceph v.12.2.4
>
>
>  {
> "description": "osd_beacon(pgs [] lec 857158 v869771)",
> "initiated_at": "2019-06-14 06:39:37.972795",
> "age": 189.310037,
> "duration": 189.453167,
> "type_data": {
> "events": [
> {
> "time": "2019-06-14 06:39:37.972795",
> "event": "initiated"
> },
> {
> "time": "2019-06-14 06:39:37.972954",
> "event": "mon:_ms_dispatch"
> },
> {
> "time": "2019-06-14 06:39:37.972956",
> "event": "mon:dispatch_op"
> },
> {
> "time": "2019-06-14 06:39:37.972956",
> "event": "psvc:dispatch"
> },
> {
> "time": "2019-06-14 06:39:37.972976",
> "event": "osdmap:preprocess_query"
> },
> {
> "time": "2019-06-14 06:39:37.972978",
> "event": "osdmap:preprocess_beacon"
> },
> {
> "time": "2019-06-14 06:39:37.972982",
> "event": "forward_request_leader"
> },
> {
> "time": "2019-06-14 06:39:37.973064",
> "event": "forwarded"
> }
> ],
> "info": {
> "seq": 22378,
> "src_is_mon": false,
> "source": "osd.1092 10.11.2.33:6842/159188",
> "forwarded_to_leader": true
> }
> }
> }
>
>
> Best Regards,
>
> Rafał Wądołowski
>


Re: [ceph-users] problem with degraded PG

2019-06-15 Thread huang jun
Can you show us the output of 'ceph osd dump' and 'ceph health detail'?
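If the crush map itself is suspect, a way to dump it and look for duplicate entries (a sketch; osd.112 is the duplicated item mentioned in the quoted message below):

  ceph osd getcrushmap -o crush.bin
  crushtool -d crush.bin -o crush.txt
  grep -n 'osd\.112' crush.txt          # an osd should appear in exactly one host bucket
  # after fixing crush.txt:
  crushtool -c crush.txt -o crush.fixed.bin
  ceph osd setcrushmap -i crush.fixed.bin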

On Fri, Jun 14, 2019 at 8:02 PM Luk  wrote:
>
> Hello,
>
> All kudos are going to friends from Wroclaw, PL :)
>
> It was as simple as typo...
>
> The osd was added to the crushmap twice by the commands below (these commands
> were run over a week ago - there was no problem then, it showed up after
> replacing another osd - osd-7):
>
> ceph osd crush add osd.112 0.00 root=hdd
> ceph osd crush move osd.112 0.00 root=hdd rack=rack-a host=stor-a02
> ceph osd crush add osd.112 0.00 host=stor-a02
>
> and the ceph osd tree was like this:
> [root@ceph-mon-01 ~]# ceph osd tree
> ID   CLASS WEIGHTTYPE NAME STATUS REWEIGHT PRI-AFF
> -100   200.27496 root hdd
> -10167.64999 rack rack-a
>   -233.82500 host stor-a01
>0   hdd   7.27499 osd.0 up  1.0 1.0
>6   hdd   7.27499 osd.6 up  1.0 1.0
>   12   hdd   7.27499 osd.12up  1.0 1.0
>  108   hdd   4.0 osd.108   up  1.0 1.0
>  109   hdd   4.0 osd.109   up  1.0 1.0
>  110   hdd   4.0 osd.110   up  1.0 1.0
>   -733.82500 host stor-a02
>5   hdd   7.27499 osd.5 up  1.0 1.0
>9   hdd   7.27499 osd.9 up  1.0 1.0
>   15   hdd   7.27499 osd.15up  1.0 1.0
>  111   hdd   4.0 osd.111   up  1.0 1.0
>  112   hdd   4.0 osd.112   up  1.0 1.0
>  113   hdd   4.0 osd.113   up  1.0 1.0
> -10260.97498 rack rack-b
>   -327.14998 host stor-b01
>1   hdd   7.27499 osd.1 up  1.0 1.0
>7   hdd   0.5 osd.7 up  1.0 1.0
>   13   hdd   7.27499 osd.13up  1.0 1.0
>  114   hdd   4.0 osd.114   up  1.0 1.0
>  115   hdd   4.0 osd.115   up  1.0 1.0
>  116   hdd   4.0 osd.116   up  1.0 1.0
>   -433.82500 host stor-b02
>2   hdd   7.27499 osd.2 up  1.0 1.0
>   10   hdd   7.27499 osd.10up  1.0 1.0
>   16   hdd   7.27499 osd.16up  1.0 1.0
>  117   hdd   4.0 osd.117   up  1.0 1.0
>  118   hdd   4.0 osd.118   up  1.0 1.0
>  119   hdd   4.0 osd.119   up  1.0 1.0
> -10367.64999 rack rack-c
>   -633.82500 host stor-c01
>4   hdd   7.27499 osd.4 up  1.0 1.0
>8   hdd   7.27499 osd.8 up  1.0 1.0
>   14   hdd   7.27499 osd.14up  1.0 1.0
>  120   hdd   4.0 osd.120   up  1.0 1.0
>  121   hdd   4.0 osd.121   up  1.0 1.0
>  122   hdd   4.0 osd.122   up  1.0 1.0
>   -533.82500 host stor-c02
>3   hdd   7.27499 osd.3 up  1.0 1.0
>   11   hdd   7.27499 osd.11up  1.0 1.0
>   17   hdd   7.27499 osd.17up  1.0 1.0
>  123   hdd   4.0 osd.123   up  1.0 1.0
>  124   hdd   4.0 osd.124   up  1.0 1.0
>  125   hdd   4.0 osd.125   up  1.0 1.0
>  112   hdd   4.0 osd.112   up  1.0 1.0
>
>  [cut]
>
>  After editing the crushmap and removing the duplicate osd.112 from the root,
>  ceph started recovering and is healthy now :)
>
>  Regards
>  Lukasz
>
>
> > Here is ceph osd tree, in first post there is also ceph osd df tree:
>
> > https://pastebin.com/Vs75gpwZ
>
>
>
> >> Ahh I was thinking of chooseleaf_vary_r, which you already have.
> >> So probably not related to tunables. What is your `ceph osd tree` ?
>
> >> By the way, 12.2.9 has an unrelated bug (details
> >> http://tracker.ceph.com/issues/36686)
> >> AFAIU you will just need to update to v12.2.11 or v12.2.12 for that fix.
>
> >> -- Dan
>
> >> On Fri, Jun 14, 2019 at 11:29 AM Luk  wrote:
> >>>
> >>> Hi,
> >>>
> >>> here is the output:
> >>>
> >>> ceph osd crush show-tunables
> >>> {
> >>> "choose_local_tries": 0,
> >>> "choose_local_fallback_tries": 0,
> >>> "choose_total_tries": 100,
> >>> "chooseleaf_descend_once": 1,
> >>> "chooseleaf_vary_r": 1,
> >>> "chooseleaf_stable": 0,
> >>> "straw_calc_version": 1,
> >>> "allowed_bucket_algs": 22,
> >>> "profile": "unknown",
> >>> "optimal_tunables": 0,
> >>> "legacy_tunables": 0,
> >>> "minimum_required_version": 

Re: [ceph-users] balancer module makes OSD distribution worse

2019-06-08 Thread huang jun
What does your 'ceph osd df tree' output look like? Do the OSDs have the expected number of PGs?

On Fri, Jun 7, 2019 at 9:23 PM Josh Haft  wrote:
>
> 95% of usage is CephFS. Remaining is split between RGW and RBD.
>
> On Wed, Jun 5, 2019 at 3:05 PM Gregory Farnum  wrote:
> >
> > I think the mimic balancer doesn't include omap data when trying to
> > balance the cluster. (Because it doesn't get usable omap stats from
> > the cluster anyway; in Nautilus I think it does.) Are you using RGW or
> > CephFS?
> > -Greg
> >
> > On Wed, Jun 5, 2019 at 1:01 PM Josh Haft  wrote:
> > >
> > > Hi everyone,
> > >
> > > On my 13.2.5 cluster, I recently enabled the ceph balancer module in
> > > crush-compat mode. A couple manual 'eval' and 'execute' runs showed
> > > the score improving, so I set the following and enabled the auto
> > > balancer.
> > >
> > > mgr/balancer/crush_compat_metrics:bytes # from
> > > https://github.com/ceph/ceph/pull/20665
> > > mgr/balancer/max_misplaced:0.01
> > > mgr/balancer/mode:crush-compat
> > >
> > > Log messages from the mgr showed lower scores with each iteration, so
> > > I thought things were moving in the right direction.
> > >
> > > Initially my highest-utilized OSD was at 79% and MAXVAR was 1.17. I
> > > let the balancer do its thing for 5 days, at which point my highest
> > > utilized OSD was just over 90% and MAXVAR was about 1.28.
> > >
> > > I do have pretty low PG-per-OSD counts (average of about 60 - that's
> > > next on my list), but I explicitly asked the balancer to use the bytes
> > > metric. Was I just being impatient? Is it expected that usage would go
> > > up overall for a time before starting to trend downward? Is my low PG
> > > count affecting this somehow? I would have expected things to move in
> > > the opposite direction pretty quickly as they do with 'ceph osd
> > > reweight-by-utilization'.
> > >
> > > Thoughts?
> > >
> > > Regards,
> > > Josh



-- 
Thank you!
HuangJun


Re: [ceph-users] Reweight OSD to 0, why doesn't report degraded if UP set under Pool Size

2019-06-08 Thread huang jun
I think the written data will also go to osd.4 in this case.
Because osd.4 is not down, Ceph does not consider the PG to have a failed
OSD,
and it replicates the data to all OSDs in the acting/backfill set.
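You can see the difference between the up set and the acting set for a PG directly (a sketch, using the PG id from the example below):

  ceph pg map 1.4d       # prints the osdmap epoch, the up set and the acting set
  ceph pg 1.4d query     # full detail, including backfill/recovery state per OSD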

On Fri, Jun 7, 2019 at 10:37 PM Tarek Zegar  wrote:

> Paul / All
>
> I'm not sure what warning you are referring to; I'm on Nautilus. The
> point I'm getting at is if you weight out all OSD on a host with a cluster
> of 3 OSD hosts with 3 OSD each, crush rule = host, then write to the
> cluster, it *should* imo not just say remapped but undersized / degraded.
>
> See below, 1 out of the 3 OSD hosts has ALL it's OSD marked out and weight
> = 0. When you write (say using FIO), the PGs *only* have 2 OSD in them (UP
> set), which is pool min size. I don't understand why it's not saying
> undersized/degraded, this seems like a bug. Who cares that the Acting Set
> has the 3 original OSD in it, the actual data is only on 2 OSD, which is a
> degraded state
>
> *root@hostadmin:~# ceph -s*
> cluster:
> id: 33d41932-9df2-40ba-8e16-8dedaa4b3ef6
> health: HEALTH_WARN
> application not enabled on 1 pool(s)
>
> services:
> mon: 1 daemons, quorum hostmonitor1 (age 29m)
> mgr: hostmonitor1(active, since 31m)
> osd: 9 osds: 9 up, 6 in; 100 remapped pgs
>
> data:
> pools: 1 pools, 100 pgs
> objects: 520 objects, 2.0 GiB
> usage: 15 GiB used, 75 GiB / 90 GiB avail
> pgs: 520/1560 objects misplaced (33.333%)
> *100 active+clean+remapped*
>
> *root@hostadmin:~# ceph osd tree*
> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
> -1 0.08817 root default
> -3 0.02939 host hostosd1
> 0 hdd 0.00980 osd.0 up 1.0 1.0
> 3 hdd 0.00980 osd.3 up 1.0 1.0
> 6 hdd 0.00980 osd.6 up 1.0 1.0
> *-5 0.02939 host hostosd2*
> * 1 hdd 0.00980 osd.1 up 0 1.0*
> * 4 hdd 0.00980 osd.4 up 0 1.0*
> * 7 hdd 0.00980 osd.7 up 0 1.0*
> -7 0.02939 host hostosd3
> 2 hdd 0.00980 osd.2 up 1.0 1.0
> 5 hdd 0.00980 osd.5 up 1.0 1.0
> 8 hdd 0.00980 osd.8 up 1.0 1.0
>
>
> *root@hostadmin:~# ceph osd df*
> ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS
> STATUS
> 0 hdd 0.00980 1.0 10 GiB 1.7 GiB 765 MiB 12 KiB 1024 MiB 8.2 GiB 17.48
> 1.03 34 up
> 3 hdd 0.00980 1.0 10 GiB 1.7 GiB 765 MiB 12 KiB 1024 MiB 8.2 GiB 17.48
> 1.03 36 up
> 6 hdd 0.00980 1.0 10 GiB 1.6 GiB 593 MiB 4 KiB 1024 MiB 8.4 GiB 15.80
> 0.93 30 up
> * 1 hdd 0.00980 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 up*
> * 4 hdd 0.00980 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 up*
> * 7 hdd 0.00980 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 100 up*
> 2 hdd 0.00980 1.0 10 GiB 1.5 GiB 525 MiB 8 KiB 1024 MiB 8.5 GiB 15.13
> 0.89 20 up
> 5 hdd 0.00980 1.0 10 GiB 1.9 GiB 941 MiB 4 KiB 1024 MiB 8.1 GiB 19.20
> 1.13 43 up
> 8 hdd 0.00980 1.0 10 GiB 1.6 GiB 657 MiB 8 KiB 1024 MiB 8.4 GiB 16.42
> 0.97 37 up
> TOTAL 90 GiB 15 GiB 6.2 GiB 61 KiB 9.0 GiB 75 GiB 16.92
> MIN/MAX VAR: 0.89/1.13 STDDEV: 1.32
> Tarek Zegar
> Senior SDS Engineer
> Email *tze...@us.ibm.com* 
> Mobile *630.974.7172*
>
>
>
>
>
> From: Paul Emmerich 
> To: Tarek Zegar 
> Cc: Ceph Users 
> Date: 06/07/2019 05:25 AM
> Subject: [EXTERNAL] Re: [ceph-users] Reweight OSD to 0, why doesn't
> report degraded if UP set under Pool Size
> --
>
>
>
> remapped no longer triggers a health warning in nautilus.
>
> Your data is still there, it's just on the wrong OSD if that OSD is still
> up and running.
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at *https://croit.io*
> 
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> *www.croit.io* 
> Tel: +49 89 1896585 90
>
>
> On Thu, Jun 6, 2019 at 10:48 PM Tarek Zegar <*tze...@us.ibm.com*
> > wrote:
>
>For testing purposes I set a bunch of OSD to 0 weight, this correctly
>forces Ceph to not use said OSD. I took enough out such that the UP set
>only had Pool min size # of OSD (i.e 2 OSD).
>
>Two Questions:
>1. Why doesn't the acting set eventually match the UP set and simply
>point to [6,5] only
>2. Why are none of the PGs marked as undersized and degraded? The data
>is only hosted on 2 OSD rather then Pool size (3), I would expect a
>undersized warning and degraded for PG with data?
>
>Example PG:
>PG 1.4d active+clean+remapped UP= [6,5] Acting = [6,5,4]
>
>OSD Tree:
>ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
>-1 0.08817 root default
>-3 0.02939 host hostosd1
>0 hdd 0.00980 osd.0 up 1.0 1.0
>3 hdd 0.00980 osd.3 up 1.0 1.0
>6 hdd 0.00980 osd.6 up 1.0 1.0
>-5 0.02939 host hostosd2
>1 hdd 0.00980 osd.1 up 0 1.0
>4 hdd 0.00980 osd.4 up 0 

Re: [ceph-users] Can I limit OSD memory usage?

2019-06-08 Thread huang jun
Were your OSDs OOM-killed while the cluster was doing recovery/backfill, or just
under client I/O?
The config items you mentioned only cover the BlueStore cache; OSD memory
includes many other
things, like the pglog, so it is important to know whether your cluster is doing recovery.
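To see where one OSD's memory is actually going, the admin socket can dump the per-pool memory accounting (a sketch; pick any OSD id on the box):

  ceph daemon osd.0 dump_mempools                  # bluestore cache, osd_pglog, osdmap, buffers, ... in bytes
  ceph daemon osd.0 config get osd_memory_target   # only if your 13.2.x build already has the auto-tuning target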

On Sat, Jun 8, 2019 at 5:35 AM Sergei Genchev  wrote:
>
>  Hi,
>  My OSD processes are constantly getting killed by OOM killer. My
> cluster has 5 servers, each with 18 spinning disks, running 18 OSD
> daemons in 48GB of memory.
>  I was trying to limit OSD cache, according to
> http://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/
>
> [osd]
> bluestore_cache_size_ssd = 1G
> bluestore_cache_size_hdd = 768M
>
> Yet, my OSDs are using way more memory than that. I have seen as high as 3.2G
>
> KiB Mem : 47877604 total,   310172 free, 45532752 used,  2034680 buff/cache
> KiB Swap:  2097148 total,0 free,  2097148 used.   950224 avail Mem
>
> PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
> COMMAND
>  352516 ceph  20   0 3962504   2.8g   4164 S   2.3  6.1   4:22.98
> ceph-osd
>  350771 ceph  20   0 3668248   2.7g   4724 S   3.0  6.0   3:56.76
> ceph-osd
>  352777 ceph  20   0 3659204   2.7g   4672 S   1.7  5.9   4:10.52
> ceph-osd
>  353578 ceph  20   0 3589484   2.6g   4808 S   4.6  5.8   3:37.54
> ceph-osd
>  352280 ceph  20   0 3577104   2.6g   4704 S   5.9  5.7   3:44.58
> ceph-osd
>  350933 ceph  20   0 3421168   2.5g   4140 S   2.6  5.4   3:38.13
> ceph-osd
>  353678 ceph  20   0 3368664   2.4g   4804 S   4.0  5.3  12:47.12
> ceph-osd
>  350665 ceph  20   0 3364780   2.4g   4716 S   2.6  5.3   4:23.44
> ceph-osd
>  353101 ceph  20   0 3304288   2.4g   4676 S   4.3  5.2   3:16.53
> ceph-osd
>  ...
>
>
>  Is there any way for me to limit how much memory does OSD use?
> Thank you!
>
> ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)



-- 
Thank you!
HuangJun


Re: [ceph-users] radosgw dying

2019-06-08 Thread huang jun
From the error message, I'm inclined to think that 'mon_max_pg_per_osd' was exceeded.
You can check its value; the default is 250, so you
can have at most 1500 PG instances (250 * 6 OSDs).
For replicated pools with size=3 that means 500 PGs across all pools;
you already have 448 PGs, so the next pool can create at most 500 - 448 = 52 PGs.
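A quick way to check the limit and redo the arithmetic on your own cluster (a sketch; the mon name and pool size are taken from the output you posted):

  ceph daemon mon.S700028 config get mon_max_pg_per_osd   # default 250
  ceph osd pool ls detail                                 # shows 'replicated size 3' and pg_num per pool
  # budget: 250 * 6 OSDs / size 3 = 500 PGs; 500 - 448 existing = 52 PGs left,
  # so any new pool asking for more than that is rejected with ERANGE (result out of range)

The usual ways out are raising mon_max_pg_per_osd or lowering the pg_num the new RGW pools get created with (e.g. by pre-creating them with a small pg_num).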

On Sat, Jun 8, 2019 at 2:41 PM  wrote:
>
> All;
>
> I have a test and demonstration cluster running (3 hosts, MON, MGR, 2x OSD 
> per host), and I'm trying to add a 4th host for gateway purposes.
>
> The radosgw process keeps dying with:
> 2019-06-07 15:59:50.700 7fc4ef273780  0 ceph version 14.2.1 
> (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable), process 
> radosgw, pid 17588
> 2019-06-07 15:59:51.358 7fc4ef273780  0 rgw_init_ioctx ERROR: 
> librados::Rados::pool_create returned (34) Numerical result out of range 
> (this can be due to a pool or placement group misconfiguration, e.g. pg_num < 
> pgp_num or mon_max_pg_per_osd exceeded)
> 2019-06-07 15:59:51.396 7fc4ef273780 -1 Couldn't init storage provider (RADOS)
>
> The .rgw.root pool already exists.
>
> ceph status returns:
>   cluster:
> id: 1a8a1693-fa54-4cb3-89d2-7951d4cee6a3
> health: HEALTH_OK
>
>   services:
> mon: 3 daemons, quorum S700028,S700029,S700030 (age 30m)
> mgr: S700028(active, since 47h), standbys: S700030, S700029
> osd: 6 osds: 6 up (since 2d), 6 in (since 3d)
>
>   data:
> pools:   5 pools, 448 pgs
> objects: 12 objects, 1.2 KiB
> usage:   722 GiB used, 65 TiB / 66 TiB avail
> pgs: 448 active+clean
>
> and ceph osd tree returns:
> ID CLASS WEIGHT   TYPE NAMESTATUS REWEIGHT PRI-AFF
> -1   66.17697 root default
> -5   22.05899 host S700029
>  2   hdd 11.02950 osd.2up  1.0 1.0
>  3   hdd 11.02950 osd.3up  1.0 1.0
> -7   22.05899 host S700030
>  4   hdd 11.02950 osd.4up  1.0 1.0
>  5   hdd 11.02950 osd.5up  1.0 1.0
> -3   22.05899 host s700028
>  0   hdd 11.02950 osd.0up  1.0 1.0
>  1   hdd 11.02950 osd.1up  1.0 1.0
>
> Any thoughts on what I'm missing?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>



-- 
Thank you!
HuangJun


Re: [ceph-users] Repairing PG inconsistencies — Ceph Documentation - where's the text?

2019-05-17 Thread huang jun
OK, so I think that if you read that object with 'rados -p <pool> get
7:581d78de:::rbd_data.b48c7238e1f29.1b34:head -o obj',
the read will hit the same error and the OSD may even crash.
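A way to find which OSDs hold that object and reproduce the read (a sketch; the object name is the rbd data object from the scrub error, and <pool> is whatever pool 7 is called on your cluster):

  ceph osd map <pool> rbd_data.b48c7238e1f29.1b34            # shows the up/acting OSDs for the object
  rados -p <pool> get rbd_data.b48c7238e1f29.1b34 /tmp/obj   # the read goes to the primary; watch dmesg and the osd log for EIO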

On Sat, May 18, 2019 at 10:05 AM Stuart Longland  wrote:
>
> On 18/5/19 11:56 am, huang jun wrote:
> > That may have problem with your disk?
> > Do you check the syslog or demsg log,?
> > From the code, it will return 'read_error' only the read return EIO.
> > So i doubt that your disk have a sector error.
>
> It is possible, no errors are reported in `dmesg` though and `smartctl`
> does not report any read errors however the disks are getting on 3 years
> old now.
>
> I've got one of the other former OSD disks busy doing some self-tests
> now to see if that uncovers anything.
> --
> Stuart Longland (aka Redhatter, VK4MSL)
>
> I haven't lost my mind...
>   ...it's backed up on a tape somewhere.



-- 
Thank you!
HuangJun


Re: [ceph-users] Repairing PG inconsistencies — Ceph Documentation - where's the text?

2019-05-17 Thread huang jun
That may be a problem with your disk.
Did you check the syslog or dmesg log?
From the code, it returns 'read_error' only when the read returns EIO.
So I suspect your disk has a bad sector.

On Sat, May 18, 2019 at 9:43 AM Stuart Longland  wrote:
>
> On 18/5/19 11:34 am, huang jun wrote:
> > On Sat, May 18, 2019 at 9:26 AM Stuart Longland  wrote:
> >>
> >> On 16/5/19 8:55 pm, Stuart Longland wrote:
> >>> As this is Bluestore, it's not clear what I should do to resolve that,
> >>> so I thought I'd "RTFM" before asking here:
> >>> http://docs.ceph.com/docs/luminous/rados/operations/pg-repair/
> >>>
> >>> Maybe there's a secret hand-shake my web browser doesn't know about or
> >>> maybe the page is written in invisible ink, but that page appears blank
> >>> to me.
> >>
> >> Does anyone know why that page shows up blank?  I still have a placement
> >> group that is "inconsistent".  (A different one this time, but still!)
> >>
> > That maybe something wrong in ceph.com, it's a blank page for me.
>
> Ahh okay, so I'm not going crazy … yet. :-)
>
> >> Some pages I've researched suggest going to the OSD's mount-point and
> >> moving the offending object away, however Linux kernel 4.19.17 does not
> >> have a 'bluestore' driver, so I can't mount the file system to get at
> >> the offending object.
> >>
> >> Running `ceph pg repair ` tells me it has "instructed" the OSD to do
> >> a repair.  The OSD shows nothing at all in its logs even acknowledging
> >> the command, and the problem persists.  The only log messages I have of
> >> the issue are from yesterday:
> >>
> >>> 2019-05-17 05:59:53.170552 7f009b0be700 -1 log_channel(cluster) log [ERR] 
> >>> : 7.1a shard 3 soid 
> >>> 7:581d78de:::rbd_data.b48c7238e1f29.1b34:head : candidate had 
> >>> a read error
> >>> 2019-05-17 07:07:20.723999 7f009b0be700 -1 log_channel(cluster) log [ERR] 
> >>> : 7.1a shard 3 soid 
> >>> 7:5b335293:::rbd_data.8c9e1238e1f29.1438:head : candidate had 
> >>> a read error
> >>> 2019-05-17 07:29:16.537539 7f009b0be700 -1 log_channel(cluster) log [ERR] 
> >>> : 7.1a deep-scrub 0 missing, 2 inconsistent objects
> >>> 2019-05-17 07:29:16.537557 7f009b0be700 -1 log_channel(cluster) log [ERR] 
> >>> : 7.1a deep-scrub 2 errors
> >>
> >> … not from just now when I issued the command.  Why is my `ceph pg
> >> repair` command being ignored?
> > ceph pg repair will let pg do scrub and repair the inconsistent
> > do you still see this warning messages after 'pg repair'?
>
> Yes, I've been running `ceph pg repair 7.1a` repeatedly for the past 4
> hours.  No new log messages, and still `ceph health detail` shows this:
>
> > carbon ~ # ceph pg repair 7.1a
> > instructing pg 7.1a on osd.2 to repair
> > carbon ~ # ceph health detail
> > HEALTH_ERR 2 scrub errors; Possible data damage: 1 pg inconsistent
> > OSD_SCRUB_ERRORS 2 scrub errors
> > PG_DAMAGED Possible data damage: 1 pg inconsistent
> > pg 7.1a is active+clean+inconsistent, acting [2,3]
>
> I've also tried `ceph pg deep-scrub 7.1a` to no effect.
>
> I may shut the cluster down later to do some power infrastructure work
> (need to add a new power distribution box to power two new nodes) and
> possibly even install a new 48-port Ethernet switch but right now, I'd
> like to try and get my storage cluster back to health.
>
> Regards,
> --
> Stuart Longland (aka Redhatter, VK4MSL)
>
> I haven't lost my mind...
>   ...it's backed up on a tape somewhere.



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] openstack with ceph rbd vms IO/erros

2019-05-17 Thread huang jun
EDH - Manuel Rios Fernandez  于2019年5月17日周五 下午3:23写道:
>
> Did you check your KVM host RAM usage?
>
>
>
> We saw this on host very very loaded with overcommit in RAM causes a random 
> crash of VM.
>
>
>
> As you said for solve must be remounted externaly and fsck. You can prevent 
> it disabled ceph cache at Openstack Nova host. But your VM’s are going get 
> less performance.
>
>
>
> Whats you Ceph & Openstack version?
>
>
>
> Regards
>
>
>
>
>
> De: ceph-users  En nombre de ??
> Enviado el: viernes, 17 de mayo de 2019 9:01
> Para: ceph-users 
> Asunto: [ceph-users] openstack with ceph rbd vms IO/erros
>
>
>
> hi:
>
>  I hava a openstack cluster with a ceph cluster ,use rbd,ceph cluster use ssd 
>  pool tier.
>
>
>
> some vm on openstack sometimes crashed in two case .
>
>
>
> 1.  become readonly filesystem. after reboot ,it work fine again.
>
> 2.  IO errors . I must  repair the file system by fsck. thenreboot , it work 
> fine again.
>
>
>
> I do not know if this is ceph bugs or kvm bugs.
>
>
>
Have you set 'osd_enable_op_tracker=true', and what is the value of
'osd_op_complaint_time'? With those two configuration items set, did your
cluster report 'slow request' warnings?
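
If it helps, a minimal way to check those two items and to look for slow
ops on a single OSD would be something like the following (osd.0 is only a
placeholder; run it on the host that carries the OSD):

  # values currently in effect on a running OSD
  ceph daemon osd.0 config get osd_enable_op_tracker
  ceph daemon osd.0 config get osd_op_complaint_time
  # recent slow/old operations recorded by the op tracker
  ceph daemon osd.0 dump_historic_ops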
> I need some ideas to resolv this ,Anyone can help me ?
>
> Look forward to your reply
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Repairing PG inconsistencies — Ceph Documentation - where's the text?

2019-05-17 Thread huang jun
Stuart Longland  于2019年5月18日周六 上午9:26写道:
>
> On 16/5/19 8:55 pm, Stuart Longland wrote:
> > As this is Bluestore, it's not clear what I should do to resolve that,
> > so I thought I'd "RTFM" before asking here:
> > http://docs.ceph.com/docs/luminous/rados/operations/pg-repair/
> >
> > Maybe there's a secret hand-shake my web browser doesn't know about or
> > maybe the page is written in invisible ink, but that page appears blank
> > to me.
>
> Does anyone know why that page shows up blank?  I still have a placement
> group that is "inconsistent".  (A different one this time, but still!)
>
There may be something wrong on ceph.com; it's a blank page for me too.
> Some pages I've researched suggest going to the OSD's mount-point and
> moving the offending object away, however Linux kernel 4.19.17 does not
> have a 'bluestore' driver, so I can't mount the file system to get at
> the offending object.
>
> Running `ceph pg repair ` tells me it has "instructed" the OSD to do
> a repair.  The OSD shows nothing at all in its logs even acknowledging
> the command, and the problem persists.  The only log messages I have of
> the issue are from yesterday:
>
> > 2019-05-17 05:59:53.170552 7f009b0be700 -1 log_channel(cluster) log [ERR] : 
> > 7.1a shard 3 soid 7:581d78de:::rbd_data.b48c7238e1f29.1b34:head 
> > : candidate had a read error
> > 2019-05-17 07:07:20.723999 7f009b0be700 -1 log_channel(cluster) log [ERR] : 
> > 7.1a shard 3 soid 7:5b335293:::rbd_data.8c9e1238e1f29.1438:head 
> > : candidate had a read error
> > 2019-05-17 07:29:16.537539 7f009b0be700 -1 log_channel(cluster) log [ERR] : 
> > 7.1a deep-scrub 0 missing, 2 inconsistent objects
> > 2019-05-17 07:29:16.537557 7f009b0be700 -1 log_channel(cluster) log [ERR] : 
> > 7.1a deep-scrub 2 errors
>
> … not from just now when I issued the command.  Why is my `ceph pg
> repair` command being ignored?
'ceph pg repair' will make the PG scrub and repair the inconsistency.
Do you still see these warning messages after 'pg repair'?
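
As a side note, the inconsistent objects in that PG can usually be listed
read-only with the command below (a sketch; it assumes your rados CLI is
recent enough, Jewel or later, to have it):

  rados list-inconsistent-obj 7.1a --format=json-pretty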
> --
> Stuart Longland (aka Redhatter, VK4MSL)
>
> I haven't lost my mind...
>   ...it's backed up on a tape somewhere.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Huge rebalance after rebooting OSD host (Mimic)

2019-05-15 Thread huang jun
Did the OSDs' crush location change after the reboot?
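
If the host did come back as 'localhost', one hedged way to guard against
OSDs moving themselves is to pin their CRUSH location in ceph.conf on that
host, for example (the host/root names below are assumptions):

  [osd]
  # don't let OSDs relocate themselves in the CRUSH map at startup
  osd crush update on start = false
  # or pin the location explicitly
  crush location = host=mynode root=default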

kas  于2019年5月15日周三 下午10:39写道:
>
> kas wrote:
> :   Marc,
> :
> : Marc Roos wrote:
> : : Are you sure your osd's are up and reachable? (run ceph osd tree on
> : : another node)
> :
> :   They are up, because all three mons see them as up.
> : However, ceph osd tree provided the hint (thanks!): The OSD host went back
> : with hostname "localhost" instead of the correct one for some reason.
> : So the OSDs moved themselves to a new HOST=localhost CRUSH node directly
> : under the CRUSH root. I rebooted the OSD host once again, and it went up
> : again with the correct hostname, and the "ceph osd tree" output looks sane
> : now. So I guess we have a reason for such a huge rebalance.
> :
> :   However, even though the OSD tree is back in the normal state,
> : the rebalance is still going on, and there are even inactive PGs,
> : with some Ceph clients being stuck seemingly forever:
> :
> : health: HEALTH_ERR
> : 1964645/3977451 objects misplaced (49.395%)
> : Reduced data availability: 11 pgs inactive
>
> Wild guessing what to do, I went to the rebooted OSD host and ran
> systemctl restart ceph-osd.target
> - restarting all OSD processes. The previously inactive (activating) pgs
> went to the active state, and Ceph clients got unstuck. Now I see
> HEALTH_ERR with backfill_toofull only, which I consider a normal state
> during Ceph Mimic rebalance.
>
> It would be interesting to know why some of the PGs went stuck,
> and why did restart help. FWIW, I have a "ceph pg query" output for
> one of the 11 inactive PGs.
>
> -Yenya
>
> ---
> # ceph pg 23.4f5 query
> {
> "state": "activating+remapped",
> "snap_trimq": "[]",
> "snap_trimq_len": 0,
> "epoch": 104015,
> "up": [
> 70,
> 72,
> 27
> ],
> "acting": [
> 25,
> 27,
> 79
> ],
> "backfill_targets": [
> "70",
> "72"
> ],
> "acting_recovery_backfill": [
> "25",
> "27",
> "70",
> "72",
> "79"
> ],
> "info": {
> "pgid": "23.4f5",
> "last_update": "103035'4667973",
> "last_complete": "103035'4667973",
> "log_tail": "102489'4664889",
> "last_user_version": 4667973,
> "last_backfill": "MAX",
> "last_backfill_bitwise": 1,
> "purged_snaps": [],
> "history": {
> "epoch_created": 406,
> "epoch_pool_created": 406,
> "last_epoch_started": 103086,
> "last_interval_started": 103085,
> "last_epoch_clean": 96881,
> "last_interval_clean": 96880,
> "last_epoch_split": 0,
> "last_epoch_marked_full": 0,
> "same_up_since": 103095,
> "same_interval_since": 103095,
> "same_primary_since": 95398,
> "last_scrub": "102517'4667556",
> "last_scrub_stamp": "2019-05-15 01:07:28.978979",
> "last_deep_scrub": "102491'4666011",
> "last_deep_scrub_stamp": "2019-05-08 07:20:08.253942",
> "last_clean_scrub_stamp": "2019-05-15 01:07:28.978979"
> },
> "stats": {
> "version": "103035'4667973",
> "reported_seq": "2116838",
> "reported_epoch": "104015",
> "state": "activating+remapped",
> "last_fresh": "2019-05-15 16:19:44.530005",
> "last_change": "2019-05-15 14:56:04.248887",
> "last_active": "2019-05-15 14:56:02.579506",
> "last_peered": "2019-05-15 14:56:01.401941",
> "last_clean": "2019-05-15 14:53:39.291350",
> "last_became_active": "2019-05-15 14:55:54.163102",
> "last_became_peered": "2019-05-15 14:55:54.163102",
> "last_unstale": "2019-05-15 16:19:44.530005",
> "last_undegraded": "2019-05-15 16:19:44.530005",
> "last_fullsized": "2019-05-15 16:19:44.530005",
> "mapping_epoch": 103095,
> "log_start": "102489'4664889",
> "ondisk_log_start": "102489'4664889",
> "created": 406,
> "last_epoch_clean": 96881,
> "parent": "0.0",
> "parent_split_bits": 0,
> "last_scrub": "102517'4667556",
> "last_scrub_stamp": "2019-05-15 01:07:28.978979",
> "last_deep_scrub": "102491'4666011",
> "last_deep_scrub_stamp": "2019-05-08 07:20:08.253942",
> "last_clean_scrub_stamp": "2019-05-15 01:07:28.978979",
> "log_size": 3084,
> "ondisk_log_size": 3084,
> "stats_invalid": false,
> "dirty_stats_invalid": false,
> "omap_stats_invalid": false,
> "hitset_stats_invalid": false,
> "hitset_bytes_stats_invalid": false,
> "pin_stats_invalid": true,
> 

Re: [ceph-users] Does ceph osd reweight-by-xxx work correctly if OSDs aren't of same size?

2019-04-29 Thread huang jun
Yes, 'ceph osd reweight-by-xxx' uses the OSD crush weight (which
represents how much data the OSD can hold) in its calculation.
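
For what it's worth, a dry run shows the proposed changes before anything
is applied, e.g.:

  # report only: which OSDs would be reweighted, and by how much
  ceph osd test-reweight-by-utilization
  # apply it only once the proposal looks sane
  ceph osd reweight-by-utilization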

Igor Podlesny  于2019年4月29日周一 下午2:56写道:
>
> Say, some nodes have OSDs that are 1.5 times bigger, than other nodes
> have, meanwhile weights of all the nodes in question is almost equal
> (due having different number of OSDs obviously)
>
> --
> End of message. Next message?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] clock skew

2019-04-25 Thread huang jun
mj  于2019年4月25日周四 下午6:34写道:
>
> Hi all,
>
> On our three-node cluster, we have setup chrony for time sync, and even
> though chrony reports that it is synced to ntp time, at the same time
> ceph occasionally reports time skews that can last several hours.
>
> See for example:
>
> > root@ceph2:~# ceph -v
> > ceph version 12.2.10 (fc2b1783e3727b66315cc667af9d663d30fe7ed4) luminous 
> > (stable)
> > root@ceph2:~# ceph health detail
> > HEALTH_WARN clock skew detected on mon.1
> > MON_CLOCK_SKEW clock skew detected on mon.1
> > mon.1 addr 10.10.89.2:6789/0 clock skew 0.506374s > max 0.5s (latency 
> > 0.000591877s)
> > root@ceph2:~# chronyc tracking
> > Reference ID: 7F7F0101 ()
> > Stratum : 10
> > Ref time (UTC)  : Wed Apr 24 19:05:28 2019
> > System time : 0.00133 seconds slow of NTP time
> > Last offset : -0.00524 seconds
> > RMS offset  : 0.00524 seconds
> > Frequency   : 12.641 ppm slow
> > Residual freq   : +0.000 ppm
> > Skew: 0.000 ppm
> > Root delay  : 0.00 seconds
> > Root dispersion : 0.00 seconds
> > Update interval : 1.4 seconds
> > Leap status : Normal
> > root@ceph2:~#
>
> For the record: mon.1 = ceph2 = 10.10.89.2, and time is synced similarly
> with NTP on the two other nodes.
>
> We don't understand this...
>
> I have now injected mon_clock_drift_allowed 0.7, so at least we have
> HEALTH_OK again. (to stop upsetting my monitoring system)
>
> But two questions:
>
> - can anyone explain why this is happening, is it looks as if ceph and
> NTP/chrony disagree on just how time-synced the servers are..?

I'm not familiar with chrony, but our practice is to use NTP, and it works fine.

> - how to determine the current clock skew from cephs perspective?
> Because "ceph health detail" in case of HEALTH_OK does not show it.
> (I want to start monitoring it continuously, to see if I can find some
> sort of pattern)

You can use 'ceph time-sync-status' to get the current time sync status.
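
For continuous monitoring, a simple loop that records it is enough as a
sketch (the interval and log path below are just placeholders):

  while true; do
    date
    ceph time-sync-status
    sleep 60
  done >> /var/log/ceph-skew.log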
>
> Thanks!
>
> MJ
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Update crushmap when monitors are down

2019-04-01 Thread huang jun
Can you provide detailed error logs from when the mon crashes?
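
To capture the crash, one option is to run a single monitor in the
foreground with a higher debug level so the backtrace lands on stderr (the
mon id and log path below are assumptions):

  ceph-mon -i mon-a -d --debug_mon 20 --debug_ms 1
  # or look at what is already in the mon log
  less /var/log/ceph/ceph-mon.mon-a.log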

Pardhiv Karri  于2019年4月2日周二 上午9:02写道:
>
> Hi,
>
> Our ceph production cluster is down when updating crushmap. Now we can't get 
> out monitors to come online and when they come online for a fraction of a 
> second we see crush map errors in logs. How can we update crushmap when 
> monitors are down as none of the ceph commands are working.
>
> Thanks,
> Pardhiv Karri
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck in active+clean+remapped

2019-03-31 Thread huang jun
It seems like CRUSH cannot get enough OSDs for this PG.
What is the output of 'ceph osd crush dump', and especially the values
in the 'tunables' section?
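
A quick way to look at just the tunables, as a sketch:

  ceph osd crush show-tunables
  # or pretty-print the full dump and scroll to the 'tunables' section
  ceph osd crush dump | python -m json.tool | less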

Vladimir Prokofev  于2019年3月27日周三 上午4:02写道:
>
> CEPH 12.2.11, pool size 3, min_size 2.
>
> One node went down today(private network interface started flapping, and 
> after a while OSD processes crashed), no big deal, cluster recovered, but not 
> completely. 1 PG stuck in active+clean+remapped state.
>
> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES   LOG 
>  DISK_LOG STATE STATE_STAMPVERSION 
> REPORTEDUP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB
>   SCRUB_STAMPLAST_DEEP_SCRUB DEEP_SCRUB_STAMP   
> SNAPTRIMQ_LEN
> 20.a2   511  00   511   0  1584410172 
> 1500 1500 active+clean+remapped 2019-03-26 20:50:18.639452
> 96149'18920496861:935872[26,14] 26  [26,14,9] 26  
>   96149'189204 2019-03-26 10:47:36.17476995989'187669 2019-03-22 
> 23:29:02.322848 0
>
> it states it's placed on 26,14 OSDs, should be on 26,14,9. As far as I can 
> see there's nothing wrong with any of those OSDs, they work, host other PGs, 
> peer with each other, etc. I tried restarting all of them one after another, 
> but without any success.
> OSD 9 hosts 95 other PGs, don't think it's PG overdose.
>
> Last line of log from osd.9 mentioning PG 20.a2:
> 2019-03-26 20:50:16.294500 7fe27963a700  1 osd.9 pg_epoch: 96860 pg[20.a2( v 
> 96149'189204 (95989'187645,96149'189204] local-lis/les=96857/96858 n=511 
> ec=39164/39164 lis/c 96857/96855 les/c/f 96858/96856/66611 96859/96860/96855) 
> [26,14]/[26,14,9] r=2 lpr=96860 pi=[96855,96860)/1 crt=96149'189204 lcod 0'0 
> remapped NOTIFY mbc={}] state: transitioning to Stray
>
> Nothing else out of ordinary, just usual scrubs/deep-scrubs notifications.
> Any ideas what it it can be, or any other steps to troubleshoot this?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to force backfill a pg in ceph jewel

2019-03-31 Thread huang jun
The force-recovery/backfill commands were introduced in the Luminous release,
if I remember right.

Nikhil R  于2019年3月31日周日 上午7:59写道:
>
> Team,
> Is there a way to force backfill a pg in ceph jewel. I know this is available 
> in mimic. Is it available in ceph jewel
> I tried ceph pg backfill   pg backfill   but no luck
>
> Any help would be appreciated as we have a prod issue.
> in.linkedin.com/in/nikhilravindra
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Erasure Pools.

2019-03-31 Thread huang jun
What's the output of 'ceph osd dump', 'ceph osd crush dump', and
'ceph health detail'?

Andrew J. Hutton  于2019年3月30日周六 上午7:05写道:
>
> I have tried to create erasure pools for CephFS using the examples given
> at
> https://swamireddy.wordpress.com/2016/01/26/ceph-diff-between-erasure-and-replicated-pool-type/
> but this is resulting in some weird behaviour.  The only number in
> common is that when creating the metadata store; is this related?
>
> [ceph@thor ~]$ ceph -s
>cluster:
>  id: b688f541-9ad4-48fc-8060-803cb286fc38
>  health: HEALTH_WARN
>  Reduced data availability: 128 pgs inactive, 128 pgs incomplete
>
>services:
>  mon: 3 daemons, quorum thor,odin,loki
>  mgr: odin(active), standbys: loki, thor
>  mds: cephfs-1/1/1 up  {0=thor=up:active}, 1 up:standby
>  osd: 5 osds: 5 up, 5 in
>
>data:
>  pools:   2 pools, 256 pgs
>  objects: 21 objects, 2.19KiB
>  usage:   5.08GiB used, 7.73TiB / 7.73TiB avail
>  pgs: 50.000% pgs not active
>   128 creating+incomplete
>   128 active+clean
>
> Pretty sure these were the commands used.
>
> ceph osd pool create storage 1024 erasure ec-42-profile2
> ceph osd pool create storage 128 erasure ec-42-profile2
> ceph fs new cephfs storage_metadata storage
> ceph osd pool create storage_metadata 128
> ceph fs new cephfs storage_metadata storage
> ceph fs add_data_pool cephfs storage
> ceph osd pool set storage allow_ec_overwrites true
> ceph osd pool application enable storage cephfs
> fs add_data_pool default storage
> ceph fs add_data_pool cephfs storage
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9

2019-03-29 Thread huang jun
Nikhil R  于2019年3月29日周五 下午1:44写道:
>
> if i comment filestore_split_multiple = 72 filestore_merge_threshold = 480   
> in the ceph.conf wont ceph take the default value of 2 and 10 and we would be 
> in more splits and crashes?
>
Yes, that was aimed at making it clear what causes the long start time:
the leveldb compaction or the filestore split?
> in.linkedin.com/in/nikhilravindra
>
>
>
> On Fri, Mar 29, 2019 at 6:55 AM huang jun  wrote:
>>
>> It seems like the split settings result the problem,
>> what about comment out those settings then see it still used that long
>> time to restart?
>> As a fast search in code, these two
>> filestore_split_multiple = 72
>> filestore_merge_threshold = 480
>> doesn't support online change.
>>
>> Nikhil R  于2019年3月28日周四 下午6:33写道:
>> >
>> > Thanks huang for the reply.
>> > Its is the disk compaction taking more time
>> > the disk i/o is completely utilized upto 100%
>> > looks like both osd_compact_leveldb_on_mount = false & 
>> > leveldb_compact_on_mount = false isnt working as expected on ceph v10.2.9
>> > is there a way to turn off compaction?
>> >
>> > Also, the reason why we are restarting osd's is due to splitting and we 
>> > increased split multiple and merge_threshold.
>> > Is there a way we would inject it? Is osd restarts the only solution?
>> >
>> > Thanks In Advance
>> >
>> > in.linkedin.com/in/nikhilravindra
>> >
>> >
>> >
>> > On Thu, Mar 28, 2019 at 3:58 PM huang jun  wrote:
>> >>
>> >> Did the time really cost on db compact operation?
>> >> or you can turn on debug_osd=20 to see what happens,
>> >> what about the disk util during start?
>> >>
>> >> Nikhil R  于2019年3月28日周四 下午4:36写道:
>> >> >
>> >> > CEPH osd restarts are taking too long a time
>> >> > below is my ceph.conf
>> >> > [osd]
>> >> > osd_compact_leveldb_on_mount = false
>> >> > leveldb_compact_on_mount = false
>> >> > leveldb_cache_size=1073741824
>> >> > leveldb_compression = false
>> >> > osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k"
>> >> > osd_max_backfills = 1
>> >> > osd_recovery_max_active = 1
>> >> > osd_recovery_op_priority = 1
>> >> > filestore_split_multiple = 72
>> >> > filestore_merge_threshold = 480
>> >> > osd_max_scrubs = 1
>> >> > osd_scrub_begin_hour = 22
>> >> > osd_scrub_end_hour = 3
>> >> > osd_deep_scrub_interval = 2419200
>> >> > osd_scrub_sleep = 0.1
>> >> >
>> >> > looks like both osd_compact_leveldb_on_mount = false & 
>> >> > leveldb_compact_on_mount = false isnt working as expected on ceph 
>> >> > v10.2.9
>> >> >
>> >> > Any ideas on a fix would be appreciated asap
>> >> > in.linkedin.com/in/nikhilravindra
>> >> >
>> >> > ___
>> >> > ceph-users mailing list
>> >> > ceph-users@lists.ceph.com
>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>
>> >>
>> >>
>> >> --
>> >> Thank you!
>> >> HuangJun
>>
>>
>>
>> --
>> Thank you!
>> HuangJun



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9

2019-03-28 Thread huang jun
It seems like the split settings caused the problem.
What about commenting out those settings and then seeing whether it still
takes that long to restart?
From a quick search of the code, these two:
filestore_split_multiple = 72
filestore_merge_threshold = 480
don't support online change.
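
To confirm what a running OSD actually has in effect, rather than what is
in ceph.conf, something like this can be run on the OSD host (the osd id is
a placeholder):

  ceph daemon osd.0 config show | grep -E 'filestore_split_multiple|filestore_merge_threshold'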

Nikhil R  于2019年3月28日周四 下午6:33写道:
>
> Thanks huang for the reply.
> Its is the disk compaction taking more time
> the disk i/o is completely utilized upto 100%
> looks like both osd_compact_leveldb_on_mount = false & 
> leveldb_compact_on_mount = false isnt working as expected on ceph v10.2.9
> is there a way to turn off compaction?
>
> Also, the reason why we are restarting osd's is due to splitting and we 
> increased split multiple and merge_threshold.
> Is there a way we would inject it? Is osd restarts the only solution?
>
> Thanks In Advance
>
> in.linkedin.com/in/nikhilravindra
>
>
>
> On Thu, Mar 28, 2019 at 3:58 PM huang jun  wrote:
>>
>> Did the time really cost on db compact operation?
>> or you can turn on debug_osd=20 to see what happens,
>> what about the disk util during start?
>>
>> Nikhil R  于2019年3月28日周四 下午4:36写道:
>> >
>> > CEPH osd restarts are taking too long a time
>> > below is my ceph.conf
>> > [osd]
>> > osd_compact_leveldb_on_mount = false
>> > leveldb_compact_on_mount = false
>> > leveldb_cache_size=1073741824
>> > leveldb_compression = false
>> > osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k"
>> > osd_max_backfills = 1
>> > osd_recovery_max_active = 1
>> > osd_recovery_op_priority = 1
>> > filestore_split_multiple = 72
>> > filestore_merge_threshold = 480
>> > osd_max_scrubs = 1
>> > osd_scrub_begin_hour = 22
>> > osd_scrub_end_hour = 3
>> > osd_deep_scrub_interval = 2419200
>> > osd_scrub_sleep = 0.1
>> >
>> > looks like both osd_compact_leveldb_on_mount = false & 
>> > leveldb_compact_on_mount = false isnt working as expected on ceph v10.2.9
>> >
>> > Any ideas on a fix would be appreciated asap
>> > in.linkedin.com/in/nikhilravindra
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>> --
>> Thank you!
>> HuangJun



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9

2019-03-28 Thread huang jun
Is the time really spent on the db compact operation?
You can turn on debug_osd=20 to see what happens.
What about the disk utilization during startup?

Nikhil R  于2019年3月28日周四 下午4:36写道:
>
> CEPH osd restarts are taking too long a time
> below is my ceph.conf
> [osd]
> osd_compact_leveldb_on_mount = false
> leveldb_compact_on_mount = false
> leveldb_cache_size=1073741824
> leveldb_compression = false
> osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k"
> osd_max_backfills = 1
> osd_recovery_max_active = 1
> osd_recovery_op_priority = 1
> filestore_split_multiple = 72
> filestore_merge_threshold = 480
> osd_max_scrubs = 1
> osd_scrub_begin_hour = 22
> osd_scrub_end_hour = 3
> osd_deep_scrub_interval = 2419200
> osd_scrub_sleep = 0.1
>
> looks like both osd_compact_leveldb_on_mount = false & 
> leveldb_compact_on_mount = false isnt working as expected on ceph v10.2.9
>
> Any ideas on a fix would be appreciated asap
> in.linkedin.com/in/nikhilravindra
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs error

2019-03-18 Thread huang jun
Marc Roos  于2019年3月18日周一 上午5:46写道:
>
>
>
>
> 2019-03-17 21:59:58.296394 7f97cbbe6700  0 --
> 192.168.10.203:6800/1614422834 >> 192.168.10.43:0/1827964483
> conn(0x55ba9614d000 :6800 s=STATE_OPEN pgs=8 cs=1 l=0).fault server,
> going to standby
>
> What does this mean?
That means the connection has been idle for some time, so the server goes
into the standby state until new packets arrive.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cluster is not stable

2019-03-14 Thread huang jun
You can try those commands, but you may first need to find the root cause
of why the current monmap contains no features at all. Did you upgrade the
cluster from Luminous to Mimic, or is it a new cluster installed with Mimic?


Zhenshi Zhou  于2019年3月14日周四 下午2:37写道:
>
> Hi huang,
>
> It's a pre-production environment. If everything is fine, I'll use it for 
> production.
>
> My cluster is version mimic, should I set all features you listed in the 
> command?
>
> Thanks
>
> huang jun  于2019年3月14日周四 下午2:11写道:
>>
>> sorry, the script should be
>> for f in kraken luminous mimic osdmap-prune; do
>>   ceph mon feature set $f --yes-i-really-mean-it
>> done
>>
>> huang jun  于2019年3月14日周四 下午2:04写道:
>> >
>> > ok, if this is a **test environment**, you can try
>> > for f in 'kraken,luminous,mimic,osdmap-prune'; do
>> >   ceph mon feature set $f --yes-i-really-mean-it
>> > done
>> >
>> > If it is a production environment, you should eval the risk first, and
>> > maybe setup a test cluster to testing first.
>> >
>> > Zhenshi Zhou  于2019年3月14日周四 下午1:56写道:
>> > >
>> > > # ceph mon feature ls
>> > > all features
>> > > supported: [kraken,luminous,mimic,osdmap-prune]
>> > > persistent: [kraken,luminous,mimic,osdmap-prune]
>> > > on current monmap (epoch 2)
>> > > persistent: [none]
>> > > required: [none]
>> > >
>> > > huang jun  于2019年3月14日周四 下午1:50写道:
>> > >>
>> > >> what's the output of 'ceph mon feature ls'?
>> > >>
>> > >> from the code, maybe mon features not contain luminous
>> > >> 6263 void OSD::send_beacon(const ceph::coarse_mono_clock::time_point& 
>> > >> now)
>> > >>
>> > >>  6264 {
>> > >>
>> > >>  6265   const auto& monmap = monc->monmap;
>> > >>
>> > >>  6266   // send beacon to mon even if we are just connected, and the
>> > >> monmap is not
>> > >>
>> > >>  6267   // initialized yet by then.
>> > >>
>> > >>  6268   if (monmap.epoch > 0 &&
>> > >>
>> > >>  6269   monmap.get_required_features().contains_all(
>> > >>
>> > >>  6270 ceph::features::mon::FEATURE_LUMINOUS)) {
>> > >>
>> > >>  6271 dout(20) << __func__ << " sending" << dendl;
>> > >>
>> > >>  6272 MOSDBeacon* beacon = nullptr;
>> > >>
>> > >>  6273 {
>> > >>
>> > >>  6274   std::lock_guard l{min_last_epoch_clean_lock};
>> > >>
>> > >>  6275   beacon = new MOSDBeacon(osdmap->get_epoch(), 
>> > >> min_last_epoch_clean);
>> > >>
>> > >>  6276   std::swap(beacon->pgs, min_last_epoch_clean_pgs);
>> > >>
>> > >>  6277   last_sent_beacon = now;
>> > >>
>> > >>  6278 }
>> > >>
>> > >>  6279 monc->send_mon_message(beacon);
>> > >>
>> > >>  6280   } else {
>> > >>
>> > >>  6281 dout(20) << __func__ << " not sending" << dendl;
>> > >>
>> > >>  6282   }
>> > >>
>> > >>  6283 }
>> > >>
>> > >>
>> > >> Zhenshi Zhou  于2019年3月14日周四 下午12:43写道:
>> > >> >
>> > >> > Hi,
>> > >> >
>> > >> > One of the log says the beacon not sending as below:
>> > >> > 2019-03-14 12:41:15.722 7f3c27684700 10 osd.5 17032 
>> > >> > tick_without_osd_lock
>> > >> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 
>> > >> > can_inc_scrubs_pending 0 -> 1 (max 1, active 0)
>> > >> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 scrub_time_permit 
>> > >> > should run between 0 - 24 now 12 = yes
>> > >> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 
>> > >> > scrub_load_below_threshold loadavg per cpu 0 < max 0.5 = yes
>> > >> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 sched_scrub 
>> > >> > load_is_low=1
>> > >> > 2019-03-14 12:41:15.722 7f3c27684700 10 osd.5 17032 sched_scrub 1.79 
>> > >>

Re: [ceph-users] cluster is not stable

2019-03-14 Thread huang jun
sorry, the script should be
for f in kraken luminous mimic osdmap-prune; do
  ceph mon feature set $f --yes-i-really-mean-it
done

huang jun  于2019年3月14日周四 下午2:04写道:
>
> ok, if this is a **test environment**, you can try
> for f in 'kraken,luminous,mimic,osdmap-prune'; do
>   ceph mon feature set $f --yes-i-really-mean-it
> done
>
> If it is a production environment, you should eval the risk first, and
> maybe setup a test cluster to testing first.
>
> Zhenshi Zhou  于2019年3月14日周四 下午1:56写道:
> >
> > # ceph mon feature ls
> > all features
> > supported: [kraken,luminous,mimic,osdmap-prune]
> > persistent: [kraken,luminous,mimic,osdmap-prune]
> > on current monmap (epoch 2)
> >     persistent: [none]
> > required: [none]
> >
> > huang jun  于2019年3月14日周四 下午1:50写道:
> >>
> >> what's the output of 'ceph mon feature ls'?
> >>
> >> from the code, maybe mon features not contain luminous
> >> 6263 void OSD::send_beacon(const ceph::coarse_mono_clock::time_point& now)
> >>
> >>  6264 {
> >>
> >>  6265   const auto& monmap = monc->monmap;
> >>
> >>  6266   // send beacon to mon even if we are just connected, and the
> >> monmap is not
> >>
> >>  6267   // initialized yet by then.
> >>
> >>  6268   if (monmap.epoch > 0 &&
> >>
> >>  6269   monmap.get_required_features().contains_all(
> >>
> >>  6270 ceph::features::mon::FEATURE_LUMINOUS)) {
> >>
> >>  6271 dout(20) << __func__ << " sending" << dendl;
> >>
> >>  6272 MOSDBeacon* beacon = nullptr;
> >>
> >>  6273 {
> >>
> >>  6274   std::lock_guard l{min_last_epoch_clean_lock};
> >>
> >>  6275   beacon = new MOSDBeacon(osdmap->get_epoch(), 
> >> min_last_epoch_clean);
> >>
> >>  6276   std::swap(beacon->pgs, min_last_epoch_clean_pgs);
> >>
> >>  6277   last_sent_beacon = now;
> >>
> >>  6278 }
> >>
> >>  6279 monc->send_mon_message(beacon);
> >>
> >>  6280   } else {
> >>
> >>  6281 dout(20) << __func__ << " not sending" << dendl;
> >>
> >>  6282   }
> >>
> >>  6283 }
> >>
> >>
> >> Zhenshi Zhou  于2019年3月14日周四 下午12:43写道:
> >> >
> >> > Hi,
> >> >
> >> > One of the log says the beacon not sending as below:
> >> > 2019-03-14 12:41:15.722 7f3c27684700 10 osd.5 17032 tick_without_osd_lock
> >> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 
> >> > can_inc_scrubs_pending 0 -> 1 (max 1, active 0)
> >> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 scrub_time_permit 
> >> > should run between 0 - 24 now 12 = yes
> >> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 
> >> > scrub_load_below_threshold loadavg per cpu 0 < max 0.5 = yes
> >> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 sched_scrub 
> >> > load_is_low=1
> >> > 2019-03-14 12:41:15.722 7f3c27684700 10 osd.5 17032 sched_scrub 1.79 
> >> > scheduled at 2019-03-14 13:17:51.290050 > 2019-03-14 12:41:15.723848
> >> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 sched_scrub done
> >> > 2019-03-14 12:41:15.722 7f3c27684700 10 osd.5 17032 
> >> > promote_throttle_recalibrate 0 attempts, promoted 0 objects and 0 B; 
> >> > target 25 obj/sec or 5 MiB/sec
> >> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 
> >> > promote_throttle_recalibrate  new_prob 1000
> >> > 2019-03-14 12:41:15.722 7f3c27684700 10 osd.5 17032 
> >> > promote_throttle_recalibrate  actual 0, actual/prob ratio 1, adjusted 
> >> > new_prob 1000, prob 1000 -> 1000
> >> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 send_beacon not 
> >> > sending
> >> >
> >> >
> >> > huang jun  于2019年3月14日周四 下午12:30写道:
> >> >>
> >> >> osd will not send beacons to mon if its not in ACTIVE state,
> >> >> so you maybe turn on one osd's debug_osd=20 to see what is going on
> >> >>
> >> >> Zhenshi Zhou  于2019年3月14日周四 上午11:07写道:
> >> >> >
> >> >> > What's more, I find that the osds don't send beacons all the time, 
> >> >> > some osds send beacons
> >> &g

Re: [ceph-users] cluster is not stable

2019-03-14 Thread huang jun
OK, if this is a **test environment**, you can try:
for f in 'kraken,luminous,mimic,osdmap-prune'; do
  ceph mon feature set $f --yes-i-really-mean-it
done

If it is a production environment, you should evaluate the risk first, and
maybe set up a test cluster to test on first.

Zhenshi Zhou  于2019年3月14日周四 下午1:56写道:
>
> # ceph mon feature ls
> all features
> supported: [kraken,luminous,mimic,osdmap-prune]
> persistent: [kraken,luminous,mimic,osdmap-prune]
> on current monmap (epoch 2)
> persistent: [none]
>     required: [none]
>
> huang jun  于2019年3月14日周四 下午1:50写道:
>>
>> what's the output of 'ceph mon feature ls'?
>>
>> from the code, maybe mon features not contain luminous
>> 6263 void OSD::send_beacon(const ceph::coarse_mono_clock::time_point& now)
>>
>>  6264 {
>>
>>  6265   const auto& monmap = monc->monmap;
>>
>>  6266   // send beacon to mon even if we are just connected, and the
>> monmap is not
>>
>>  6267   // initialized yet by then.
>>
>>  6268   if (monmap.epoch > 0 &&
>>
>>  6269   monmap.get_required_features().contains_all(
>>
>>  6270 ceph::features::mon::FEATURE_LUMINOUS)) {
>>
>>  6271 dout(20) << __func__ << " sending" << dendl;
>>
>>  6272 MOSDBeacon* beacon = nullptr;
>>
>>  6273 {
>>
>>  6274   std::lock_guard l{min_last_epoch_clean_lock};
>>
>>  6275   beacon = new MOSDBeacon(osdmap->get_epoch(), 
>> min_last_epoch_clean);
>>
>>  6276   std::swap(beacon->pgs, min_last_epoch_clean_pgs);
>>
>>  6277   last_sent_beacon = now;
>>
>>  6278 }
>>
>>  6279 monc->send_mon_message(beacon);
>>
>>  6280   } else {
>>
>>  6281 dout(20) << __func__ << " not sending" << dendl;
>>
>>  6282   }
>>
>>  6283 }
>>
>>
>> Zhenshi Zhou  于2019年3月14日周四 下午12:43写道:
>> >
>> > Hi,
>> >
>> > One of the log says the beacon not sending as below:
>> > 2019-03-14 12:41:15.722 7f3c27684700 10 osd.5 17032 tick_without_osd_lock
>> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 can_inc_scrubs_pending 
>> > 0 -> 1 (max 1, active 0)
>> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 scrub_time_permit 
>> > should run between 0 - 24 now 12 = yes
>> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 
>> > scrub_load_below_threshold loadavg per cpu 0 < max 0.5 = yes
>> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 sched_scrub 
>> > load_is_low=1
>> > 2019-03-14 12:41:15.722 7f3c27684700 10 osd.5 17032 sched_scrub 1.79 
>> > scheduled at 2019-03-14 13:17:51.290050 > 2019-03-14 12:41:15.723848
>> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 sched_scrub done
>> > 2019-03-14 12:41:15.722 7f3c27684700 10 osd.5 17032 
>> > promote_throttle_recalibrate 0 attempts, promoted 0 objects and 0 B; 
>> > target 25 obj/sec or 5 MiB/sec
>> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 
>> > promote_throttle_recalibrate  new_prob 1000
>> > 2019-03-14 12:41:15.722 7f3c27684700 10 osd.5 17032 
>> > promote_throttle_recalibrate  actual 0, actual/prob ratio 1, adjusted 
>> > new_prob 1000, prob 1000 -> 1000
>> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 send_beacon not sending
>> >
>> >
>> > huang jun  于2019年3月14日周四 下午12:30写道:
>> >>
>> >> osd will not send beacons to mon if its not in ACTIVE state,
>> >> so you maybe turn on one osd's debug_osd=20 to see what is going on
>> >>
>> >> Zhenshi Zhou  于2019年3月14日周四 上午11:07写道:
>> >> >
>> >> > What's more, I find that the osds don't send beacons all the time, some 
>> >> > osds send beacons
>> >> > for a period of time and then stop sending beacons.
>> >> >
>> >> >
>> >> >
>> >> > Zhenshi Zhou  于2019年3月14日周四 上午10:57写道:
>> >> >>
>> >> >> Hi
>> >> >>
>> >> >> I set the config on every osd and check whether all osds send beacons
>> >> >> to monitors.
>> >> >>
>> >> >> The result shows that only part of the osds send beacons and the 
>> >> >> monitor
>> >> >> receives all beacons from which the osd send out.
>> >> >>
>> >> >> But why some osds 

Re: [ceph-users] cluster is not stable

2019-03-13 Thread huang jun
What's the output of 'ceph mon feature ls'?

From the code, maybe the mon features don't contain LUMINOUS:
void OSD::send_beacon(const ceph::coarse_mono_clock::time_point& now)
{
  const auto& monmap = monc->monmap;
  // send beacon to mon even if we are just connected, and the monmap is not
  // initialized yet by then.
  if (monmap.epoch > 0 &&
      monmap.get_required_features().contains_all(
        ceph::features::mon::FEATURE_LUMINOUS)) {
    dout(20) << __func__ << " sending" << dendl;
    MOSDBeacon* beacon = nullptr;
    {
      std::lock_guard l{min_last_epoch_clean_lock};
      beacon = new MOSDBeacon(osdmap->get_epoch(), min_last_epoch_clean);
      std::swap(beacon->pgs, min_last_epoch_clean_pgs);
      last_sent_beacon = now;
    }
    monc->send_mon_message(beacon);
  } else {
    dout(20) << __func__ << " not sending" << dendl;
  }
}


Zhenshi Zhou  于2019年3月14日周四 下午12:43写道:
>
> Hi,
>
> One of the log says the beacon not sending as below:
> 2019-03-14 12:41:15.722 7f3c27684700 10 osd.5 17032 tick_without_osd_lock
> 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 can_inc_scrubs_pending 0 
> -> 1 (max 1, active 0)
> 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 scrub_time_permit should 
> run between 0 - 24 now 12 = yes
> 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 
> scrub_load_below_threshold loadavg per cpu 0 < max 0.5 = yes
> 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 sched_scrub load_is_low=1
> 2019-03-14 12:41:15.722 7f3c27684700 10 osd.5 17032 sched_scrub 1.79 
> scheduled at 2019-03-14 13:17:51.290050 > 2019-03-14 12:41:15.723848
> 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 sched_scrub done
> 2019-03-14 12:41:15.722 7f3c27684700 10 osd.5 17032 
> promote_throttle_recalibrate 0 attempts, promoted 0 objects and 0 B; target 
> 25 obj/sec or 5 MiB/sec
> 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 
> promote_throttle_recalibrate  new_prob 1000
> 2019-03-14 12:41:15.722 7f3c27684700 10 osd.5 17032 
> promote_throttle_recalibrate  actual 0, actual/prob ratio 1, adjusted 
> new_prob 1000, prob 1000 -> 1000
> 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 send_beacon not sending
>
>
> huang jun  于2019年3月14日周四 下午12:30写道:
>>
>> osd will not send beacons to mon if its not in ACTIVE state,
>> so you maybe turn on one osd's debug_osd=20 to see what is going on
>>
>> Zhenshi Zhou  于2019年3月14日周四 上午11:07写道:
>> >
>> > What's more, I find that the osds don't send beacons all the time, some 
>> > osds send beacons
>> > for a period of time and then stop sending beacons.
>> >
>> >
>> >
>> > Zhenshi Zhou  于2019年3月14日周四 上午10:57写道:
>> >>
>> >> Hi
>> >>
>> >> I set the config on every osd and check whether all osds send beacons
>> >> to monitors.
>> >>
>> >> The result shows that only part of the osds send beacons and the monitor
>> >> receives all beacons from which the osd send out.
>> >>
>> >> But why some osds don't send beacon?
>> >>
>> >> huang jun  于2019年3月13日周三 下午11:02写道:
>> >>>
>> >>> sorry for not make it clearly, you may need to set one of your osd's
>> >>> osd_beacon_report_interval = 5
>> >>> and debug_ms=1 and then restart the osd process, then check the osd
>> >>> log by 'grep beacon /var/log/ceph/ceph-osd.$id.log'
>> >>> to make sure osd send beacons to mon, if osd send beacon to mon, you
>> >>> should also turn on debug_ms=1 on leader mon,
>> >>> and restart mon process, then check the mon log to make sure mon
>> >>> received osd beacon;
>> >>>
>> >>> Zhenshi Zhou  于2019年3月13日周三 下午8:20写道:
>> >>> >
>> >>> > And now, new errors are cliaming..
>> >>> >
>> >>> >
>> >>> > Zhenshi Zhou  于2019年3月13日周三 下午2:58写道:
>> >>> >>
>> >>> >> Hi,
>> >>> >>
>> >>> >> I didn't set  osd_beacon_report_interval as it must be the default 
>> >>> >> value.
>> >>> >> I have set osd_beacon_report_interval to 60 and debug_mon to 10.
>> >>> >>
>> >>> >> Attachment is the leader monitor log, the "mark-down" operations is 
>> >>> >> at 14:22
>> >>> >>
>

Re: [ceph-users] cluster is not stable

2019-03-13 Thread huang jun
The OSD will not send beacons to the mon if it's not in the ACTIVE state,
so you could turn on debug_osd=20 on one OSD to see what is going on.
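
A sketch of raising that for a single OSD at runtime, using osd.5 from your
log (and turning it back down afterwards):

  ceph tell osd.5 injectargs '--debug_osd 20/20'
  # ...reproduce, then restore the default level
  ceph tell osd.5 injectargs '--debug_osd 1/5'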

Zhenshi Zhou  于2019年3月14日周四 上午11:07写道:
>
> What's more, I find that the osds don't send beacons all the time, some osds 
> send beacons
> for a period of time and then stop sending beacons.
>
>
>
> Zhenshi Zhou  于2019年3月14日周四 上午10:57写道:
>>
>> Hi
>>
>> I set the config on every osd and check whether all osds send beacons
>> to monitors.
>>
>> The result shows that only part of the osds send beacons and the monitor
>> receives all beacons from which the osd send out.
>>
>> But why some osds don't send beacon?
>>
>> huang jun  于2019年3月13日周三 下午11:02写道:
>>>
>>> sorry for not make it clearly, you may need to set one of your osd's
>>> osd_beacon_report_interval = 5
>>> and debug_ms=1 and then restart the osd process, then check the osd
>>> log by 'grep beacon /var/log/ceph/ceph-osd.$id.log'
>>> to make sure osd send beacons to mon, if osd send beacon to mon, you
>>> should also turn on debug_ms=1 on leader mon,
>>> and restart mon process, then check the mon log to make sure mon
>>> received osd beacon;
>>>
>>> Zhenshi Zhou  于2019年3月13日周三 下午8:20写道:
>>> >
>>> > And now, new errors are cliaming..
>>> >
>>> >
>>> > Zhenshi Zhou  于2019年3月13日周三 下午2:58写道:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I didn't set  osd_beacon_report_interval as it must be the default value.
>>> >> I have set osd_beacon_report_interval to 60 and debug_mon to 10.
>>> >>
>>> >> Attachment is the leader monitor log, the "mark-down" operations is at 
>>> >> 14:22
>>> >>
>>> >> Thanks
>>> >>
>>> >> huang jun  于2019年3月13日周三 下午2:07写道:
>>> >>>
>>> >>> can you get the value of osd_beacon_report_interval item? the default
>>> >>> is 300, you can set to 60,  or maybe turn on debug_ms=1 debug_mon=10
>>> >>> can get more infos.
>>> >>>
>>> >>>
>>> >>> Zhenshi Zhou  于2019年3月13日周三 下午1:20写道:
>>> >>> >
>>> >>> > Hi,
>>> >>> >
>>> >>> > The servers are cennected to the same switch.
>>> >>> > I can ping from anyone of the servers to other servers
>>> >>> > without a packet lost and the average round trip time
>>> >>> > is under 0.1 ms.
>>> >>> >
>>> >>> > Thanks
>>> >>> >
>>> >>> > Ashley Merrick  于2019年3月13日周三 下午12:06写道:
>>> >>> >>
>>> >>> >> Can you ping all your OSD servers from all your mons, and ping your 
>>> >>> >> mons from all your OSD servers?
>>> >>> >>
>>> >>> >> I’ve seen this where a route wasn’t working one direction, so it 
>>> >>> >> made OSDs flap when it used that mon to check availability:
>>> >>> >>
>>> >>> >> On Wed, 13 Mar 2019 at 11:50 AM, Zhenshi Zhou  
>>> >>> >> wrote:
>>> >>> >>>
>>> >>> >>> After checking the network and syslog/dmsg, I think it's not the 
>>> >>> >>> network or hardware issue. Now there're some
>>> >>> >>> osds being marked down every 15 minutes.
>>> >>> >>>
>>> >>> >>> here is ceph.log:
>>> >>> >>> 2019-03-13 11:06:26.290701 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 
>>> >>> >>> 6756 : cluster [INF] Cluster is now healthy
>>> >>> >>> 2019-03-13 11:21:21.705787 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 
>>> >>> >>> 6757 : cluster [INF] osd.1 marked down after no beacon for 
>>> >>> >>> 900.067020 seconds
>>> >>> >>> 2019-03-13 11:21:21.705858 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 
>>> >>> >>> 6758 : cluster [INF] osd.2 marked down after no beacon for 
>>> >>> >>> 900.067020 seconds
>>> >>> >>> 2019-03-13 11:21:21.705920 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 
>>> >>> >>> 6759 : cluster [INF] osd.4 marked down after no beacon for 
>>> >>> >>> 900.067020 seco

Re: [ceph-users] recommendation on ceph pool

2019-03-13 Thread huang jun
tim taler  于2019年3月13日周三 下午11:05写道:
>
> Hi all,
> how are your experiences with different disk sizes in one pool
> regarding the overall performance?
> I hope someone could shed some light on the following scenario:
>
> Let's say I mix an equal amount of 2TB and 8TB disks in one pool,
> with a crush map that tries to fill all disks to the same percentage.
>
> Assuming that all disks have roughly the same speed of let's say 100MB/s,
> wouldn't that hurt the performance?
>
> As a thought experiment let's say the pool consists of only two disks,
> one 2GB, one 8GB disk - both at 100MB/s
>
> If I put a 1GB file onto it that would would write
> 250MB to the small disk and
> 750MB to the big disk.
>
> leading to an overall write time of 7,5 sec.
>
> If my pool would consist of disks with the same size,
> than on both disks 500MB would be written,
> leading to an estimated time of only 5sec.
>
> Am I right here - in principle, not in exact numbers - or am I missing some
> hidden magic ('cause even the cache operations would take different
> times for different disk sizes, right?)
If your pool size is 1 and you have 2 disks of 2GB and 8GB, then when you
write a 1GB file the 2GB disk will take 200MB and the 8GB disk will take
800MB: CRUSH places more data on the bigger OSD than on the small one, as
controlled by the crush weight.
So the best practice is to deploy disks of the same capacity and performance
in one pool, and if you have more than one host, it is best to have the same
set of OSDs in each host.
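
To see how the crush weight tracks raw capacity, and how full each OSD
actually is, something like this helps (assuming your release has it):

  ceph osd df tree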
> TIA
> and best regards
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cluster is not stable

2019-03-13 Thread huang jun
Sorry for not making it clear: you may need to set, on one of your OSDs,
osd_beacon_report_interval = 5
and debug_ms = 1, and then restart the OSD process. Then check the OSD log
with 'grep beacon /var/log/ceph/ceph-osd.$id.log'
to make sure the OSD sends beacons to the mon. If the OSD does send beacons,
you should also turn on debug_ms = 1 on the leader mon and restart the mon
process, then check the mon log to make sure the mon received the OSD
beacons.
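
As a concrete sketch of the above (osd.1, the default log paths, and
mon.ceph-mon1 as leader are assumptions taken from your logs):

  # in the [osd] section of /etc/ceph/ceph.conf on that host:
  #   osd_beacon_report_interval = 5
  #   debug_ms = 1
  systemctl restart ceph-osd@1
  grep beacon /var/log/ceph/ceph-osd.1.log
  # with debug_ms = 1 set on the leader mon and the mon restarted:
  grep beacon /var/log/ceph/ceph-mon.ceph-mon1.log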

Zhenshi Zhou  于2019年3月13日周三 下午8:20写道:
>
> And now, new errors are cliaming..
>
>
> Zhenshi Zhou  于2019年3月13日周三 下午2:58写道:
>>
>> Hi,
>>
>> I didn't set  osd_beacon_report_interval as it must be the default value.
>> I have set osd_beacon_report_interval to 60 and debug_mon to 10.
>>
>> Attachment is the leader monitor log, the "mark-down" operations is at 14:22
>>
>> Thanks
>>
>> huang jun  于2019年3月13日周三 下午2:07写道:
>>>
>>> can you get the value of osd_beacon_report_interval item? the default
>>> is 300, you can set to 60,  or maybe turn on debug_ms=1 debug_mon=10
>>> can get more infos.
>>>
>>>
>>> Zhenshi Zhou  于2019年3月13日周三 下午1:20写道:
>>> >
>>> > Hi,
>>> >
>>> > The servers are cennected to the same switch.
>>> > I can ping from anyone of the servers to other servers
>>> > without a packet lost and the average round trip time
>>> > is under 0.1 ms.
>>> >
>>> > Thanks
>>> >
>>> > Ashley Merrick  于2019年3月13日周三 下午12:06写道:
>>> >>
>>> >> Can you ping all your OSD servers from all your mons, and ping your mons 
>>> >> from all your OSD servers?
>>> >>
>>> >> I’ve seen this where a route wasn’t working one direction, so it made 
>>> >> OSDs flap when it used that mon to check availability:
>>> >>
>>> >> On Wed, 13 Mar 2019 at 11:50 AM, Zhenshi Zhou  
>>> >> wrote:
>>> >>>
>>> >>> After checking the network and syslog/dmsg, I think it's not the 
>>> >>> network or hardware issue. Now there're some
>>> >>> osds being marked down every 15 minutes.
>>> >>>
>>> >>> here is ceph.log:
>>> >>> 2019-03-13 11:06:26.290701 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6756 : 
>>> >>> cluster [INF] Cluster is now healthy
>>> >>> 2019-03-13 11:21:21.705787 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6757 : 
>>> >>> cluster [INF] osd.1 marked down after no beacon for 900.067020 seconds
>>> >>> 2019-03-13 11:21:21.705858 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6758 : 
>>> >>> cluster [INF] osd.2 marked down after no beacon for 900.067020 seconds
>>> >>> 2019-03-13 11:21:21.705920 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6759 : 
>>> >>> cluster [INF] osd.4 marked down after no beacon for 900.067020 seconds
>>> >>> 2019-03-13 11:21:21.705957 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6760 : 
>>> >>> cluster [INF] osd.6 marked down after no beacon for 900.067020 seconds
>>> >>> 2019-03-13 11:21:21.705999 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6761 : 
>>> >>> cluster [INF] osd.7 marked down after no beacon for 900.067020 seconds
>>> >>> 2019-03-13 11:21:21.706040 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6762 : 
>>> >>> cluster [INF] osd.10 marked down after no beacon for 900.067020 seconds
>>> >>> 2019-03-13 11:21:21.706079 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6763 : 
>>> >>> cluster [INF] osd.11 marked down after no beacon for 900.067020 seconds
>>> >>> 2019-03-13 11:21:21.706118 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6764 : 
>>> >>> cluster [INF] osd.12 marked down after no beacon for 900.067020 seconds
>>> >>> 2019-03-13 11:21:21.706155 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6765 : 
>>> >>> cluster [INF] osd.13 marked down after no beacon for 900.067020 seconds
>>> >>> 2019-03-13 11:21:21.706195 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6766 : 
>>> >>> cluster [INF] osd.14 marked down after no beacon for 900.067020 seconds
>>> >>> 2019-03-13 11:21:21.706233 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6767 : 
>>> >>> cluster [INF] osd.15 marked down after no beacon for 900.067020 seconds
>>> >>> 2019-03-13 11:21:21.706273 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6768 : 
>>> >>> cluster [INF] osd.16 marked down after no beacon for 900.067020 seconds
>>> >>&g

Re: [ceph-users] cluster is not stable

2019-03-13 Thread huang jun
Can you get the value of the osd_beacon_report_interval item? The default
is 300; you can set it to 60. Also, turning on debug_ms=1 and debug_mon=10
may give more info.
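
For example (assuming access to the admin socket on an OSD host, and
mon.ceph-mon1 being the leader as in your log):

  ceph daemon osd.0 config get osd_beacon_report_interval
  ceph tell mon.ceph-mon1 injectargs '--debug_mon 10 --debug_ms 1'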


Zhenshi Zhou  于2019年3月13日周三 下午1:20写道:
>
> Hi,
>
> The servers are cennected to the same switch.
> I can ping from anyone of the servers to other servers
> without a packet lost and the average round trip time
> is under 0.1 ms.
>
> Thanks
>
> Ashley Merrick  于2019年3月13日周三 下午12:06写道:
>>
>> Can you ping all your OSD servers from all your mons, and ping your mons 
>> from all your OSD servers?
>>
>> I’ve seen this where a route wasn’t working one direction, so it made OSDs 
>> flap when it used that mon to check availability:
>>
>> On Wed, 13 Mar 2019 at 11:50 AM, Zhenshi Zhou  wrote:
>>>
>>> After checking the network and syslog/dmsg, I think it's not the network or 
>>> hardware issue. Now there're some
>>> osds being marked down every 15 minutes.
>>>
>>> here is ceph.log:
>>> 2019-03-13 11:06:26.290701 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6756 : 
>>> cluster [INF] Cluster is now healthy
>>> 2019-03-13 11:21:21.705787 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6757 : 
>>> cluster [INF] osd.1 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.705858 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6758 : 
>>> cluster [INF] osd.2 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.705920 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6759 : 
>>> cluster [INF] osd.4 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.705957 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6760 : 
>>> cluster [INF] osd.6 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.705999 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6761 : 
>>> cluster [INF] osd.7 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706040 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6762 : 
>>> cluster [INF] osd.10 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706079 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6763 : 
>>> cluster [INF] osd.11 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706118 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6764 : 
>>> cluster [INF] osd.12 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706155 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6765 : 
>>> cluster [INF] osd.13 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706195 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6766 : 
>>> cluster [INF] osd.14 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706233 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6767 : 
>>> cluster [INF] osd.15 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706273 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6768 : 
>>> cluster [INF] osd.16 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706312 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6769 : 
>>> cluster [INF] osd.17 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706351 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6770 : 
>>> cluster [INF] osd.18 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706385 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6771 : 
>>> cluster [INF] osd.19 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706423 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6772 : 
>>> cluster [INF] osd.20 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706503 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6773 : 
>>> cluster [INF] osd.22 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706549 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6774 : 
>>> cluster [INF] osd.23 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706587 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6775 : 
>>> cluster [INF] osd.25 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706625 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6776 : 
>>> cluster [INF] osd.26 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706665 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6777 : 
>>> cluster [INF] osd.27 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706703 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6778 : 
>>> cluster [INF] osd.28 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706741 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6779 : 
>>> cluster [INF] osd.30 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706779 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6780 : 
>>> cluster [INF] osd.31 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706817 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6781 : 
>>> cluster [INF] osd.33 marked down after no beacon for 900.067020 seconds
>>> 2019-03-13 11:21:21.706856 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6782 : 
>>> cluster [INF] osd.34 marked down 

Re: [ceph-users] Bug maybe: osdmap failed undecoded

2017-02-23 Thread huang jun
You can copy the corresponding (intact) osdmap file from osd.1 over the
corrupted one and then restart the OSD; we hit this before, and that worked
for us.
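
A rough outline of that, using the path from your mail (stop osd.0 first,
and double-check that the epoch-76 file name really matches on both OSDs;
this is only a sketch):

  systemctl stop ceph-osd@0
  # osd.0 and osd.1 are on the same host here, so a local copy is enough
  cp /var/lib/ceph/osd/ceph-1/current/meta/osdmap.76__0_64173F9C__none \
     /var/lib/ceph/osd/ceph-0/current/meta/osdmap.76__0_64173F9C__none
  chown ceph:ceph /var/lib/ceph/osd/ceph-0/current/meta/osdmap.76__0_64173F9C__none
  systemctl start ceph-osd@0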

2017-02-23 22:33 GMT+08:00 tao chang :
> HI,
>
> I have a ceph cluster  (ceph 10.2.5) witch 3 node, each has two osds.
>
> It was a power outage last night  and all the server are restarted
> this morning again.
> All osds are work well except the osd.0.
>
> ID WEIGHT  TYPE NAMEUP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 0.04500 root volumes
> -2 0.01500 host zk25-02
>  0 0.01500 osd.0   down0  1.0
>  1 0.01500 osd.1 up  1.0  1.0
> -3 0.01500 host zk25-03
>  2 0.01500 osd.2 up  1.0  1.0
>  3 0.01500 osd.3 up  1.0  1.0
> -4 0.01500 host zk25-01
>  4 0.01500 osd.4 up  1.0  1.0
>  5 0.01500 osd.5 up  1.0  1.0
>
> I tried to run it again with gdb, it turned it like this:
>
> (gdb) bt
> #0  0x74cfd5f7 in raise () from /lib64/libc.so.6
> #1  0x74cfece8 in abort () from /lib64/libc.so.6
> #2  0x756019d5 in __gnu_cxx::__verbose_terminate_handler() ()
> from /lib64/libstdc++.so.6
> #3  0x755ff946 in ?? () from /lib64/libstdc++.so.6
> #4  0x755ff973 in std::terminate() () from /lib64/libstdc++.so.6
> #5  0x755ffb93 in __cxa_throw () from /lib64/libstdc++.so.6
> #6  0x55b93b7f in pg_pool_t::decode (this=,
> bl=...) at osd/osd_types.cc:1569
> #7  0x55f3a53f in decode (p=..., c=...) at osd/osd_types.h:1487
> #8  decode (m=Python Exception  'exceptions.IndexError'> list index out of range:
> std::map with 1 elements, p=...) at include/encoding.h:648
> #9  0x55f2fa8d in OSDMap::decode_classic
> (this=this@entry=0x5fdf6480, p=...) at osd/OSDMap.cc:2026
> #10 0x55f2fe8c in OSDMap::decode
> (this=this@entry=0x5fdf6480, bl=...) at osd/OSDMap.cc:2116
> #11 0x55f3116e in OSDMap::decode (this=0x5fdf6480, bl=...)
> at osd/OSDMap.cc:1985
> #12 0x558e51fc in OSDService::try_get_map
> (this=0x5ff51860, epoch=76) at osd/OSD.cc:1340
> #13 0x55947ece in OSDService::get_map (this=,
> e=, this=) at osd/OSD.h:884
> #14 0x558fb0f2 in OSD::init (this=0x5ff5) at osd/OSD.h:1917
> #15 0x5585eea5 in main (argc=, argv= out>) at ceph_osd.cc:605
>
> It was caused by a failure to decode the osdmap structure from the osdmap
> file (/var/lib/ceph/osd/ceph-0/current/meta/osdmap.76__0_64173F9C__none).
> By comparing it with the same file on osd.1, I confirmed that the osdmap
> file on osd.0 has been corrupted.
>
>
> Any one know how to fix it ? Thanks for advance !
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Thank you!
HuangJun


Re: [ceph-users] Does anyone know why cephfs do not support EC pool?

2016-10-17 Thread huang jun
you can look into this:
https://github.com/ceph/ceph/pull/10334
https://github.com/ceph/ceph/compare/master...athanatos:wip-ec-cache
The community has done a lot of work related to EC for the RBD and FS interfaces.

2016-10-18 13:06 GMT+08:00 Erick Perez - Quadrian Enterprises <
epe...@quadrianweb.com>:

> On Mon, Oct 17, 2016 at 9:23 PM, huang jun <hjwsm1...@gmail.com> wrote:
>
>> EC pools only support full-object writes and appends, not partial writes;
>> you can try it by doing random writes and see whether the OSD crashes or not.
>>
>> 2016-10-18 10:10 GMT+08:00 Liuxuan <liu.x...@h3c.com>:
>> > Hello:
>> >
>> >
>> >
>> >   I have created a CephFS whose data pool is erasure coded and whose
>> > metadata pool is replicated.
>> > The cluster reported errors from the MDSMonitor::_check_pool function.
>> >
>> >  But when I bypass the pool type check, CephFS can write and read
>> > data. Does anyone know why CephFS does not support EC pools?
>> >
>> >
>> >
>> > 
>> >
>> > liuxuan
>> >
>> >
>> >
>> > 
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>>
>>
>> --
>> Thank you!
>> HuangJun
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
> Is EC on the roadmap for Ceph? I can't seem to find it. My question is
> because "all others" (Nutanix, Hypergrid) do EC storage for VMs as the
> default way of storage. It seems EC in Ceph (as of Sept 2016) is considered
> by many "experimental" unless it is used for cold data.
> --
>
> -
> Erick Perez
> Soluciones Tacticas Pasivas/Activas de Inteligencia y Analitica de Datos
> para Gobiernos
> Quadrian Enterprises S.A. - Panama, Republica de Panama
> Skype chat: eaperezh
> WhatsApp IM: +507-6675-5083
> POBOX 0819-12679, Panama
> Tel. (507) 391.8174 / (507) 391.8175
>



-- 
Thank you!
HuangJun


Re: [ceph-users] Does anyone know why cephfs do not support EC pool?

2016-10-17 Thread huang jun
EC pools only support full-object writes and appends, not partial writes;
you can try it by doing random writes and see whether the OSD crashes or not.
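If you want to see the restriction from a client, here is a minimal sketch with the rados CLI against a hypothetical EC data pool named ecpool (pool and object names are made up; the append subcommand and the --offset option of put may not exist on very old releases, in which case librados is needed to issue a partial write):

  dd if=/dev/zero of=/tmp/obj bs=4M count=1
  rados -p ecpool put testobj /tmp/obj                # full-object write: accepted
  rados -p ecpool append testobj /tmp/obj             # append: accepted
  rados -p ecpool put testobj /tmp/obj --offset 4096  # overwrite at a non-zero offset:
                                                      # rejected on a plain EC pool unless
                                                      # EC overwrites are enabled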

2016-10-18 10:10 GMT+08:00 Liuxuan :
> Hello:
>
>
>
>   I have created a CephFS whose data pool is erasure coded and whose metadata pool is replicated.
> The cluster reported errors from the MDSMonitor::_check_pool function.
>
>  But when I bypass the pool type check, CephFS can write and read
> data. Does anyone know why CephFS does not support EC pools?
>
>
>
> 
>
> liuxuan
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Thank you!
HuangJun


Re: [ceph-users] How do I restart node that I've killed in development mode

2016-10-12 Thread huang jun
./init-ceph start mon.a
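For context, a rough sequence for a vstart.sh development cluster; exact paths depend on where vstart.sh was run (older trees use src/, cmake builds put binaries under build/bin), so treat this as a sketch:

  cd src                      # the directory where vstart.sh was run
  ./init-ceph start mon.a     # relaunch only the monitor that was killed
  ./ceph -c ./ceph.conf -s    # check that mon.a has rejoined the quorum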

2016-10-12 14:54 GMT+08:00 agung Laksono :
> Hi Ceph Users,
>
> I deploy development cluster using vstart with 3 MONs and 3 OSDs.
> On my experiment, Kill one of the monitor nodes by its pid. like this:
>
>   $ kill -SIGSEGV 27557
>
> After a new monitor leader is chosen, I would like to re-run the monitor
> that I've killed in the previous step. How do I do this?
>
>
> Thanks
>
> --
> Cheers,
>
> Agung Laksono
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Thank you!
HuangJun


Re: [ceph-users] ceph journal system vs filesystem journal system

2016-09-01 Thread huang jun
2016-09-01 17:25 GMT+08:00 한승진 :
> Hi all.
>
> I'm very confused about ceph journal system
>
> Some people said ceph journal system works like linux journal filesystem.
>
> Also some people said all data are written journal first and then written to
> OSD data.
>
> Journal of Ceph storage also write just metadata of object or write all data
> of object?
>
> Which is right?
>

Data written to an OSD is first written to the OSD journal through direct I/O
and is then submitted to the object store.
That improves small-write performance, because the journal writes are
sequential rather than random,
and the journal can recover data that was written to the journal but not yet
to the object store, for example after a power outage.
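For reference, a typical FileStore-era ceph.conf fragment that puts the journal on a faster device; the device path and size below are only placeholders:

  [osd]
      osd journal size = 10240   ; journal size in MB
      journal dio = true         ; write the journal with direct I/O
      journal aio = true         ; use async I/O for the journal where supported

  [osd.0]
      osd journal = /dev/disk/by-partlabel/osd0-journal   ; assumed SSD partition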

> Thanks for your help
>
> Best regards.
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Thank you!
HuangJun


Re: [ceph-users] krbd map on Jewel, sysfs write failed when rbd map

2016-04-18 Thread huang jun
Hi, can you post the output of 'modinfo rbd' and your cluster state from 'ceph -s'?
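(Not an answer given in this thread, but the usual culprit with a 3.16 kernel is the Jewel default feature set visible in the 'rbd info' output quoted below: the old krbd module only understands layering. The common workaround, sketched with the names from that output, is to disable the newer features or create images with layering only:)

  rbd feature disable test_volume deep-flatten fast-diff object-map exclusive-lock
  rbd map test_volume --pool rbd

  # or, for new images:
  rbd create newvol --size 1024 --pool rbd --image-feature layering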

2016-04-18 16:35 GMT+08:00 席智勇 :
> hi cephers:
>
> I create a rbd volume(image) on Jewel release, when exec rbd map, I got the
> error message as follows.i can not  find any message usage in
> syslog/kern.log/messages.
> anyone can share some tips?
>
> --my ceph
> version
> root@hzbxs-xzy-ceph-dev-mon:~/ceph-cluster# ceph -v
> ceph version 10.1.2 (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4)
> -system
> info
> root@hzbxs-xzy-ceph-dev-mon:~/ceph-cluster# cat /etc/debian_version
> 8.4
> root@hzbxs-xzy-ceph-dev-mon:~/ceph-cluster# uname -a
> Linux hzbxs-xzy-ceph-dev-mon 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2
> (2016-04-08) x86_64 GNU/Linux
> --my screen
> snap-
> root@hzbxs-xzy-ceph-dev-mon:~/ceph-cluster# ceph osd lspools
> 2016-04-18 16:24:24.691909 7fb08663b700 -1 WARNING: the following dangerous
> and experimental features are enabled: bluestore,rocksdb
> 2016-04-18 16:24:24.701003 7fb08663b700 -1 WARNING: the following dangerous
> and experimental features are enabled: bluestore,rocksdb
> 1 rbd,
> root@hzbxs-xzy-ceph-dev-mon:~/ceph-cluster# rbd --image test_volume info
> --pool rbd
> 2016-04-18 16:24:27.396490 7f2156ea2d40 -1 WARNING: the following dangerous
> and experimental features are enabled: bluestore,rocksdb
> 2016-04-18 16:24:27.396551 7f2156ea2d40 -1 WARNING: the following dangerous
> and experimental features are enabled: bluestore,rocksdb
> 2016-04-18 16:24:27.399508 7f2156ea2d40 -1 WARNING: the following dangerous
> and experimental features are enabled: bluestore,rocksdb
> rbd image 'test_volume':
> size 1024 MB in 256 objects
> order 22 (4096 kB objects)
> block_name_prefix: rbd_data.371a6b8b4567
> format: 2
> features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
> flags:
> root@hzbxs-xzy-ceph-dev-mon:~/ceph-cluster# rbd map test_volume --pool rbd
> 2016-04-18 16:25:02.547462 7f291c173d40 -1 WARNING: the following dangerous
> and experimental features are enabled: bluestore,rocksdb
> 2016-04-18 16:25:02.547523 7f291c173d40 -1 WARNING: the following dangerous
> and experimental features are enabled: bluestore,rocksdb
> 2016-04-18 16:25:02.550352 7f291c173d40 -1 WARNING: the following dangerous
> and experimental features are enabled: bluestore,rocksdb
> rbd: sysfs write failed
> rbd: map failed: (6) No such device or address
> root@hzbxs-xzy-ceph-dev-mon:~/ceph-cluster#
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Thank you!
HuangJun


Re: [ceph-users] infernalis and jewel upgrades...

2016-04-15 Thread huang jun
Can you find the full osdmap.16024 on another OSD?
It seems that OSD::init doesn't read the incremental osdmap but the full osdmap;
if you find it, copy it to osd.3.

2016-04-16 13:27 GMT+08:00 hjcho616 <hjcho...@yahoo.com>:
> I found below file missing on osd.3 so I copied over.  Still fails with the
> similar message.  What can I try next?
>
> -1> 2016-04-16 00:22:32.579622 7f8d5c340800 20 osd.3 0 get_map 16024 -
> loading and decoding 0x7f8d65d04900
>  0> 2016-04-16 00:22:32.584406 7f8d5c340800 -1 osd/OSD.h: In function
> 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f8d5c340800 time 2016-04-16
> 00:22:32.579890
> osd/OSD.h: 885: FAILED assert(ret)
>
>  ceph version 10.1.2 (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x82) [0x7f8d5bdc64f2]
>  2: (OSDService::get_map(unsigned int)+0x3d) [0x7f8d5b74d83d]
>  3: (OSD::init()+0x1862) [0x7f8d5b6fba52]
>  4: (main()+0x2b05) [0x7f8d5b661735]
>  5: (__libc_start_main()+0xf5) [0x7f8d581f7b45]
>  6: (()+0x337197) [0x7f8d5b6ac197]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to
> interpret this.
>
> Regards,
> Hong
>
>
> On Saturday, April 16, 2016 12:11 AM, hjcho616 <hjcho...@yahoo.com> wrote:
>
>
> Is this it?
>
> root@OSD2:/var/lib/ceph/osd/ceph-3/current/meta# find ./ | grep osdmap |
> grep 16024
> ./DIR_E/DIR_3/inc\uosdmap.16024__0_46887E3E__none
>
> Regards,
> Hong
>
>
> On Friday, April 15, 2016 11:53 PM, huang jun <hjwsm1...@gmail.com> wrote:
>
>
> First, you should check whether file osdmap.16024 exists in your
> osd.3/current/meta dir,
> if not, you can copy it from other OSD who has it.
>
>
> 2016-04-16 12:36 GMT+08:00 hjcho616 <hjcho...@yahoo.com>:
>> Here is what I get wtih debug_osd = 20.
>>
>> 2016-04-15 23:28:24.429063 7f9ca0a5b800  0 set uid:gid to 1001:1001
>> (ceph:ceph)
>> 2016-04-15 23:28:24.429167 7f9ca0a5b800  0 ceph version 10.1.2
>> (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4), process ceph-osd, pid 2092
>> 2016-04-15 23:28:24.432034 7f9ca0a5b800  0 pidfile_write: ignore empty
>> --pid-file
>> 2016-04-15 23:28:24.459417 7f9ca0a5b800 10
>> ErasureCodePluginSelectJerasure:
>> load: jerasure_sse3
>> 2016-04-15 23:28:24.470016 7f9ca0a5b800 10 load: jerasure load: lrc load:
>> isa
>> 2016-04-15 23:28:24.472013 7f9ca0a5b800  2 osd.3 0 mounting
>> /var/lib/ceph/osd/ceph-3 /var/lib/ceph/osd/ceph-3/journal
>> 2016-04-15 23:28:24.472292 7f9ca0a5b800  0
>> filestore(/var/lib/ceph/osd/ceph-3) backend xfs (magic 0x58465342)
>> 2016-04-15 23:28:24.473496 7f9ca0a5b800  0
>> genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: FIEMAP
>> ioctl is disabled via 'filestore fiemap' config option
>> 2016-04-15 23:28:24.473541 7f9ca0a5b800  0
>> genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features:
>> SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config
>> option
>> 2016-04-15 23:28:24.473615 7f9ca0a5b800  0
>> genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: splice
>> is
>> supported
>> 2016-04-15 23:28:24.494485 7f9ca0a5b800  0
>> genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features:
>> syncfs(2)
>> syscall fully supported (by glibc and kernel)
>> 2016-04-15 23:28:24.494802 7f9ca0a5b800  0
>> xfsfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_feature: extsize is
>> disabled by conf
>> 2016-04-15 23:28:24.499066 7f9ca0a5b800  1 leveldb: Recovering log #20901
>> 2016-04-15 23:28:24.782188 7f9ca0a5b800  1 leveldb: Delete type=0 #20901
>>
>> 2016-04-15 23:28:24.782420 7f9ca0a5b800  1 leveldb: Delete type=3 #20900
>>
>> 2016-04-15 23:28:24.784810 7f9ca0a5b800  0
>> filestore(/var/lib/ceph/osd/ceph-3) mount: enabling WRITEAHEAD journal
>> mode:
>> checkpoint is not enabled
>> 2016-04-15 23:28:24.792918 7f9ca0a5b800  1 journal _open
>> /var/lib/ceph/osd/ceph-3/journal fd 18: 14998831104 bytes, block size 4096
>> bytes, directio = 1, aio = 1
>> 2016-04-15 23:28:24.800583 7f9ca0a5b800  1 journal _open
>> /var/lib/ceph/osd/ceph-3/journal fd 18: 14998831104 bytes, block size 4096
>> bytes, directio = 1, aio = 1
>> 2016-04-15 23:28:24.808144 7f9ca0a5b800  1
>> filestore(/var/lib/ceph/osd/ceph-3) upgrade
>> 2016-04-15 23:28:24.808540 7f9ca0a5b800  2 osd.3 0 boot
>> 2016-04-15 23:28:24.809265 7f9ca0a5b800 10 osd.3 0 read_superblock
>> sb(9b2c9bca-112e-48b0-86fc-587ef9a52948 osd.3
>> 4f86a418-6c67-4cb4-83a1-6c123c890036 e16024 [15332,16024]
>> lci=[16010,16024])
>> 2016-04-15

Re: [ceph-users] infernalis and jewel upgrades...

2016-04-15 Thread huang jun
Yes, that is an incremental osdmap. Is the file size correct?
You can compare it with the same file on another OSD;
if it is not the same, you can overwrite it with the correct one.
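A quick way to compare, assuming the default FileStore paths (the incremental map's file name should be identical on every OSD that holds that epoch, even though the DIR_* subdirectory can differ):

  # run on each node: print size and checksum of every copy of epoch 16024
  find /var/lib/ceph/osd/ceph-*/current/meta -name '*osdmap.16024__*' \
       -exec ls -l {} \; -exec md5sum {} \;

  # if osd.3's copy differs: stop osd.3, back the file up, overwrite it with a
  # matching copy from a healthy OSD, then start osd.3 again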

2016-04-16 13:11 GMT+08:00 hjcho616 <hjcho...@yahoo.com>:
> Is this it?
>
> root@OSD2:/var/lib/ceph/osd/ceph-3/current/meta# find ./ | grep osdmap |
> grep 16024
> ./DIR_E/DIR_3/inc\uosdmap.16024__0_46887E3E__none
>
> Regards,
> Hong
>
>
> On Friday, April 15, 2016 11:53 PM, huang jun <hjwsm1...@gmail.com> wrote:
>
>
> First, you should check whether file osdmap.16024 exists in your
> osd.3/current/meta dir,
> if not, you can copy it from other OSD who has it.
>
>
> 2016-04-16 12:36 GMT+08:00 hjcho616 <hjcho...@yahoo.com>:
>> Here is what I get wtih debug_osd = 20.
>>
>> 2016-04-15 23:28:24.429063 7f9ca0a5b800  0 set uid:gid to 1001:1001
>> (ceph:ceph)
>> 2016-04-15 23:28:24.429167 7f9ca0a5b800  0 ceph version 10.1.2
>> (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4), process ceph-osd, pid 2092
>> 2016-04-15 23:28:24.432034 7f9ca0a5b800  0 pidfile_write: ignore empty
>> --pid-file
>> 2016-04-15 23:28:24.459417 7f9ca0a5b800 10
>> ErasureCodePluginSelectJerasure:
>> load: jerasure_sse3
>> 2016-04-15 23:28:24.470016 7f9ca0a5b800 10 load: jerasure load: lrc load:
>> isa
>> 2016-04-15 23:28:24.472013 7f9ca0a5b800  2 osd.3 0 mounting
>> /var/lib/ceph/osd/ceph-3 /var/lib/ceph/osd/ceph-3/journal
>> 2016-04-15 23:28:24.472292 7f9ca0a5b800  0
>> filestore(/var/lib/ceph/osd/ceph-3) backend xfs (magic 0x58465342)
>> 2016-04-15 23:28:24.473496 7f9ca0a5b800  0
>> genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: FIEMAP
>> ioctl is disabled via 'filestore fiemap' config option
>> 2016-04-15 23:28:24.473541 7f9ca0a5b800  0
>> genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features:
>> SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config
>> option
>> 2016-04-15 23:28:24.473615 7f9ca0a5b800  0
>> genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: splice
>> is
>> supported
>> 2016-04-15 23:28:24.494485 7f9ca0a5b800  0
>> genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features:
>> syncfs(2)
>> syscall fully supported (by glibc and kernel)
>> 2016-04-15 23:28:24.494802 7f9ca0a5b800  0
>> xfsfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_feature: extsize is
>> disabled by conf
>> 2016-04-15 23:28:24.499066 7f9ca0a5b800  1 leveldb: Recovering log #20901
>> 2016-04-15 23:28:24.782188 7f9ca0a5b800  1 leveldb: Delete type=0 #20901
>>
>> 2016-04-15 23:28:24.782420 7f9ca0a5b800  1 leveldb: Delete type=3 #20900
>>
>> 2016-04-15 23:28:24.784810 7f9ca0a5b800  0
>> filestore(/var/lib/ceph/osd/ceph-3) mount: enabling WRITEAHEAD journal
>> mode:
>> checkpoint is not enabled
>> 2016-04-15 23:28:24.792918 7f9ca0a5b800  1 journal _open
>> /var/lib/ceph/osd/ceph-3/journal fd 18: 14998831104 bytes, block size 4096
>> bytes, directio = 1, aio = 1
>> 2016-04-15 23:28:24.800583 7f9ca0a5b800  1 journal _open
>> /var/lib/ceph/osd/ceph-3/journal fd 18: 14998831104 bytes, block size 4096
>> bytes, directio = 1, aio = 1
>> 2016-04-15 23:28:24.808144 7f9ca0a5b800  1
>> filestore(/var/lib/ceph/osd/ceph-3) upgrade
>> 2016-04-15 23:28:24.808540 7f9ca0a5b800  2 osd.3 0 boot
>> 2016-04-15 23:28:24.809265 7f9ca0a5b800 10 osd.3 0 read_superblock
>> sb(9b2c9bca-112e-48b0-86fc-587ef9a52948 osd.3
>> 4f86a418-6c67-4cb4-83a1-6c123c890036 e16024 [15332,16024]
>> lci=[16010,16024])
>> 2016-04-15 23:28:24.810029 7f9ca0a5b800 10 open_all_classes
>> 2016-04-15 23:28:24.810433 7f9ca0a5b800 10 open_all_classes found journal
>> 2016-04-15 23:28:24.810746 7f9ca0a5b800 10 _get_class adding new class
>> name
>> journal 0x7f9caa628808
>> 2016-04-15 23:28:24.811059 7f9ca0a5b800 10 _load_class journal from
>> /usr/lib/rados-classes/libcls_journal.so
>> 2016-04-15 23:28:24.814498 7f9ca0a5b800 10 register_class journal status 3
>> 2016-04-15 23:28:24.814650 7f9ca0a5b800 10 register_cxx_method
>> journal.create flags 3 0x7f9c8dadac00
>> 2016-04-15 23:28:24.814745 7f9ca0a5b800 10 register_cxx_method
>> journal.get_order flags 1 0x7f9c8dada3c0
>> 2016-04-15 23:28:24.814838 7f9ca0a5b800 10 register_cxx_method
>> journal.get_splay_width flags 1 0x7f9c8dada360
>> 2016-04-15 23:28:24.814925 7f9ca0a5b800 10 register_cxx_method
>> journal.get_pool_id flags 1 0x7f9c8dadaa30
>> 2016-04-15 23:28:24.815062 7f9ca0a5b800 10 register_cxx_method
>> journal.get_minimum_set flags 1 0x7f9c8dada9c0
>> 2016-04-15 23:28:24.815

Re: [ceph-users] infernalis and jewel upgrades...

2016-04-15 Thread huang jun
First, you should check whether the file osdmap.16024 exists in your
osd.3/current/meta dir;
if not, you can copy it from another OSD that has it.


2016-04-16 12:36 GMT+08:00 hjcho616 :
> Here is what I get wtih debug_osd = 20.
>
> 2016-04-15 23:28:24.429063 7f9ca0a5b800  0 set uid:gid to 1001:1001
> (ceph:ceph)
> 2016-04-15 23:28:24.429167 7f9ca0a5b800  0 ceph version 10.1.2
> (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4), process ceph-osd, pid 2092
> 2016-04-15 23:28:24.432034 7f9ca0a5b800  0 pidfile_write: ignore empty
> --pid-file
> 2016-04-15 23:28:24.459417 7f9ca0a5b800 10 ErasureCodePluginSelectJerasure:
> load: jerasure_sse3
> 2016-04-15 23:28:24.470016 7f9ca0a5b800 10 load: jerasure load: lrc load:
> isa
> 2016-04-15 23:28:24.472013 7f9ca0a5b800  2 osd.3 0 mounting
> /var/lib/ceph/osd/ceph-3 /var/lib/ceph/osd/ceph-3/journal
> 2016-04-15 23:28:24.472292 7f9ca0a5b800  0
> filestore(/var/lib/ceph/osd/ceph-3) backend xfs (magic 0x58465342)
> 2016-04-15 23:28:24.473496 7f9ca0a5b800  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: FIEMAP
> ioctl is disabled via 'filestore fiemap' config option
> 2016-04-15 23:28:24.473541 7f9ca0a5b800  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features:
> SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
> 2016-04-15 23:28:24.473615 7f9ca0a5b800  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: splice is
> supported
> 2016-04-15 23:28:24.494485 7f9ca0a5b800  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: syncfs(2)
> syscall fully supported (by glibc and kernel)
> 2016-04-15 23:28:24.494802 7f9ca0a5b800  0
> xfsfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_feature: extsize is
> disabled by conf
> 2016-04-15 23:28:24.499066 7f9ca0a5b800  1 leveldb: Recovering log #20901
> 2016-04-15 23:28:24.782188 7f9ca0a5b800  1 leveldb: Delete type=0 #20901
>
> 2016-04-15 23:28:24.782420 7f9ca0a5b800  1 leveldb: Delete type=3 #20900
>
> 2016-04-15 23:28:24.784810 7f9ca0a5b800  0
> filestore(/var/lib/ceph/osd/ceph-3) mount: enabling WRITEAHEAD journal mode:
> checkpoint is not enabled
> 2016-04-15 23:28:24.792918 7f9ca0a5b800  1 journal _open
> /var/lib/ceph/osd/ceph-3/journal fd 18: 14998831104 bytes, block size 4096
> bytes, directio = 1, aio = 1
> 2016-04-15 23:28:24.800583 7f9ca0a5b800  1 journal _open
> /var/lib/ceph/osd/ceph-3/journal fd 18: 14998831104 bytes, block size 4096
> bytes, directio = 1, aio = 1
> 2016-04-15 23:28:24.808144 7f9ca0a5b800  1
> filestore(/var/lib/ceph/osd/ceph-3) upgrade
> 2016-04-15 23:28:24.808540 7f9ca0a5b800  2 osd.3 0 boot
> 2016-04-15 23:28:24.809265 7f9ca0a5b800 10 osd.3 0 read_superblock
> sb(9b2c9bca-112e-48b0-86fc-587ef9a52948 osd.3
> 4f86a418-6c67-4cb4-83a1-6c123c890036 e16024 [15332,16024] lci=[16010,16024])
> 2016-04-15 23:28:24.810029 7f9ca0a5b800 10 open_all_classes
> 2016-04-15 23:28:24.810433 7f9ca0a5b800 10 open_all_classes found journal
> 2016-04-15 23:28:24.810746 7f9ca0a5b800 10 _get_class adding new class name
> journal 0x7f9caa628808
> 2016-04-15 23:28:24.811059 7f9ca0a5b800 10 _load_class journal from
> /usr/lib/rados-classes/libcls_journal.so
> 2016-04-15 23:28:24.814498 7f9ca0a5b800 10 register_class journal status 3
> 2016-04-15 23:28:24.814650 7f9ca0a5b800 10 register_cxx_method
> journal.create flags 3 0x7f9c8dadac00
> 2016-04-15 23:28:24.814745 7f9ca0a5b800 10 register_cxx_method
> journal.get_order flags 1 0x7f9c8dada3c0
> 2016-04-15 23:28:24.814838 7f9ca0a5b800 10 register_cxx_method
> journal.get_splay_width flags 1 0x7f9c8dada360
> 2016-04-15 23:28:24.814925 7f9ca0a5b800 10 register_cxx_method
> journal.get_pool_id flags 1 0x7f9c8dadaa30
> 2016-04-15 23:28:24.815062 7f9ca0a5b800 10 register_cxx_method
> journal.get_minimum_set flags 1 0x7f9c8dada9c0
> 2016-04-15 23:28:24.815162 7f9ca0a5b800 10 register_cxx_method
> journal.set_minimum_set flags 3 0x7f9c8dada830
> 2016-04-15 23:28:24.815246 7f9ca0a5b800 10 register_cxx_method
> journal.get_active_set flags 1 0x7f9c8dada7c0
> 2016-04-15 23:28:24.815336 7f9ca0a5b800 10 register_cxx_method
> journal.set_active_set flags 3 0x7f9c8dada630
> 2016-04-15 23:28:24.815417 7f9ca0a5b800 10 register_cxx_method
> journal.get_client flags 1 0x7f9c8dadafb0
> 2016-04-15 23:28:24.815501 7f9ca0a5b800 10 register_cxx_method
> journal.client_register flags 3 0x7f9c8dadc140
> 2016-04-15 23:28:24.815589 7f9ca0a5b800 10 register_cxx_method
> journal.client_update_data flags 3 0x7f9c8dadb730
> 2016-04-15 23:28:24.815679 7f9ca0a5b800 10 register_cxx_method
> journal.client_update_state flags 3 0x7f9c8dadb300
> 2016-04-15 23:28:24.815771 7f9ca0a5b800 10 register_cxx_method
> journal.client_unregister flags 3 0x7f9c8dadf060
> 2016-04-15 23:28:24.815854 7f9ca0a5b800 10 register_cxx_method
> journal.client_commit flags 3 0x7f9c8dadbc40
> 2016-04-15 23:28:24.815934 7f9ca0a5b800 10 register_cxx_method
> journal.client_list flags 1 0x7f9c8dadc9c0
> 2016-04-15 23:28:24.816019 

Re: [ceph-users] howto delete a pg

2016-04-15 Thread huang jun
For your cluster warning message: some of that PG's objects are inconsistent
between the primary and the replicas,
so you can try 'ceph pg repair $PGID'.
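A rough sequence for the PG mentioned in this thread (0.e6); 'rados list-inconsistent-obj' only exists on Jewel and newer, so treat that line as optional:

  ceph health detail | grep inconsistent                  # confirm which PGs are affected
  rados list-inconsistent-obj 0.e6 --format=json-pretty   # see which object copies disagree
  ceph pg repair 0.e6                                     # repair from the authoritative copy
  ceph -w                                                 # watch the cluster log for the scrub/repair result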

2016-04-16 9:04 GMT+08:00 Oliver Dzombic :
> Hi,
>
> i meant of course
>
> 0.e6_head
> 0.e6_TEMP
>
> in
>
> /var/lib/ceph/osd/ceph-12/current
>
> sry...
>
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:i...@ip-interactive.de
>
> Anschrift:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
>
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
>
>
> On 16.04.2016 at 03:03, Oliver Dzombic wrote:
>> Hi,
>>
>> pg 0.e6 is active+clean+inconsistent, acting [12,7]
>>
>> /var/log/ceph/ceph-osd.12.log:36:2016-04-16 01:08:40.058585 7f4f6bc70700
>> -1 log_channel(cluster) log [ERR] : 0.e6 deep-scrub stat mismatch, got
>> 4476/4477 objects, 133/133 clones, 4476/4477 dirty, 1/1 omap, 0/0
>> hit_set_archive, 0/0 whiteouts, 18467422208/18471616512 bytes,0/0
>> hit_set_archive bytes.
>>
>>
>> i tried to follow
>>
>> https://ceph.com/planet/ceph-manually-repair-object/
>>
>> did not really work for me.
>>
>> How do i kill this pg completely from osd.12 ?
>>
>> Can i simply delete
>>
>> 0.6_head
>> 0.6_TEMP
>>
>> in
>>
>> /var/lib/ceph/osd/ceph-12/current
>>
>> and ceph will take the other copy and multiply it again, and all is fine ?
>>
>> Or would that be the start of the end ? ^^;
>>
>> Thank you !
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Thank you!
HuangJun


Re: [ceph-users] chunk-based cache in ceph with erasure coded back-end storage

2016-03-31 Thread huang jun
The data encode/decode operation is done on the OSD side.

2016-03-31 23:28 GMT+08:00 Yu Xiang <hellomorn...@luckymail.com>:
> Thanks for the reply!
> So where did the decoding process happen? Is it in cache or on the client
> side? (Only considering Read.) If it happened when copying from storage tier
> to cache, then it has to be an whole object (file), but if decoding can be
> happened on client side when the client has all needed chunks, it seems
> cache can hold partial chunks of the file? What i mean is that is it
> possible for cache to hold partial chunks of a file in Ceph?  (assuming file
> A has 7 chunks in storage tier, to recover file A a client needs 4 chunks,
> will it be possible that 2 chunks of file A are copied to and stored in
> cache, when file A is requested, only another 2 chunks are needed from the
> storage tier? )
>
> Thanks!
>
>
>
> -----Original Message-----
> From: huang jun <hjwsm1...@gmail.com>
> To: Yu Xiang <hellomorn...@luckymail.com>
> Cc: ceph-users <ceph-users@lists.ceph.com>
> Sent: Wed, Mar 30, 2016 9:04 pm
> Subject: Re: [ceph-users] chunk-based cache in ceph with erasure coded
> back-end storage
>
>
> if your cache-mode is write-back, which will cache the read object in
> cache tier.
> you can try the read-proxy mode, which will not cache the object.
> the read request send to primary OSD, and the primary osd collect the
> shards from base tier(in you case, is erasure code pool),
> you need to read at least k chunks to decode the object.
> In current code, cache tier only store the whole object, not the shards.
>
>
> 2016-03-31 6:10 GMT+08:00 Yu Xiang <hellomorn...@luckymail.com>:
>> Dear List,
>> I am exploring in ceph caching tier recently, considering a cache-tier
>> (replicated) and a back storage-tier (erasure-coded), so chunks are stored
>> in the OSDs in the erasure-coded storage tier, when a file has been
>> requested to read, usually, all chunks in the storage tier would be copied
>> to the cache tier, replicated, and stored in the OSDs in caching pool, but
>> i
>> was wondering would it be possible that if only partial chunks of the
>> requested file be copied to cache? or it has to be a complete file? for
>> example, a file using (7,4) erasure code (4 original chunks, 3 encoded
>> chunks), when read it might be 4 required chunks are copied to cache, and
>> i
>> was wondering if it's possible to copy only 2 out of 4 required chunks to
>> cache, and the users getting the other 2 chunks elsewhere (or assuming the
>> client already has 2 chunks, they only need another 2 from ceph)? can the
>> cache store partial chunks of a file?
>>
>> Thanks in advance for any help!
>>
>> Best,
>> Yu
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
> thanks
> huangjun



-- 
thanks
huangjun


Re: [ceph-users] chunk-based cache in ceph with erasure coded back-end storage

2016-03-30 Thread huang jun
If your cache mode is writeback, a read will promote the whole object into the
cache tier.
You can try the readproxy mode instead, which will not cache the object.
The read request is sent to the primary OSD, and the primary OSD collects the
shards from the base tier (in your case, the erasure-coded pool);
it needs to read at least k chunks to decode the object.
In the current code, the cache tier only stores whole objects, not shards.
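For reference, pointing a cache tier in front of an EC pool at readproxy looks roughly like this; the pool names ecpool and cachepool are placeholders, and a tier that already holds dirty objects should be flushed before changing modes:

  ceph osd tier add ecpool cachepool             # only needed when first creating the tier
  ceph osd tier cache-mode cachepool readproxy   # proxy reads instead of promoting objects
  ceph osd tier set-overlay ecpool cachepool     # route client I/O through the cache pool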


2016-03-31 6:10 GMT+08:00 Yu Xiang :
> Dear List,
> I am exploring in ceph caching tier recently, considering a cache-tier
> (replicated) and a back storage-tier (erasure-coded), so chunks are stored
> in the OSDs in the erasure-coded storage tier, when a file has been
> requested to read,  usually, all chunks in the storage tier would be copied
> to the cache tier, replicated, and stored in the OSDs in caching pool, but i
> was wondering would it be possible that if only partial chunks of the
> requested file be copied to cache? or it has to be a complete file? for
> example, a file using (7,4) erasure code (4 original chunks, 3 encoded
> chunks), when read it might be 4 required chunks are copied to cache, and i
> was wondering if it's possible to copy only 2 out of 4 required chunks to
> cache, and the users getting the other 2 chunks elsewhere (or assuming the
> client already has 2 chunks, they only need another 2 from ceph)? can the
> cache store partial chunks of a file?
>
> Thanks in advance for any help!
>
> Best,
> Yu
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
thanks
huangjun


Re: [ceph-users] Calculating PG in an mixed environment

2016-03-15 Thread huang jun
You can work this out with the PG calculator at http://ceph.com/pgcalc/
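Applying the formula quoted below with only the OSDs that actually serve the pool (i.e. the OSDs under the HDD root for an HDD-only pool), here is a worked example with assumed numbers and an assumed CRUSH rule name; if several large pools share those OSDs, split the PG budget between them the way pgcalc does:

  # assumed: 60 OSDs under the hdd root, replicated pool with size 3
  #   (60 * 100) / 3 = 2000  -> round up to the next power of two = 2048
  ceph osd pool create hdd-pool 2048 2048 replicated hdd_ruleset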


2016-03-15 23:41 GMT+08:00 Martin Palma :
> Hi all,
>
> The documentation [0] gives us the following formula for calculating
> the number of PG if the cluster is bigger than 50 OSDs:
>
>              (OSDs * 100)
> Total PGs =  --------------
>                pool size
>
> When we have mixed storage server (HDD disks and SSD disks) and we
> have defined different roots in our crush map to map some pools only
> to HDD disk and some to SSD disks like described by Sebastien Han [1].
>
> In the above formula what number of OSDs should be use to calculate
> the  PGs for a pool only on the HDD disks? The total number of OSDs in
> a cluster or only the number of OSDs which have an HDD disk as
> backend?
>
> Best,
> Martin
>
>
> [0] 
> http://docs.ceph.com/docs/master/rados/operations/placement-groups/#choosing-the-number-of-placement-groups
> [1] 
> http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
thanks
huangjun


[ceph-users] cluster_network goes slow during erasure code pool's stress testing

2015-12-21 Thread huang jun
hi,all
We hit a problem related to an erasure pool with k:m=3:1 and stripe_unit=64k*3.
We have a cluster with 96 OSDs on 4 hosts (srv1, srv2, srv3,
srv4); each host has 24 OSDs,
12-core processors (Intel(R) Xeon(R) CPU E5-2620 v2 @
2.10GHz) and 48GB of memory.
The cluster is configured with (both are 10G ethernet):
cluster_network = 172.19.0.0/16
public_network = 192.168.0.0/16

The test setup is as follows:
1) on each host, mount a kernel client bound to the erasure pool
2) on each host, configure an SMB server that exports the CephFS mount point
3) every Samba server has a Windows SMB client doing file
write/read/delete operations
4) on every kernel client, we run a test shell script that writes a 5GB file
recursively and creates many dirs.

We started the test at 6:00 pm, but by the next morning the cluster was broken:
1) there were 48 OSDs down, on srv1 and srv4
2) I checked the down OSDs' logs; there are two kinds of messages:
a) many OSDs went down due to a FileStore::op_thread timeout suicide
b) many OSDs went down due to an OSD::osd_op_tp timeout suicide

Because we had met this problem before, we used iperf to check the
network between srv1 and srv4;
the public_network is fine, the throughput can reach 9.20 Gbits/sec,
but the cluster_network performs badly from srv1 to srv4;
"iperf -c 172.19.10.4 " shows:
[ ID] Interval   Transfer Bandwidth
[  3]  0.0-79.6 sec   384 KBytes  39.5 Kbits/sec

**but** the iperf test from srv4 to srv1 is OK.

**note**:
a) at this time, there were no ceph-osd daemons running on srv1 and srv4
b) after restarting the network, iperf tests in all directions show OK

If the network is that slow, the osd_op_tp thread can get stuck in
submit_message while the reader is receiving data,
which can finally make the osd_op_tp thread hit its suicide timeout.

We have another cluster with the same configuration running the same tests;
the **only** difference is
that cluster is testing a replicated pool, not an erasure-coded pool.

Why is the network so slow? Is it because the erasure pool uses more CPU and
memory than a replicated pool?

Any hints and tips are welcome.


-- 
thanks
huangjun


Re: [ceph-users] Confused about priority of client OP.

2015-12-03 Thread huang jun
In SimpleMessenger, client ops such as OSD_OP are dispatched by
ms_fast_dispatch and are not queued in the PrioritizedQueue inside the Messenger.

2015-12-03 22:14 GMT+08:00 Wukongming :
> Hi, All:
> I've got a question about priorities. We defined
> osd_client_op_priority = 63, while CEPH_MSG_PRIO_LOW = 64.
> We understand there are multiple kinds of IO to consider. Why not define
> osd_client_op_priority > 64, so that client IO is simply handled at the
> highest priority?
>
>
> -
> wukongming ID: 12019
> Tel:0571-86760239
> Dept:2014 UIS2 ONEStor
>
>



-- 
thanks
huangjun


Re: [ceph-users] How long will the logs be kept?

2015-12-02 Thread huang jun
The logs are rotated every week by default; you can see the logrotate
configuration shipped for Ceph (typically /etc/logrotate.d/ceph).
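An illustrative stanza of the kind that file contains (not the exact packaged content, which varies by release):

  /var/log/ceph/*.log {
      weekly
      rotate 7
      compress
      sharedscripts
      postrotate
          # send SIGHUP so the daemons reopen their log files
          killall -q -1 ceph-mon ceph-mds ceph-osd radosgw || true
      endscript
      missingok
      notifempty
  }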

2015-12-03 12:37 GMT+08:00 Wukongming :
> Hi, all
> Does anyone know how long, or for how many days, the logs.gz files
> (mon/osd/mds) are kept before they are removed?
>
> -
> wukongming ID: 12019
> Tel:0571-86760239
> Dept:2014 UIS2 OneStor
>



-- 
thanks
huangjun


Re: [ceph-users] how to improve ceph cluster capacity usage

2015-09-02 Thread huang jun
After searching the source code, I found the ceph_psim tool, which can
simulate object distribution,
but it seems a little simplistic.
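Besides ceph_psim, crushtool itself can simulate placement over the real map, for example (the input file name follows the usage further down in this mail, and the object count is arbitrary):

  ceph osd getcrushmap -o crush.raw
  crushtool -i crush.raw --test --rule 0 --num-rep 2 \
            --min-x 1 --max-x 100000 --show-utilization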



2015-09-01 22:58 GMT+08:00 huang jun <hjwsm1...@gmail.com>:
> hi,all
>
> Recently, i did some experiments on OSD data distribution,
> we set up a cluster with 72 OSDs,all 2TB sata disk,
> and ceph version is v0.94.3 and linux kernel version is 3.18,
> and set "ceph osd crush tunables optimal".
> There are 3 pools:
> pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 512 pgp_num 512 last_change 302 stripe_width 0
> pool 1 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 4096 pgp_num 4096 last_change 832
> crash_replay_interval 45 stripe_width 0
> pool 2 'metadata' replicated size 3 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 512 pgp_num 512 last_change 302
> stripe_width 0
>
> the osd pg num of each osd:
> pool   :   0    1    2  | SUM
> -------------------------------
> osd.0     13  105   18  | 136
> osd.1     17  110   26  | 153
> osd.2     15  114   20  | 149
> osd.3     11  101   17  | 129
> osd.4      8  106   17  | 131
> osd.5     12  102   19  | 133
> osd.6     19  114   29  | 162
> osd.7     16  115   21  | 152
> osd.8     15  117   25  | 157
> osd.9     13  117   23  | 153
> osd.10    13  133   16  | 162
> osd.11    14  105   21  | 140
> osd.12    11   94   16  | 121
> osd.13    12  110   21  | 143
> osd.14    20  119   26  | 165
> osd.15    12  125   19  | 156
> osd.16    15  126   22  | 163
> osd.17    13  109   19  | 141
> osd.18     8  119   19  | 146
> osd.19    14  114   19  | 147
> osd.20    17  113   29  | 159
> osd.21    17  111   27  | 155
> osd.22    13  121   20  | 154
> osd.23    14   95   23  | 132
> osd.24    17  110   26  | 153
> osd.25    13  133   15  | 161
> osd.26    17  124   24  | 165
> osd.27    16  119   20  | 155
> osd.28    19  134   30  | 183
> osd.29    13  121   20  | 154
> osd.30    11   97   20  | 128
> osd.31    12  109   18  | 139
> osd.32    10  112   15  | 137
> osd.33    18  114   28  | 160
> osd.34    19  112   29  | 160
> osd.35    16  121   32  | 169
> osd.36    13  111   18  | 142
> osd.37    15  107   22  | 144
> osd.38    21  129   24  | 174
> osd.39     9  121   17  | 147
> osd.40    11  102   18  | 131
> osd.41    14  101   19  | 134
> osd.42    16  119   25  | 160
> osd.43    12  118   13  | 143
> osd.44    17  114   25  | 156
> osd.45    11  114   15  | 140
> osd.46    12  107   16  | 135
> osd.47    15  111   23  | 149
> osd.48    14  115   20  | 149
> osd.49     9   94   13  | 116
> osd.50    14  117   18  | 149
> osd.51    13  112   19  | 144
> osd.52    11  126   22  | 159
> osd.53    12  122   18  | 152
> osd.54    13  121   20  | 154
> osd.55    17  114   25  | 156
> osd.56    11  118   18  | 147
> osd.57    22  137   25  | 184
> osd.58    15  105   22  | 142
> osd.59    13  120   18  | 151
> osd.60    12  110   19  | 141
> osd.61    21  114   28  | 163
> osd.62    12   97   18  | 127
> osd.63    19  109   31  | 159
> osd.64    10  132   21  | 163
> osd.65    19  137   21  | 177
> osd.66    22  107   32  | 161
> osd.67    12  107   20  | 139
> osd.68    14  100   22  | 136
> osd.69    16  110   24  | 150
> osd.70     9  101   14  | 124
> osd.71    15  112   24  | 151
> -------------------------------
> SUM    : 1024 8192 1536  |
>
> We can found that, for poolid=1(data pool),
> osd.57 and osd.65 both have 137 PGs but osd.12 and osd.49 only have 94 PGs,
> which maybe cause data distribution imbanlance, and reduces the space
> utilization of the cluster.
>
> Use "crushtool -i crush.raw --test --show-mappings --rule 0 --num-rep
> 2 --min-x 1 --max-x %s"
> we tested different pool pg_num:
>
> Total PG num PG num stats
>  ---
> 4096 avg: 113.78 (avg is the average PG num per OSD)
> total: 8192  (total is the total PG num, including replica PGs)
> max: 139 +0.221680 (max is the max PG num on an OSD; +0.221680 is the
> ratio above the average PG num)
> min: 113 -0.226562 (min is the min PG num on an OSD; -0.226562 is the
> ratio below the average PG num)
>
> 8192 avg: 227.56
> total: 16384
> max: 267 0.173340
> min: 226 -0.129883
>
> 16384

[ceph-users] how to improve ceph cluster capacity usage

2015-09-01 Thread huang jun
hi,all

Recently, i did some experiments on OSD data distribution,
we set up a cluster with 72 OSDs,all 2TB sata disk,
and ceph version is v0.94.3 and linux kernel version is 3.18,
and set "ceph osd crush tunables optimal".
There are 3 pools:
pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 302 stripe_width 0
pool 1 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 4096 pgp_num 4096 last_change 832
crash_replay_interval 45 stripe_width 0
pool 2 'metadata' replicated size 3 min_size 1 crush_ruleset 0
object_hash rjenkins pg_num 512 pgp_num 512 last_change 302
stripe_width 0

the osd pg num of each osd:
pool   :   0    1    2  | SUM
-------------------------------
osd.0     13  105   18  | 136
osd.1     17  110   26  | 153
osd.2     15  114   20  | 149
osd.3     11  101   17  | 129
osd.4      8  106   17  | 131
osd.5     12  102   19  | 133
osd.6     19  114   29  | 162
osd.7     16  115   21  | 152
osd.8     15  117   25  | 157
osd.9     13  117   23  | 153
osd.10    13  133   16  | 162
osd.11    14  105   21  | 140
osd.12    11   94   16  | 121
osd.13    12  110   21  | 143
osd.14    20  119   26  | 165
osd.15    12  125   19  | 156
osd.16    15  126   22  | 163
osd.17    13  109   19  | 141
osd.18     8  119   19  | 146
osd.19    14  114   19  | 147
osd.20    17  113   29  | 159
osd.21    17  111   27  | 155
osd.22    13  121   20  | 154
osd.23    14   95   23  | 132
osd.24    17  110   26  | 153
osd.25    13  133   15  | 161
osd.26    17  124   24  | 165
osd.27    16  119   20  | 155
osd.28    19  134   30  | 183
osd.29    13  121   20  | 154
osd.30    11   97   20  | 128
osd.31    12  109   18  | 139
osd.32    10  112   15  | 137
osd.33    18  114   28  | 160
osd.34    19  112   29  | 160
osd.35    16  121   32  | 169
osd.36    13  111   18  | 142
osd.37    15  107   22  | 144
osd.38    21  129   24  | 174
osd.39     9  121   17  | 147
osd.40    11  102   18  | 131
osd.41    14  101   19  | 134
osd.42    16  119   25  | 160
osd.43    12  118   13  | 143
osd.44    17  114   25  | 156
osd.45    11  114   15  | 140
osd.46    12  107   16  | 135
osd.47    15  111   23  | 149
osd.48    14  115   20  | 149
osd.49     9   94   13  | 116
osd.50    14  117   18  | 149
osd.51    13  112   19  | 144
osd.52    11  126   22  | 159
osd.53    12  122   18  | 152
osd.54    13  121   20  | 154
osd.55    17  114   25  | 156
osd.56    11  118   18  | 147
osd.57    22  137   25  | 184
osd.58    15  105   22  | 142
osd.59    13  120   18  | 151
osd.60    12  110   19  | 141
osd.61    21  114   28  | 163
osd.62    12   97   18  | 127
osd.63    19  109   31  | 159
osd.64    10  132   21  | 163
osd.65    19  137   21  | 177
osd.66    22  107   32  | 161
osd.67    12  107   20  | 139
osd.68    14  100   22  | 136
osd.69    16  110   24  | 150
osd.70     9  101   14  | 124
osd.71    15  112   24  | 151
-------------------------------
SUM    : 1024 8192 1536  |

We can see that, for pool id 1 (the data pool),
osd.57 and osd.65 both have 137 PGs but osd.12 and osd.49 only have 94 PGs,
which may cause data distribution imbalance and reduce the space
utilization of the cluster.

Use "crushtool -i crush.raw --test --show-mappings --rule 0 --num-rep
2 --min-x 1 --max-x %s"
we tested different pool pg_num:

Total PG num PG num stats
 ---
4096 avg: 113.78 (avg is the average PG num per OSD)
total: 8192  (total is the total PG num, including replica PGs)
max: 139 +0.221680 (max is the max PG num on an OSD; +0.221680 is the
ratio above the average PG num)
min: 113 -0.226562 (min is the min PG num on an OSD; -0.226562 is the
ratio below the average PG num)

8192 avg: 227.56
total: 16384
max: 267 0.173340
min: 226 -0.129883

16384 avg: 455.11
total: 32768
max: 502 0.103027
min: 455 -0.127686

32768 avg: 910.22
total: 65536
max: 966 0.061279
min: 910 -0.076050

With a bigger pg_num, the gap between the maximum and the minimum decreases.
But it is unreasonable to set such a large pg_num, which will increase the
OSD and MON load.

Is there any way to get a more balanced PG distribution across the cluster?
We tried "ceph osd reweight-by-pg 110 data" many times, but that does
not resolve the problem.

Another question: if we can ensure the PGs are distributed evenly, can we also
ensure that the data
distribution is balanced like the PGs?

Btw, we will write data to this cluster until one or more OSDs get
full; we set the full ratio to 0.98,
and we expect the cluster to be able to use 90% of its total capacity.

Any tips are welcome.

-- 
thanks
huangjun

Re: [ceph-users] Discuss: New default recovery config settings

2015-06-11 Thread huang jun
hi,jan

2015-06-01 15:43 GMT+08:00 Jan Schermer j...@schermer.cz:
 We had to disable deep scrub or the cluster would me unusable - we need to 
 turn it back on sooner or later, though.
 With minimal scrubbing and recovery settings, everything is mostly good. 
 Turned out many issues we had were due to too few PGs - once we increased 
 them from 4K to 16K everything sped up nicely (because the chunks are 
 smaller), but during heavy activity we are still getting some “slow IOs”.

How many PGs did you set? We get slow requests many times, but
didn't relate them to the PG number.
We follow the equation below for every pool:

             (OSDs * 100)
Total PGs =  ------------
               pool size

Our cluster has 157 OSDs and 3 pools; we set pg_num to 8192 for every pool,
but OSD CPU utilization goes up to 300% after a restart, and we think
it is loading PGs during that period.
We will try different PG numbers the next time we see slow requests.

thanks!
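(As an aside, the recovery settings discussed further down in this thread can be applied to a running cluster without a restart; the values below are the proposed new defaults, and the equivalent options can also go into the [osd] section of ceph.conf to make them persistent:)

  ceph tell 'osd.*' injectargs '--osd-max-backfills 1 --osd-recovery-max-active 3 --osd-recovery-op-priority 1 --osd-recovery-max-single-start 1'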

 I believe there is an ionice knob in newer versions (we still run Dumpling), 
 and that should do the trick no matter how much additional “load” is put on 
 the OSDs.
 Everybody’s bottleneck will be different - we run all flash so disk IO is not 
 a problem but an OSD daemon is - no ionice setting will help with that, it 
 just needs to be faster ;-)

 Jan


 On 30 May 2015, at 01:17, Gregory Farnum g...@gregs42.com wrote:

 On Fri, May 29, 2015 at 2:47 PM, Samuel Just sj...@redhat.com wrote:
 Many people have reported that they need to lower the osd recovery config 
 options to minimize the impact of recovery on client io.  We are talking 
 about changing the defaults as follows:

 osd_max_backfills to 1 (from 10)
 osd_recovery_max_active to 3 (from 15)
 osd_recovery_op_priority to 1 (from 10)
 osd_recovery_max_single_start to 1 (from 5)

 I'm under the (possibly erroneous) impression that reducing the number
 of max backfills doesn't actually reduce recovery speed much (but will
 reduce memory use), but that dropping the op priority can. I'd rather
 we make users manually adjust values which can have a material impact
 on their data safety, even if most of them choose to do so.

 After all, even under our worst behavior we're still doing a lot
 better than a resilvering RAID array. ;)
 -Greg
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
thanks
huangjun