Re: [ceph-users] Discuss: New default recovery config settings

2015-06-11 Thread huang jun
Hi Jan, 2015-06-01 15:43 GMT+08:00 Jan Schermer j...@schermer.cz: We had to disable deep scrub or the cluster would be unusable - we need to turn it back on sooner or later, though. With minimal scrubbing and recovery settings, everything is mostly good. Turned out many issues we had were

Re: [ceph-users] how to improve ceph cluster capacity usage

2015-09-02 Thread huang jun
After searching the source code, I found the ceph_psim tool, which can simulate object distribution, but it seems a little simplistic. 2015-09-01 22:58 GMT+08:00 huang jun <hjwsm1...@gmail.com>: > Hi all, > > Recently, I did some experiments on OSD data distribution; > we set up a cluste

[ceph-users] how to improve ceph cluster capacity usage

2015-09-01 Thread huang jun
Hi all, recently I did some experiments on OSD data distribution. We set up a cluster with 72 OSDs, all 2TB SATA disks; the Ceph version is v0.94.3, the Linux kernel version is 3.18, and we set "ceph osd crush tunables optimal". There are 3 pools: pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset

Re: [ceph-users] How long will the logs be kept?

2015-12-02 Thread huang jun
Logs are rotated every week by default; see the logrotate file /etc/logrotate.d/ceph. 2015-12-03 12:37 GMT+08:00 Wukongming : > Hi all, > Does anyone know how long, or for how many days, the logs.gz files > (mon/osd/mds) are kept before being flushed? > >
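For reference, a minimal logrotate stanza of the kind shipped with Ceph packages looks roughly like the sketch below; the exact retention (weekly/daily, rotate count) and the postrotate signal list vary by release and distribution, so treat it as an illustration rather than the stock file.

    /var/log/ceph/*.log {
        weekly              # rotate once a week (the default behaviour referred to above)
        rotate 7            # keep 7 compressed archives, then drop the oldest
        compress
        missingok
        notifempty
        sharedscripts
        postrotate
            # ask the daemons to reopen their log files after rotation
            killall -q -1 ceph-mon ceph-osd ceph-mds radosgw || true
        endscript
    }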

Re: [ceph-users] Confused about priority of client OP.

2015-12-03 Thread huang jun
In SimpleMessenger, client ops such as OSD_OP are dispatched via ms_fast_dispatch and are not queued in the PrioritizedQueue of the Messenger. 2015-12-03 22:14 GMT+08:00 Wukongming : > Hi, All: > I've got a question about priority. We defined > osd_client_op_priority = 63.

[ceph-users] cluster_network goes slow during erasure code pool's stress testing

2015-12-21 Thread huang jun
Hi all, we hit a problem related to an erasure-coded pool with k:m=3:1 and stripe_unit=64k*3. We have a cluster with 96 OSDs on 4 hosts (srv1, srv2, srv3, srv4); each host has 24 OSDs, 12-core processors (Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz) and 48GB memory. cluster

Re: [ceph-users] Calculating PG in an mixed environment

2016-03-15 Thread huang jun
You can find it at http://ceph.com/pgcalc/. 2016-03-15 23:41 GMT+08:00 Martin Palma : > Hi all, > > The documentation [0] gives us the following formula for calculating > the number of PGs if the cluster is bigger than 50 OSDs: > > Total PGs = (OSDs * 100) /
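As a rough illustration of that formula (an assumption based on the documentation's usual guidance, with the denominator being the pool's replica count and the result rounded up to the next power of two), a quick shell calculation might look like:

    # illustrative only: total_pgs = (num_osds * 100) / pool_size,
    # rounded up to the next power of two
    num_osds=72
    pool_size=3
    raw=$(( (num_osds * 100) / pool_size ))
    pgs=1
    while [ "$pgs" -lt "$raw" ]; do pgs=$(( pgs * 2 )); done
    echo "suggested pg_num: $pgs"    # 72 OSDs, size 3 -> 2400 -> 4096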

Re: [ceph-users] chunk-based cache in ceph with erasure coded back-end storage

2016-03-30 Thread huang jun
If your cache mode is writeback, reads will be cached in the cache tier. You can try the readproxy mode, which will not cache the object: the read request is sent to the primary OSD, and the primary OSD collects the shards from the base tier (in your case the erasure-coded pool); you need to read at

Re: [ceph-users] chunk-based cache in ceph with erasure coded back-end storage

2016-03-31 Thread huang jun
; > Thanks! > > > > e-Original Message- > From: huang jun <hjwsm1...@gmail.com> > To: Yu Xiang <hellomorn...@luckymail.com> > Cc: ceph-users <ceph-users@lists.ceph.com> > Sent: Wed, Mar 30, 2016 9:04 pm > Subject: Re: [ceph-users] chunk-based cache i

Re: [ceph-users] infernalis and jewel upgrades...

2016-04-15 Thread huang jun
is needed to > interpret this. > > Regards, > Hong > > > On Saturday, April 16, 2016 12:11 AM, hjcho616 <hjcho...@yahoo.com> wrote: > > > Is this it? > > root@OSD2:/var/lib/ceph/osd/ceph-3/current/meta# find ./ | grep osdmap | > grep 16024 > ./DI

Re: [ceph-users] howto delete a pg

2016-04-15 Thread huang jun
Regarding your cluster warning message: it means some objects in that PG are inconsistent between the primary and the replicas, so you can try 'ceph pg repair $PGID'. 2016-04-16 9:04 GMT+08:00 Oliver Dzombic : > Hi, > > i meant of course > > 0.e6_head > 0.e6_TEMP > > in > >
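A minimal sequence for locating and repairing an inconsistent PG (a sketch, assuming a Jewel-or-later cluster where 'rados list-inconsistent-obj' is available; 0.e6 is the PG id from this thread):

    # list the PGs currently flagged inconsistent
    ceph health detail | grep inconsistent
    # inspect which objects/shards disagree
    rados list-inconsistent-obj 0.e6 --format=json-pretty
    # ask the primary to repair the PG
    ceph pg repair 0.e6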

Re: [ceph-users] infernalis and jewel upgrades...

2016-04-15 Thread huang jun
First, you should check whether the file osdmap.16024 exists in your osd.3/current/meta dir; if not, you can copy it from another OSD that has it. 2016-04-16 12:36 GMT+08:00 hjcho616 : > Here is what I get with debug_osd = 20. > > 2016-04-15 23:28:24.429063 7f9ca0a5b800 0 set
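A sketch of that copy, assuming FileStore OSDs mounted under /var/lib/ceph/osd and that the broken OSD is stopped while you touch its meta directory; the hashed file name differs per cluster, so locate it with find rather than guessing, and the <placeholders> below are exactly that:

    # on a healthy OSD, locate the full osdmap for epoch 16024
    find /var/lib/ceph/osd/ceph-0/current/meta -name '*osdmap.16024*'
    # stop the broken OSD, copy the file into the same relative path,
    # keep the exact file name and ownership, then start it again
    systemctl stop ceph-osd@3
    cp -a <file-found-above> /var/lib/ceph/osd/ceph-3/current/meta/<same-name>
    chown ceph:ceph /var/lib/ceph/osd/ceph-3/current/meta/<same-name>
    systemctl start ceph-osd@3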

Re: [ceph-users] infernalis and jewel upgrades...

2016-04-15 Thread huang jun
/ceph-3/current/meta# find ./ | grep osdmap | > grep 16024 > ./DIR_E/DIR_3/inc\uosdmap.16024__0_46887E3E__none > > Regards, > Hong > > > On Friday, April 15, 2016 11:53 PM, huang jun <hjwsm1...@gmail.com> wrote: > > > First, you should check whether fil

Re: [ceph-users] krbd map on Jewel, sysfs write failed when rbd map

2016-04-18 Thread huang jun
Hi, can you post the output of 'modinfo rbd' and your cluster state ('ceph -s')? 2016-04-18 16:35 GMT+08:00 席智勇 : > Hi cephers: > > I created an rbd volume (image) on the Jewel release; when I exec rbd map, I got the > error message below. I cannot find any useful message in >

Re: [ceph-users] Bug maybe: osdmap failed undecoded

2017-02-23 Thread huang jun
You can copy the corrupted osdmap file from osd.1 and then restart the OSD; we have hit this before and that worked for us. 2017-02-23 22:33 GMT+08:00 tao chang : > Hi, > > I have a ceph cluster (ceph 10.2.5) with 3 nodes, each with two OSDs. > > There was a power outage last night and

Re: [ceph-users] ceph journal system vs filesystem journal system

2016-09-01 Thread huang jun
2016-09-01 17:25 GMT+08:00 한승진 : > Hi all. > > I'm very confused about the ceph journal system. > > Some people say the ceph journal works like a Linux journaling filesystem. > > Others say all data is written to the journal first and then written to > the OSD data partition. > > Journal of

Re: [ceph-users] Does anyone know why cephfs do not support EC pool?

2016-10-17 Thread huang jun
EC pools only support writefull and append operations, not partial writes; you can try it by doing random writes and seeing whether the OSD crashes or not. 2016-10-18 10:10 GMT+08:00 Liuxuan : > Hello: > > > > I have created a cephfs whose data pool type is EC and metadata is replicated. > The

Re: [ceph-users] Does anyone know why cephfs do not support EC pool?

2016-10-17 Thread huang jun
com>: > On Mon, Oct 17, 2016 at 9:23 PM, huang jun <hjwsm1...@gmail.com> wrote: > >> ec only support writefull and append operations, but not partial write, >> your can try it by doing random writes, see if the osd crash or not. >> >> 2016-10-18 10:10 GMT+08

Re: [ceph-users] How do I restart node that I've killed in development mode

2016-10-12 Thread huang jun
./init-ceph start mon.a 2016-10-12 14:54 GMT+08:00 agung Laksono : > Hi Ceph Users, > > I deployed a development cluster using vstart with 3 MONs and 3 OSDs. > In my experiment, I kill one of the monitor nodes by its pid, like this: > > $ kill -SIGSEGV 27557 > > After a new

Re: [ceph-users] Update crushmap when monitors are down

2019-04-01 Thread huang jun
Can you provide detailed error logs from when the mon crashes? Pardhiv Karri wrote on Tue, Apr 2, 2019 at 9:02 AM: > > Hi, > > Our ceph production cluster went down while updating the crushmap. Now we can't get > our monitors to come online, and when they come online for a fraction of a > second we see crush map errors in the logs.

Re: [ceph-users] Erasure Pools.

2019-03-31 Thread huang jun
What's the output of 'ceph osd dump', 'ceph osd crush dump' and 'ceph health detail'? Andrew J. Hutton wrote on Sat, Mar 30, 2019 at 7:05 AM: > > I have tried to create erasure pools for CephFS using the examples given > at >

Re: [ceph-users] how to force backfill a pg in ceph jewel

2019-03-31 Thread huang jun
The force-recovery/backfill commands were introduced in the Luminous release, if I remember correctly. Nikhil R wrote on Sun, Mar 31, 2019 at 7:59 AM: > > Team, > Is there a way to force backfill of a PG in ceph jewel? I know this is available > in mimic. Is it available in ceph jewel? > I tried ceph pg backfill pg
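On Luminous and later, the commands in question (not available on Jewel) look like this; the PG id 1.2f is just a placeholder:

    # bump a specific PG to the front of the backfill/recovery queues
    ceph pg force-backfill 1.2f
    ceph pg force-recovery 1.2f
    # and the corresponding cancel operations
    ceph pg cancel-force-backfill 1.2f
    ceph pg cancel-force-recovery 1.2f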

Re: [ceph-users] PG stuck in active+clean+remapped

2019-03-31 Thread huang jun
It seems like CRUSH cannot get enough OSDs for this PG. What is the output of 'ceph osd crush dump', and especially the values in the 'tunables' section? Vladimir Prokofev wrote on Wed, Mar 27, 2019 at 4:02 AM: > > CEPH 12.2.11, pool size 3, min_size 2. > > One node went down today (private network interface started
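To look at just the tunables without wading through the whole crush dump, something like the following should work (the jq filter is only a convenience and assumes jq is installed):

    # summary view of the active tunables profile
    ceph osd crush show-tunables
    # or pull the same section out of the full dump
    ceph osd crush dump | jq '.tunables'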

Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9

2019-03-28 Thread huang jun
Was the time really spent on the leveldb compact operation? You can turn on debug_osd=20 to see what happens. What is the disk utilization during startup? Nikhil R wrote on Thu, Mar 28, 2019 at 4:36 PM: > > CEPH OSD restarts are taking too long > below is my ceph.conf > [osd] > osd_compact_leveldb_on_mount = false >
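One way to gather both pieces of information around a restart (a sketch; the osd id and log path are placeholders, and injectargs only works once the daemon is already up, so for the startup phase itself set debug_osd in ceph.conf first):

    # raise osd debugging on a running daemon without editing ceph.conf
    ceph tell osd.12 injectargs '--debug-osd 20/20'
    # watch the osd log while it starts
    tail -f /var/log/ceph/ceph-osd.12.log
    # in another terminal, watch per-disk utilization during startup
    iostat -x 1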

Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9

2019-03-28 Thread huang jun
/nikhilravindra > > > > On Thu, Mar 28, 2019 at 3:58 PM huang jun wrote: >> >> Was the time really spent on the db compact operation? >> You can turn on debug_osd=20 to see what happens. >> What is the disk utilization during startup? >> >> Nikhil R wrote in Mar 2019

Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9

2019-03-29 Thread huang jun
long start time, leveldb compact or filestore split? > in.linkedin.com/in/nikhilravindra > > > > On Fri, Mar 29, 2019 at 6:55 AM huang jun wrote: >> >> It seems like the split settings cause the problem; >> what about commenting out those settings and then seeing whether it still

Re: [ceph-users] cluster is not stable

2019-03-13 Thread huang jun
nd the monitor >> receives all the beacons that the osds send out. >> >> But why do some osds not send beacons? >> >> huang jun wrote on Wed, Mar 13, 2019 at 11:02 PM: >>> >>> Sorry for not making it clear; you may need to set one of your osd's >>> osd_beac

Re: [ceph-users] cluster is not stable

2019-03-14 Thread huang jun
Sorry, the script should be:

    for f in kraken luminous mimic osdmap-prune; do
      ceph mon feature set $f --yes-i-really-mean-it
    done

huang jun wrote on Thu, Mar 14, 2019 at 2:04 PM: > > ok, if this is a **test environment**, you can try > for f in 'kraken,luminous,mimic,osdmap-prune'; do > ceph mon

Re: [ceph-users] cluster is not stable

2019-03-13 Thread huang jun
> 25 obj/sec or 5 MiB/sec > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 > promote_throttle_recalibrate new_prob 1000 > 2019-03-14 12:41:15.722 7f3c27684700 10 osd.5 17032 > promote_throttle_recalibrate actual 0, actual/prob ratio 1, adjusted > new_prob 1000, prob 1000 -

Re: [ceph-users] cluster is not stable

2019-03-14 Thread huang jun
wrote on Thu at 1:56 PM: > > # ceph mon feature ls > all features > supported: [kraken,luminous,mimic,osdmap-prune] > persistent: [kraken,luminous,mimic,osdmap-prune] > on current monmap (epoch 2) > persistent: [none] > required: [none] > > hua

Re: [ceph-users] cluster is not stable

2019-03-14 Thread huang jun
environment. If everything is fine, I'll use it for > production. > > My cluster is version mimic, should I set all the features you listed in the > command? > > Thanks > > huang jun wrote on Thu, Mar 14, 2019 at 2:11 PM: >> >> sorry, the script should be >> for f i

Re: [ceph-users] cluster is not stable

2019-03-13 Thread huang jun
Can you get the value of the osd_beacon_report_interval item? The default is 300; you can set it to 60, or maybe turn on debug_ms=1 and debug_mon=10 to get more information. Zhenshi Zhou wrote on Wed, Mar 13, 2019 at 1:20 PM: > > Hi, > > The servers are connected to the same switch. > I can ping from any one of the servers
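A sketch of how those values can be inspected and changed at runtime; the daemon ids are placeholders for your own cluster, and the admin-socket query has to be run on the host where that osd lives:

    # read the current beacon interval from a running osd
    ceph daemon osd.5 config get osd_beacon_report_interval
    # lower it and raise messenger/monitor debugging without a restart
    ceph tell osd.5 injectargs '--osd_beacon_report_interval 60'
    ceph tell mon.ceph1 injectargs '--debug-ms 1 --debug-mon 10'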

Re: [ceph-users] cluster is not stable

2019-03-13 Thread huang jun
> Hi, >> >> I didn't set osd_beacon_report_interval as it must be the default value. >> I have set osd_beacon_report_interval to 60 and debug_mon to 10. >> >> Attachment is the leader monitor log, the "mark-down" operations is at 14:22 >> >> Thanks >> >> hua

Re: [ceph-users] recommendation on ceph pool

2019-03-13 Thread huang jun
tim taler wrote on Wed, Mar 13, 2019 at 11:05 PM: > > Hi all, > what are your experiences with different disk sizes in one pool > regarding overall performance? > I hope someone can shed some light on the following scenario: > > Let's say I mix an equal amount of 2TB and 8TB disks in one pool, > with a

Re: [ceph-users] Cephfs error

2019-03-18 Thread huang jun
Marc Roos wrote on Mon, Mar 18, 2019 at 5:46 AM: > > > > > 2019-03-17 21:59:58.296394 7f97cbbe6700 0 -- > 192.168.10.203:6800/1614422834 >> 192.168.10.43:0/1827964483 > conn(0x55ba9614d000 :6800 s=STATE_OPEN pgs=8 cs=1 l=0).fault server, > going to standby > > What does this mean? It means the connection is

Re: [ceph-users] Huge rebalance after rebooting OSD host (Mimic)

2019-05-15 Thread huang jun
Did the OSDs' crush location change after the reboot? kas wrote on Wed, May 15, 2019 at 10:39 PM: > > kas wrote: > : Marc, > : > : Marc Roos wrote: > : : Are you sure your osd's are up and reachable? (run ceph osd tree on > : : another node) > : > : They are up, because all three mons see them as
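Two quick ways to check for that (the osd id is a placeholder). If the location really does move on every boot, pinning it via ceph.conf is one common workaround, though whether that is appropriate depends on how the cluster is deployed:

    # show where CRUSH currently places this osd
    ceph osd find 12
    ceph osd tree | less
    # optional: stop osds from updating their crush location at startup
    # (ceph.conf on the osd hosts)
    [osd]
    osd crush update on start = false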

Re: [ceph-users] Can I limit OSD memory usage?

2019-06-08 Thread huang jun
Was your OSD OOM-killed while the cluster was doing recovery/backfill, or just under client IO? The config items you mentioned are for BlueStore, and OSD memory includes many other things, like the pglog, so it's important to know whether your cluster is doing recovery. Sergei Genchev wrote on Sat, Jun 8, 2019 at 5:35 AM: > >
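If the OSDs are BlueStore on Luminous 12.2.9+ or later, the usual knob for bounding per-OSD memory is osd_memory_target; a minimal sketch (the 2 GiB value is only an example and has to fit your hardware):

    # ceph.conf on the osd hosts
    [osd]
    osd memory target = 2147483648    # ~2 GiB cache target per osd daemon
    # note: this steers the bluestore cache heuristics, not the absolute RSS;
    # pglog, recovery buffers, etc. still come on top of it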

Re: [ceph-users] Reweight OSD to 0, why doesn't report degraded if UP set under Pool Size

2019-06-08 Thread huang jun
I think the written data will also go to osd.4 in this case. Because your osd.4 is not down, Ceph doesn't consider the PG to have a down OSD, and it will replicate the data to all OSDs in the acting/backfill set. Tarek Zegar wrote on Fri, Jun 7, 2019 at 10:37 PM: > Paul / All > > I'm not sure what warning your

Re: [ceph-users] radosgw dying

2019-06-08 Thread huang jun
From the error message, I'm inclined to think that 'mon_max_pg_per_osd' was exceeded. You can check its value; the default is 250, so you can have at most 1500 PGs (250 * 6 OSDs). For replicated pools with size=3 that means 500 PGs across all pools; you already have 448 PGs, so the next pool
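The arithmetic behind that limit, written out (the numbers match the cluster described above; the config query assumes you can reach a mon admin socket and that the mon id is the short hostname):

    # what the monitor enforces
    ceph daemon mon.$(hostname -s) config get mon_max_pg_per_osd    # default 250
    # budget: 250 pg-per-osd * 6 osds = 1500 pg instances;
    # with size=3 each pg consumes 3 instances, so 1500 / 3 = 500 pgs total.
    # 448 already exist, leaving room for only ~52 more before pool creation fails.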

Re: [ceph-users] balancer module makes OSD distribution worse

2019-06-08 Thread huang jun
What does 'ceph osd df tree' output? Does each OSD have the expected number of PGs? Josh Haft wrote on Fri, Jun 7, 2019 at 9:23 PM: > > 95% of usage is CephFS. The remainder is split between RGW and RBD. > > On Wed, Jun 5, 2019 at 3:05 PM Gregory Farnum wrote: > > > > I think the mimic balancer doesn't include omap data

Re: [ceph-users] How to see the ldout log?

2019-06-17 Thread huang jun
You should add this to your ceph.conf:

    [client]
    log file = /var/log/ceph/$name.$pid.log
    debug client = 20

?? ?? wrote on Tue, Jun 18, 2019 at 11:18 AM: > > I am a student new to cephfs. I want to see the ldout log in > /src/client/Client.cc (for example, ldout(cct, 20) << " no cap on " << > dn->inode->vino() <<

Re: [ceph-users] strange osd beacon

2019-06-15 Thread huang jun
OSDs send beacons every 300s; they are used to let the mon know the OSD is alive. In some cases the OSD has no peers, e.g. when no pools have been created. Rafał Wądołowski wrote on Fri, Jun 14, 2019 at 12:53 PM: > > Hi, > > Is it normal that an osd beacon can be without pgs? Like below. This > drive contains data, but I

Re: [ceph-users] problem with degraded PG

2019-06-15 Thread huang jun
Can you show us the output of 'ceph osd dump' and 'ceph health detail'? Luk wrote on Fri, Jun 14, 2019 at 8:02 PM: > > Hello, > > All kudos go to friends from Wroclaw, PL :) > > It was as simple as a typo... > > An osd was added twice to the crushmap because (these commands were > run over a week ago

Re: [ceph-users] Repairing PG inconsistencies — Ceph Documentation - where's the text?

2019-05-17 Thread huang jun
Stuart Longland wrote on Sat, May 18, 2019 at 9:26 AM: > > On 16/5/19 8:55 pm, Stuart Longland wrote: > > As this is Bluestore, it's not clear what I should do to resolve that, > > so I thought I'd "RTFM" before asking here: > > http://docs.ceph.com/docs/luminous/rados/operations/pg-repair/ > > > > Maybe

Re: [ceph-users] openstack with ceph rbd vms IO/erros

2019-05-17 Thread huang jun
EDH - Manuel Rios Fernandez wrote on Fri, May 17, 2019 at 3:23 PM: > > Did you check your KVM host RAM usage? > > > > We saw this on hosts that were very heavily loaded; RAM overcommit causes random > VM crashes. > > > > As you said, to solve it the volume must be remounted externally and fsck'd. You can prevent > it by disabling

Re: [ceph-users] Repairing PG inconsistencies — Ceph Documentation - where's the text?

2019-05-17 Thread huang jun
That may be a problem with your disk. Did you check the syslog or dmesg log? From the code, it returns 'read_error' only when the read returns EIO, so I suspect your disk has a sector error. Stuart Longland wrote on Sat, May 18, 2019 at 9:43 AM: > > On 18/5/19 11:34 am, huang jun wrote: > > Stu

Re: [ceph-users] Repairing PG inconsistencies — Ceph Documentation - where's the text?

2019-05-17 Thread huang jun
OK, so I think if you use 'rados -p pool get 7:581d78de:::rbd_data.b48c7238e1f29.1b34:head -o obj' the OSD may crash. Stuart Longland wrote on Sat, May 18, 2019 at 10:05 AM: > > On 18/5/19 11:56 am, huang jun wrote: > > That may be a problem with your disk. > > Do yo

Re: [ceph-users] Does ceph osd reweight-by-xxx work correctly if OSDs aren't of same size?

2019-04-29 Thread huang jun
Yes, 'ceph osd reweight-by-xxx' uses the OSD crush weight (which represents how much data it can hold) in its calculation. Igor Podlesny wrote on Mon, Apr 29, 2019 at 2:56 PM: > > Say, some nodes have OSDs that are 1.5 times bigger than other nodes > have, while the weights of all the nodes in question are almost
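A safe way to see what such a run would do before committing to it is the dry-run variant (available on Jewel and later; 120 is the oversubscription threshold in percent and is only an example value):

    # report which osds would be reweighted and by how much, without applying
    ceph osd test-reweight-by-utilization 120
    # apply for real once the proposed changes look sane
    ceph osd reweight-by-utilization 120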

Re: [ceph-users] Ceph pool EC with overwrite enabled

2019-07-04 Thread huang jun
Try: rbd create backup2/teste --size 5T --data-pool ec_pool. Fabio Abreu wrote on Fri, Jul 5, 2019 at 1:49 AM: > > Hi everybody, > > I have a question about using rbd with an EC pool. I tried this > in my CentOS lab but I just get some errors when I try to create an rbd > image inside this pool.
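For that to work, the erasure-coded pool also needs overwrites enabled, and the image header/metadata still lives in a replicated pool; a rough end-to-end sketch (pool names follow the thread, values are illustrative):

    # allow partial overwrites on the EC pool (Luminous or later, BlueStore OSDs)
    ceph osd pool set ec_pool allow_ec_overwrites true
    # metadata/omap go to the replicated pool, data objects to the EC pool
    rbd create backup2/teste --size 5T --data-pool ec_pool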

Re: [ceph-users] clock skew

2019-04-25 Thread huang jun
mj wrote on Thu, Apr 25, 2019 at 6:34 PM: > > Hi all, > > On our three-node cluster we have set up chrony for time sync, and even > though chrony reports that it is synced to NTP time, ceph still > occasionally reports time skews that can last several hours. > > See for example: > > > root@ceph2:~#

Re: [ceph-users] How RBD tcp connection works

2019-08-19 Thread huang jun
How long did you monitor after the read/write finished? There is a config item named 'ms_connection_idle_timeout' whose default value is 900 (seconds). fengyd wrote on Mon, Aug 19, 2019 at 4:10 PM: > > Hi, > > I have a question about TCP connections. > In the test environment, OpenStack uses Ceph RBD as backend storage. > I created
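To confirm what a given daemon is actually using (a sketch; the daemon id is a placeholder, and changing the value at all is optional since tearing down idle connections after 900 s is usually harmless):

    # read the effective idle timeout from a running osd via its admin socket
    ceph daemon osd.0 config get ms_connection_idle_timeout
    # raise it on all osds if idle connections are being torn down too eagerly
    ceph tell osd.* injectargs '--ms_connection_idle_timeout 1800'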

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-09 Thread huang jun
: > > Hi, > > Please find the ceph osd tree output in the pastebin > https://pastebin.com/Gn93rE6w > > On Fri, Nov 8, 2019 at 7:58 PM huang jun wrote: >> >> Can you post your 'ceph osd tree' in a pastebin? >> Do you mean the osds reporting the fsid mismatch are from

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-09 Thread huang jun
f you want > I will send you the logs of the mon once again by restarting osd.0 > > On Sun, Nov 10, 2019 at 10:17 AM huang jun wrote: >> >> The mon log shows that all the mismatched-fsid osds are from node 10.50.11.45; >> maybe that is the fifth node? >> BTW I don't fo

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-08 Thread huang jun
Try to restart some of the OSDs shown as down in 'ceph osd tree' and see what happens. nokia ceph wrote on Fri, Nov 8, 2019 at 6:24 PM: > > Adding my official mail id > > -- Forwarded message - > From: nokia ceph > Date: Fri, Nov 8, 2019 at 3:57 PM > Subject: OSD's not coming up in Nautilus > To:

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-08 Thread huang jun
:33:05 > cn1.chn8be1c1.cdn numactl[219218]: 2019-11-08 10:33:05.474 7f9ad14df700 -1 > osd.0 1795 set_numa_affinity unable to identify public interface 'dss-client' > numa n...r directory > > Hint: Some lines were ellipsized, use -l to show in full. > > > > > > And

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-09 Thread huang jun
. > > > # ceph tell mon.cn1 injectargs '--debug-mon 1/5' > injectargs: > > cn1.chn8be1c1.cdn ~# ceph daemon /var/run/ceph/ceph-mon.cn1.asok config > show|grep debug_mon > "debug_mon": "1/5", > "debug_monc": "0/0", > > > >

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-08 Thread huang jun
850792e700 -1 osd.0 1795 set_numa_affinity unable to identify public > interface 'dss-client' numa node: (2) No such file or directory > > On Fri, Nov 8, 2019 at 4:48 PM huang jun wrote: >> >> Is osd.0 still in the down state after the restart? If so, maybe the >> problem is i

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-08 Thread huang jun
sid's are coming from still. Is this creating the problem? Because I am > seeing that the OSDs in the fifth node are showing up in the ceph status > whereas the other nodes' osds are showing down. > > On Fri, Nov 8, 2019 at 7:25 PM huang jun wrote: >> >> I saw many lines like tha

Re: [ceph-users] Cluster in ERR status when rebalancing

2019-12-09 Thread huang jun
What about the backfillfull_ratio value? Simone Lazzaris wrote on Mon, Dec 9, 2019 at 6:38 PM: > > Hi all; > > Long story short, I have a cluster of 26 OSDs in 3 nodes (8+9+9). One of the > disks is showing some read errors, so I've added an OSD in the faulty node > (OSD.26) and set the (re)weight of
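On Luminous and later those ratios are cluster-wide and can be checked and adjusted like this (the values shown are the usual defaults and 0.92 is only an example):

    # current thresholds are embedded in the osd map
    ceph osd dump | grep -i ratio    # full_ratio 0.95 backfillfull_ratio 0.90 nearfull_ratio 0.85
    # raise the backfill threshold temporarily if rebalancing is blocked by it
    ceph osd set-backfillfull-ratio 0.92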

[ceph-users] gitbuilder.ceph.com service timeout?

2019-12-27 Thread huang jun
Hi all, are apt-mirror.sepia.ceph.com and gitbuilder.ceph.com down? I can't ping them.