[ceph-users] osd used space increased a lot after expanding the bluestore block lv

2019-10-24 Thread lin zhou
Hi, cephers. I tried two expansion tests with bluestore. The first test expands the db LV of an OSD that has separate block, db and wal devices; after running ceph-bluestore-tool bluefs-bdev-expand --path XX it works well, and perf dump shows the correct db size. The second test expands the block LV of an OSD that does not have
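For reference, a minimal sketch of the db-expand sequence described above, assuming a separate block.db LV; the OSD id, VG/LV names and size are placeholders, not values from the original post:

    # stop the OSD before touching its devices (placeholder id)
    systemctl stop ceph-osd@12
    # grow the LV backing block.db (placeholder VG/LV and size)
    lvextend -L +20G /dev/ceph-db-vg/db-osd12
    # let bluefs pick up the new device size
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-12
    systemctl start ceph-osd@12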

[ceph-users] slow requests after rocksdb delete wal or table_file_deletion

2019-09-26 Thread lin zhou
Hi, cephers. Recently I have been testing ceph 12.2.12 with bluestore using cosbench. Both the SATA OSDs and the SSD OSDs show slow requests. Many slow requests occur, and most of them are logged right after rocksdb delete-wal or table_file_deletion log entries. Does that mean rocksdb is the bottleneck? If so, how can I improve it? If not, how to

[ceph-users] s3cmd upload file succeeded but returned This multipart completion is already in progress

2019-09-16 Thread lin zhou
Hi, cephers. Recently, when using s3cmd to upload a large file, the last POST request, the one that means the multipart upload is finished, returned "This multipart completion is already in progress", but in fact the file was uploaded successfully. Some key points: 1. s3cmd sends the POST request, but the server takes 30s to respond; why does this POST request sometimes need 30s to finish,
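For context, a minimal sketch of how such an upload is usually issued with s3cmd; the bucket, file name and chunk size are placeholders, and the final completion request is the S3 CompleteMultipartUpload POST referred to above:

    # upload a large object in multipart chunks (placeholder names and size)
    s3cmd put --multipart-chunk-size-mb=64 ./bigfile s3://mybucket/bigfile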

[ceph-users] will perf dump and osd perf affect the performance of ceph if I run them for each service?

2019-09-08 Thread lin zhou
Hi, cephers. I want to monitor more ceph metrics using perf dump and osd perf. If I run these commands for every service once per minute, will that have a measurable impact on cluster performance? Thanks
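For reference, a minimal sketch of the two collection paths in question; the OSD id and socket path are placeholders:

    # per-daemon counters via the admin socket
    ceph daemon osd.3 perf dump
    # equivalently, pointing at the socket file directly
    ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok perf dump
    # cluster-wide commit/apply latency summary from the monitors
    ceph osd perf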

[ceph-users] could not find secret_id--auth to unknown host

2019-06-28 Thread lin zhou
Hi, cephers. Recently I found auth error logs on most of my OSDs, but not all; the exceptions are some nodes I rebooted after the installation. It looks like the OSD authenticates to 10.108.87.250:0 first and only then to the correct mon.a. 10.108.87.250 is now my radosgw node; maybe I used it as a mon in my first

Re: [ceph-users] near 300 pg per osd make cluster very very unstable?

2019-06-22 Thread lin zhou
the faulty node hangs and I cannot log in over ssh when I set osd nodown and try to mark the OSDs on the faulty node down to recover; 5. then the peering PGs changed; 6. then more nodes hang, monitor data starts to disappear, and I cannot ssh; 7. all my VMs hang. All of this happened right after I ran ceph osd in. lin zhou wrote on Sun, 23 Jun 2019 at 7

[ceph-users] near 300 pg per osd make cluster very very unstable?

2019-06-22 Thread lin zhou
Recently our ceph cluster has been very unstable; even replacing a failed disk can trigger a chain reaction that causes large numbers of OSDs to be wrongly marked down. I am not sure whether it is because we have nearly 300 PGs on each SAS OSD and slightly more than 300 PGs per SSD OSD. From the logs, it all starts from
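For anyone checking their own distribution, a minimal sketch of how to see the per-OSD PG count; the command is standard, and the PGS column holds the number of placement groups mapped to each OSD:

    ceph osd df tree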

[ceph-users] rbd omap disappeared

2019-04-24 Thread lin zhou
My cluster hit a big error this morning: many OSDs committed suicide because of heartbeat_map timeouts. When I started all the OSDs manually it looked fine, but when I run rbd info on an image returned by rbd ls, it says no such file or directory. And then I used the approach in

[ceph-users] 10.2.10-many osd wrongly marked down and osd log has too much ms_handle_reset

2019-04-22 Thread lin zhou
Hi, cephers. My ceph cluster faces a new problem: many OSDs are wrongly marked down. Each time, tens or hundreds of OSDs get marked down; often they come back up again soon, but sometimes they do not and I have to restart the OSD manually because it is blocked on peering. I searched dmesg and found nothing.

[ceph-users] can not change log level for ceph-client.libvirt online

2019-04-12 Thread lin zhou
Hi, cephers. We have a ceph cluster used with openstack. Probably long ago we set debug_rbd in ceph.conf and then booted the VMs, but those debug settings no longer exist in the config. Now we find that ceph-client.libvirt.log is 200GB, but I cannot use ceph --admin-daemon ceph-client.libvirt.asok config set
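For reference, a minimal sketch of the admin-socket approach being attempted; the socket path is a placeholder, since the actual client socket name depends on the admin_socket setting of the running librbd client:

    # inspect the current debug settings of the running client
    ceph --admin-daemon /var/run/ceph/ceph-client.libvirt.123456.asok config show | grep debug_rbd
    # lower the rbd debug level on the running client
    ceph --admin-daemon /var/run/ceph/ceph-client.libvirt.123456.asok config set debug_rbd 0/0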

Re: [ceph-users] osd exit common/Thread.cc: 160: FAILED assert(ret == 0)--10.2.10

2019-02-28 Thread lin zhou
Thanks Greg. I found the limit; it is /proc/sys/kernel/threads-max. I count the thread number using: ps -eo nlwp | tail -n +2 | awk '{ num_threads += $1 } END { print num_threads }' and it comes to 97981. lin zhou wrote on Thu, 28 Feb 2019 at 10:33 AM: > Thanks, Greg. Your reply always so fast. > > I c
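For reference, a minimal sketch of inspecting and raising the kernel limit mentioned above; the value is only an example, not a recommendation from the thread:

    # current system-wide thread limit
    cat /proc/sys/kernel/threads-max
    # raise it at runtime (example value only)
    sysctl -w kernel.threads-max=200000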

Re: [ceph-users] osd exit common/Thread.cc: 160: FAILED assert(ret == 0)--10.2.10

2019-02-27 Thread lin zhou
Thanks, Greg. Your reply is always so fast. I checked these limits on my system:
# ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals

Re: [ceph-users] jewel10.2.11 EC pool out a osd, its PGs remap to the osds in the same host

2019-02-17 Thread lin zhou
Thanks so much. My ceph osd df tree output is here: https://gist.github.com/hnuzhoulin/e83140168eb403f4712273e3bb925a1c. Just as the output shows, and based on David's reply: when I mark osd.132 out, its PGs remap only to other OSDs on the same host, cld-osd12-56-sata. It seems that out does not change the host's weight, but if I out
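For reference, a minimal sketch of the two operations being contrasted here; the OSD id is only an example. Marking an OSD out zeroes its reweight but leaves the host bucket's CRUSH weight unchanged, so its PGs tend to move to the remaining OSDs on the same host, whereas lowering the CRUSH weight also shrinks the host bucket and lets data move to other hosts:

    # out: reweight goes to 0, host CRUSH weight stays the same
    ceph osd out 132
    # crush reweight: the host bucket weight shrinks as well
    ceph osd crush reweight osd.132 0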

[ceph-users] how to understand pg full

2016-06-08 Thread lin zhou
Hi, cephers. I understand what a full OSD is. AFAIK a PG is just a logical concept, so what does pg full mean? Thanks.

Re: [ceph-users] Decrease the pgs number in cluster

2016-05-22 Thread lin zhou
AFAIK the main problem is not pg_num but something else, such as your network or the ceph services. Can you paste the output of ceph -s, ceph osd tree, and cat ceph.conf? 2016-05-23 11:52 GMT+08:00 Albert Archer : > So, there is no solution at all ? > > On Sun, May 22, 2016 at 7:01

Re: [ceph-users] free krbd size in ubuntu12.04 in ceph 0.67.9

2016-05-22 Thread lin zhou
2016-05-23 10:31 GMT+08:00 Sharuzzaman Ahmat Raslan <sharuzza...@gmail.com>: > is your service only have one instance? > are your service running on vm? About twenty krbd instances; they are mapped and mounted on three OSD machines. > On May 23, 2016 10:23 AM, "lin zhou" <

[ceph-users] free krbd size in ubuntu12.04 in ceph 0.67.9

2016-05-20 Thread lin zhou
Hi, cephers. We only use krbd with ceph, and it has worked well for nearly two years, but now I face a capacity problem. I have 7 nodes with 10 3TB OSDs each, running ceph 0.67.9 on ubuntu 12.04. I know it is too old, but upgrading is beyond my control. We are now at 80% usage, so we have started to delete historic unneeded

Re: [ceph-users] increase pgnum after adjust reweight osd

2016-04-26 Thread lin zhou
Apr 2016 13:23:04 +0800 lin zhou wrote: > >> Hi,Cephers: >> >> Recently,I face a problem of full.and I have using reweight to adjust it. >> But now I want to increase pgnum before I can add new nodes into the >> cluster. >> > How many more nodes, OSDs? > &

[ceph-users] increase pgnum after adjust reweight osd

2016-04-24 Thread lin zhou
Hi, Cephers: Recently I faced a full-OSD problem and have used reweight to adjust it. But now I want to increase pg_num before I can add new nodes to the cluster. The current pg_num is 2048 and the total OSD count is 69; I want to increase it to 4096. So what are the recommended steps: a one-time increase directly
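For reference, a minimal sketch of the stepwise alternative being asked about: raise pg_num a little, let pgp_num follow, and wait for the cluster to settle before the next step. The pool name and step size are placeholders:

    ceph osd pool set rbd pg_num 2304
    ceph osd pool set rbd pgp_num 2304
    # wait for HEALTH_OK, then repeat in further steps until the target (e.g. 4096) is reached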

Re: [ceph-users] directory hang which mount from a mapped rbd

2016-04-15 Thread lin zhou
le you can open). You should see libceph: osd28 down libceph: osd28 up in the dmesg after the I/O is triggered. Attach # ceph -s # find /sys/kernel/debug/ceph -type f -print -exec cat {} \; when you are done. 2016-04-15 17:42 GMT+08:00 lin zhou <hnuzhoul...@gmail.com>: > some thing goe

Re: [ceph-users] directory hang which mount from a mapped rbd

2016-04-15 Thread lin zhou
Yes, the output is the same. 2016-04-15 16:55 GMT+08:00 Ilya Dryomov <idryo...@gmail.com>: > On Fri, Apr 15, 2016 at 10:32 AM, lin zhou <hnuzhoul...@gmail.com> wrote: >> thanks for so fast reply. >> output in one of the faulty host: >> >> root@musicgci5:~#

Re: [ceph-users] directory hang which mount from a mapped rbd

2016-04-15 Thread lin zhou
.0.899f7.6b8b4567.0470 write /sys/kernel/debug/ceph/409059ba-797e-46da-bc2f-83e3c7779094.client400179/monc have osdmap 32317 want next osdmap 2016-04-15 16:27 GMT+08:00 Ilya Dryomov <idryo...@gmail.com>: > On Fri, Apr 15, 2016 at 10:18 AM, lin zhou <hnuzhoul...@gmail.com> wrote: >> Hi,cep

Re: [ceph-users] maximum numbers of monitor

2016-04-08 Thread lin zhou
Intel recommends deploying three monitors if your cluster is within 200 nodes. And if the number of OSDs is larger than 100, you should deploy the monitors on separate nodes. 2016-04-08 13:55 GMT+08:00 powerhd : > > hi all: > I have a question about monitor node, what is the maximum

Re: [ceph-users] how to re-add a deleted osd device as a osd with data

2016-03-29 Thread lin zhou
795KB/s, maxb=795KB/s, mint=100032msec, maxt=100032msec
Disk stats (read/write):
  sda: ios=864/28988, merge=0/5738, ticks=31932/1061860, in_queue=1093892, util=99.99%
root@node-65:~#
The lifetime of this SSD is over. Thanks so much, Christian. 2016-03-30 12:19 GMT+08:00 lin zhou <hnuzhoul...

Re: [ceph-users] an osd which reweight is 0.0 in crushmap has high latency in osd perf

2016-03-29 Thread lin zhou
db4 8:20 0 10.2G 0 part
├─sdb5 8:21 0 10.2G 0 part
├─sdb6 8:22 0 10.2G 0 part
├─sdb7 8:23 0 10.2G 0 part
└─sdb8 8:24 0 50.1G 0 part
2016-03-30 11:17 GMT+08:00 lin zhou <hnuzhoul...@gmail.com>: > Hi,ceph

Re: [ceph-users] how to re-add a deleted osd device as a osd with data

2016-03-29 Thread lin zhou
2016-03-29 14:50 GMT+08:00 Christian Balzer <ch...@gol.com>: > > Hello, > > On Tue, 29 Mar 2016 14:00:44 +0800 lin zhou wrote: > >> Hi,Christian. >> When I re-add these OSD(0,3,9,12,15),the high latency occur again.the >> default reweight of these OSD is 0

[ceph-users] an osd which reweight is 0.0 in crushmap has high latency in osd perf

2016-03-29 Thread lin zhou
Hi, cephers. Some OSDs show high latency in the output of ceph osd perf, but I have set the reweight of these OSDs in the crushmap to 0.0, and iostat shows no load on those disks. So how does the command `ceph osd perf` work? root@node-67:~# ceph osd perf osdid fs_commit_latency(ms)

Re: [ceph-users] how to re-add a deleted osd device as a osd with data

2016-03-29 Thread lin zhou
in these OSD device. 2016-03-29 13:22 GMT+08:00 lin zhou <hnuzhoul...@gmail.com>: > Thanks.I try this method just like ceph document say. > But I just test osd.6 in this way,and the leveldb of osd.6 is > broken.so it can not start. > > When I try this for other osd,it works

[ceph-users] how to re-add a deleted osd device as a osd with data

2016-03-26 Thread lin zhou
Hi, guys. Some days ago one OSD showed large latency in ceph osd perf, and this device caused high CPU await on its node. So I deleted this OSD and then checked the device, but found no errors. Now I want to re-add this device to the cluster with its data. I tried using ceph-osd to add

Re: [ceph-users] dealing with the full osd / help reweight

2016-03-25 Thread lin zhou
Yeah, I think the main reason is the pg_num and pgp_num setting of some key pool. This site will tell you the correct value: http://ceph.com/pgcalc/ Before you adjust pg_num and pgp_num, if this is a production environment, you should set what Christian Balzer said: --- osd_max_backfills = 1
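For reference, a minimal sketch of applying that throttle; the injectargs form pushes it to running OSDs without a restart, and nothing here beyond osd_max_backfills = 1 comes from the thread:

    # persist in ceph.conf under [osd]:  osd_max_backfills = 1
    # and/or apply it to running OSDs immediately
    ceph tell osd.* injectargs '--osd-max-backfills 1'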

Re: [ceph-users] dependency of ceph_objectstore_tool in unhealthy ceph0.80.7 in ubuntu12.04

2016-03-24 Thread lin zhou
the cluster to HEALTH_OK status first and then upgrade. > > 2016-03-22 9:37 GMT+08:00 lin zhou <hnuzhoul...@gmail.com>: >> >> Hi, >> >> I want to using ceph_objectstore_tool to export a pg from an OSD which >> has been delete from cluster just as >> https://c

Re: [ceph-users] Any suggestion to deal with slow request?

2016-03-21 Thread lin zhou
I face the same problem. My osd.7 has slow requests, and many PGs are in the active+recovery_wait state. I checked the network and the device behind osd.7; no errors. Have you solved your problem? 2016-01-08 13:06 GMT+08:00 Christian Balzer : > > Hello, > > > On Fri, 8 Jan 2016 12:22:04

[ceph-users] dependency of ceph_objectstore_tool in unhealthy ceph0.80.7 in ubuntu12.04

2016-03-21 Thread lin zhou
Hi, I want to use ceph_objectstore_tool to export a PG from an OSD which has been deleted from the cluster, just as https://ceph.com/community/incomplete-pgs-oh-my/ does. My ceph version is 0.80.7, and ceph_objectstore_tool has a dependency on libgoogle-perftools0, but libgoogle-perftools4 has been
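For reference, a minimal sketch of the export step that article performs; the pgid, OSD paths and output file below are placeholders, not values from this post:

    # export a single PG from a stopped OSD's data store
    ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-6 \
        --journal-path /var/lib/ceph/osd/ceph-6/journal \
        --pgid 3.1f --op export --file /tmp/pg.3.1f.export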

[ceph-users] object unfound before finish backfill, up set diff from acting set

2016-03-21 Thread lin zhou
Hi, guys. My cluster hit a network problem which caused some errors. After solving the network problem, the latency of some OSDs on one node was high, reaching 3000+ in ceph osd perf, so I deleted those OSDs from the cluster but kept the OSD data devices. After recovery and backfill, I face the problem described in

[ceph-users] Fwd: object unfound before backfill

2016-03-21 Thread lin zhou
Hi, guys. My cluster hit a network problem which caused some errors. After solving the network problem, the latency of some OSDs on one node was high, reaching 3000+ in ceph osd perf, so I deleted those OSDs from the cluster but kept the OSD data devices. After recovery and backfill, I face the problem described in

[ceph-users] object unfound before backfill

2016-03-21 Thread lin zhou
Hi, guys. My cluster hit a network problem which caused some errors. After solving the network problem, the latency of some OSDs on one node was high, reaching 3000+ in ceph osd perf, so I deleted those OSDs from the cluster but kept the OSD data devices. After recovery and backfill, I face the problem described in

Re: [ceph-users] recommendations for file sharing

2015-12-16 Thread lin zhou 周林
Seafile is another option; it supports writing data to ceph using librados directly. On 2015-12-15 10:51, Wido den Hollander wrote: > Are you sure you need file sharing? ownCloud for example now has native > RADOS support using phprados. > > Isn't ownCloud something that could work? Talking native RADOS is

[ceph-users] after a reboot, osd can not up because of leveldb Corruption

2015-10-09 Thread lin zhou 周林
Hi, guys. The mon and OSDs on one of the nodes in our ceph cluster cannot come up now because of leveldb. ceph 0.80.7, ubuntu 12.04. The osd log is: -- 2015-10-10 11:12:58.896724 7f4cfcf9d7c0 -1 ** ERROR: error converting store

Re: [ceph-users] radosgw-agent, sync zone_info.us-east: Http error code 500 content

2013-12-16 Thread lin zhou 周林
Thanks for your reply.
root@rceph0:~# radosgw-admin zone get --name client.radosgw.us-west-1
{ domain_root: .us-west.rgw.root,
  control_pool: .us-west.rgw.control,
  gc_pool: .us-west.rgw.gc,
  log_pool: .us-west.log,
  intent_log_pool: .us-west.intent-log,
  usage_log_pool: .us-west.usage,