Re: [ceph-users] ceph write performance issue
I used 2 copies, not 3, so it should be 1000MB/s in theory. Thanks.

2016-09-29 17:54 GMT+08:00 Nick Fisk <n...@fisk.me.uk>:

> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of* min fang
> *Sent:* 29 September 2016 10:34
> *To:* ceph-users <ceph-users@lists.ceph.com>
> *Subject:* [ceph-users] ceph write performance issue
>
> Hi, I created a 40-OSD ceph cluster with 8 PM863 960G SSDs as journals. Each
> SSD is used as the journal for 5 OSD drives. The SSD 512 random write
> performance is about 450MB/s, but the whole cluster's sequential write
> throughput is only 800MB/s. Any suggestion on improving sequential write
> performance? Thanks.
>
> Take a conservative figure of 50MB/s for each disk, as writing in Ceph is
> not just straight sequential writes; there is a slight random nature to it.
>
> (40 x 50MB/s) / 3 = 666MB/s. Seems fine to me.
>
> Testing result is here:
>
> rados bench -p libvirt-pool 10 write --no-cleanup
> Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
> Object prefix: benchmark_data_redpower-sh-04_16462
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)   avg lat(s)
>     0       0         0         0         0         0            -            0
>     1      15       189       174   695.968       696    0.0359122     0.082477
>     2      16       395       379   757.938       820    0.0634079    0.0826266
>     3      16       582       566   754.601       748    0.0401129    0.0830207
>     4      16       796       780   779.934       856    0.0374938    0.0816794
>     5      16       977       961   768.735       724    0.0489886    0.0827479
>     6      16      1172      1156   770.601       780    0.0428639    0.0812062
>     7      16      1387      1371   783.362       860    0.0461826    0.0811803
>     8      16      1545      1529   764.433       632     0.238497    0.0831018
>     9      16      1765      1749   777.265       880    0.0557358    0.0814399
>    10      16      1971      1955   781.931       824    0.0321333    0.0814144
> Total time run:         10.044813
> Total writes made:      1972
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     785.281
> Stddev Bandwidth:       80.8235
> Max bandwidth (MB/sec): 880
> Min bandwidth (MB/sec): 632
> Average IOPS:           196
> Stddev IOPS:            20
> Max IOPS:               220
> Min IOPS:               158
> Average Latency(s):     0.081415
> Stddev Latency(s):      0.0554568
> Max latency(s):         0.345111
> Min latency(s):         0.0230153
>
> my ceph osd configuration:
>
> osd_mkfs_type = xfs
> osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k
> osd_mkfs_options_xfs = -f -i size=2048
> filestore_max_inline_xattr_size = 254
> filestore_max_inline_xattrs = 6
> osd_op_threads = 20
> filestore_queue_max_ops = 25000
> journal_max_write_entries=1
> journal_queue_max_ops=5
> objecter_inflight_ops=10240
> filestore_queue_max_bytes=1048576000
> filestore_queue_committing_max_bytes=1048576000
> journal_max_write_bytes=1073714824
> journal_queue_max_bytes=1048576
> ms_dispatch_throttle_bytes=1048576000
> objecter_inflight_op_bytes=1048576000
> filestore_max_sync_interval=20
> filestore_flusher=false
> filestore_flush_min=0
> filestore_sync_flush=true
> journal_block_align = true
> journal_dio = true
> journal_aio = true
> journal_force_aio = true
> osd_op_num_shards=8
> osd_op_num_threads_per_shard=2
> filestore_wbthrottle_enable=false
> filestore_fd_cache_size=1024
> filestore_omap_header_cache_size=1024

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
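Nick's back-of-the-envelope figure and the 2-copy correction are simple arithmetic; as a sketch (the 50MB/s per-disk figure is Nick's conservative assumption, not a measured number):

```python
# Rough aggregate sequential-write estimate for a replicated Ceph pool:
# every client byte is written `replicas` times across the OSDs.
def expected_write_mb(num_osds, replicas, per_disk_mb=50):
    return num_osds * per_disk_mb / replicas

print(expected_write_mb(40, 3))  # Nick's figure: ~666 MB/s
print(expected_write_mb(40, 2))  # with 2 copies: 1000 MB/s
```

On that basis the measured ~785MB/s with size=2 is below the naive 1000MB/s ceiling but in the same ballpark.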
[ceph-users] ceph write performance issue
Hi, I created a 40-OSD ceph cluster with 8 PM863 960G SSDs as journals. Each SSD is used as the journal for 5 OSD drives. The SSD 512 random write performance is about 450MB/s, but the whole cluster's sequential write throughput is only 800MB/s. Any suggestion on improving sequential write performance? Thanks.

Testing result is here:

rados bench -p libvirt-pool 10 write --no-cleanup
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_redpower-sh-04_16462
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)   avg lat(s)
    0       0         0         0         0         0            -            0
    1      15       189       174   695.968       696    0.0359122     0.082477
    2      16       395       379   757.938       820    0.0634079    0.0826266
    3      16       582       566   754.601       748    0.0401129    0.0830207
    4      16       796       780   779.934       856    0.0374938    0.0816794
    5      16       977       961   768.735       724    0.0489886    0.0827479
    6      16      1172      1156   770.601       780    0.0428639    0.0812062
    7      16      1387      1371   783.362       860    0.0461826    0.0811803
    8      16      1545      1529   764.433       632     0.238497    0.0831018
    9      16      1765      1749   777.265       880    0.0557358    0.0814399
   10      16      1971      1955   781.931       824    0.0321333    0.0814144
Total time run:         10.044813
Total writes made:      1972
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     785.281
Stddev Bandwidth:       80.8235
Max bandwidth (MB/sec): 880
Min bandwidth (MB/sec): 632
Average IOPS:           196
Stddev IOPS:            20
Max IOPS:               220
Min IOPS:               158
Average Latency(s):     0.081415
Stddev Latency(s):      0.0554568
Max latency(s):         0.345111
Min latency(s):         0.0230153

my ceph osd configuration:

osd_mkfs_type = xfs
osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k
osd_mkfs_options_xfs = -f -i size=2048
filestore_max_inline_xattr_size = 254
filestore_max_inline_xattrs = 6
osd_op_threads = 20
filestore_queue_max_ops = 25000
journal_max_write_entries=1
journal_queue_max_ops=5
objecter_inflight_ops=10240
filestore_queue_max_bytes=1048576000
filestore_queue_committing_max_bytes=1048576000
journal_max_write_bytes=1073714824
journal_queue_max_bytes=1048576
ms_dispatch_throttle_bytes=1048576000
objecter_inflight_op_bytes=1048576000
filestore_max_sync_interval=20
filestore_flusher=false
filestore_flush_min=0
filestore_sync_flush=true
journal_block_align = true
journal_dio = true
journal_aio = true
journal_force_aio = true
osd_op_num_shards=8
osd_op_num_threads_per_shard=2
filestore_wbthrottle_enable=false
filestore_fd_cache_size=1024
filestore_omap_header_cache_size=1024
[ceph-users] ceph pg level IO sequence
Hi, as I understand it, IOs at the PG level are executed sequentially, as in the following cases:

Case 1: Write A, Write B, Write C to the same data area in a PG --> A commits, then B commits, then C. The final data will be from Write C; it is impossible for mixed (A, B, C) data to end up in the data area.

Case 2: Write A, Write B, Read C to the same data area in a PG --> Read C will return the data from Write B, not Write A.

Are the above cases true? Thanks.
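The last-write-wins behaviour of Case 1 can be pictured with a toy model (purely illustrative; a real PG serialises client ops through the primary OSD's op queue, which this sketch does not attempt to reproduce):

```python
# Toy model: ops against the same object in one PG are applied strictly
# in submission order, so a read always sees the most recent whole write,
# never a mix of several writes.
class ToyPG:
    def __init__(self):
        self.data = {}

    def write(self, obj, value):
        self.data[obj] = value        # applied in submission order

    def read(self, obj):
        return self.data.get(obj)

pg = ToyPG()
for value in ("A", "B", "C"):         # Case 1: three writes to the same area
    pg.write("obj1", value)
print(pg.read("obj1"))                # -> C
```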
Re: [ceph-users] stuck unclean since forever
Thanks. Actually I created a pool with more PGs and still hit this problem. Following is my crush map; please help point out how to change the crush ruleset. Thanks.

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable straw_calc_version 1

# devices
device 0 device0
device 1 device1
device 2 osd.2
device 3 osd.3
device 4 osd.4

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host redpower-ceph-01 {
        id -2           # do not change unnecessarily
        # weight 3.000
        alg straw
        hash 0  # rjenkins1
        item osd.2 weight 1.000
        item osd.3 weight 1.000
        item osd.4 weight 1.000
}
root default {
        id -1           # do not change unnecessarily
        # weight 3.000
        alg straw
        hash 0  # rjenkins1
        item redpower-ceph-01 weight 3.000
}

# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
# end crush map

2016-06-22 18:27 GMT+08:00 Burkhard Linke <burkhard.li...@computational.bio.uni-giessen.de>:

> Hi,
>
> On 06/22/2016 12:10 PM, min fang wrote:
>
> Hi, I created a new ceph cluster and created a pool, but I see "stuck
> unclean since forever" errors (as follows); can you help point out the
> possible reasons for this? Thanks.
>
> ceph -s
>     cluster 602176c1-4937-45fc-a246-cc16f1066f65
>      health HEALTH_WARN
>             8 pgs degraded
>             8 pgs stuck unclean
>             8 pgs undersized
>             too few PGs per OSD (2 < min 30)
>      monmap e1: 1 mons at {ceph-01=172.0.0.11:6789/0}
>             election epoch 14, quorum 0 ceph-01
>      osdmap e89: 3 osds: 3 up, 3 in
>             flags
>       pgmap v310: 8 pgs, 1 pools, 0 bytes data, 0 objects
>             60112 MB used, 5527 GB / 5586 GB avail
>                    8 active+undersized+degraded
>
> *snipsnap*
>
> With three OSDs and a single host you need to change the crush ruleset for
> the pool, since it tries to distribute the data across 3 different _hosts_
> by default.
>
> Regards,
> Burkhard
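For a single-host cluster, the change Burkhard describes amounts to switching the rule's failure domain from host to osd. A sketch of the edited rule (decompile the map with crushtool, edit, recompile, and inject it back with `ceph osd setcrushmap`; treat this as an untested example):

```
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type osd    # was: type host
        step emit
}
```

With `type osd`, CRUSH may place all replicas on different OSDs of the same host, which is what a 3-OSD single-host test cluster needs to go active+clean.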
[ceph-users] stuck unclean since forever
Hi, I created a new ceph cluster and created a pool, but I see "stuck unclean since forever" errors (as follows); can you help point out the possible reasons for this? Thanks.

ceph -s
    cluster 602176c1-4937-45fc-a246-cc16f1066f65
     health HEALTH_WARN
            8 pgs degraded
            8 pgs stuck unclean
            8 pgs undersized
            too few PGs per OSD (2 < min 30)
     monmap e1: 1 mons at {ceph-01=172.0.0.11:6789/0}
            election epoch 14, quorum 0 ceph-01
     osdmap e89: 3 osds: 3 up, 3 in
            flags
      pgmap v310: 8 pgs, 1 pools, 0 bytes data, 0 objects
            60112 MB used, 5527 GB / 5586 GB avail
                   8 active+undersized+degraded

ceph health detail
HEALTH_WARN 8 pgs degraded; 8 pgs stuck unclean; 8 pgs undersized; too few PGs per OSD (2 < min 30)
pg 5.0 is stuck unclean since forever, current state active+undersized+degraded, last acting [3]
pg 5.1 is stuck unclean since forever, current state active+undersized+degraded, last acting [3]
pg 5.2 is stuck unclean since forever, current state active+undersized+degraded, last acting [3]
pg 5.3 is stuck unclean since forever, current state active+undersized+degraded, last acting [4]
pg 5.7 is stuck unclean since forever, current state active+undersized+degraded, last acting [3]
pg 5.6 is stuck unclean since forever, current state active+undersized+degraded, last acting [2]
pg 5.5 is stuck unclean since forever, current state active+undersized+degraded, last acting [4]
pg 5.4 is stuck unclean since forever, current state active+undersized+degraded, last acting [4]
pg 5.7 is active+undersized+degraded, acting [3]
pg 5.6 is active+undersized+degraded, acting [2]
pg 5.5 is active+undersized+degraded, acting [4]
pg 5.4 is active+undersized+degraded, acting [4]
pg 5.3 is active+undersized+degraded, acting [4]
pg 5.2 is active+undersized+degraded, acting [3]
pg 5.1 is active+undersized+degraded, acting [3]
pg 5.0 is active+undersized+degraded, acting [3]
too few PGs per OSD (2 < min 30)

ceph osd tree
ID WEIGHT TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 3.0    root default
-2 3.0        host ceph-01
 2 1.0            osd.2          up      1.0              1.0
 3 1.0            osd.3          up      1.0              1.0
 4 1.0            osd.4          up      1.0              1.0

ceph osd crush tree
[
    {
        "id": -1,
        "name": "default",
        "type": "root",
        "type_id": 10,
        "items": [
            {
                "id": -2,
                "name": "ceph-01",
                "type": "host",
                "type_id": 1,
                "items": [
                    { "id": 2, "name": "osd.2", "type": "osd", "type_id": 0, "crush_weight": 1.00, "depth": 2 },
                    { "id": 3, "name": "osd.3", "type": "osd", "type_id": 0, "crush_weight": 1.00, "depth": 2 },
                    { "id": 4, "name": "osd.4", "type": "osd", "type_id": 0, "crush_weight": 1.00, "depth": 2 }
                ]
            }
        ]
    }
]
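The "too few PGs per OSD" warning follows from the usual rule of thumb: target roughly 100 PGs per OSD, divide by the replica count, and round up to a power of two. A sketch of the calculation (the 100-PGs-per-OSD target and pool size 3 are assumptions, not values from this thread):

```python
# Rule-of-thumb pg_num for a new pool: ~100 PGs per OSD divided by the
# replica count, rounded up to the next power of two.
def suggested_pg_num(num_osds, replicas, target_per_osd=100):
    raw = num_osds * target_per_osd / replicas
    pg_num = 1
    while pg_num < raw:
        pg_num *= 2
    return pg_num

print(suggested_pg_num(3, 3))   # this 3-OSD cluster -> 128
```

The pool above has only 8 PGs across 3 OSDs (about 2 per OSD), hence the warning.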
[ceph-users] librbd compatibility
Hi, is there a document describing librbd compatibility? For example, something like: librbd from Ceph 0.88 can also be used with 0.90, 0.91, and so on. I hope to keep librbd relatively stable, so I can avoid extra code iteration and testing. Thanks.
[ceph-users] performance drop a lot when running fio mix read/write
Hi, I ran random fio with rwmixread=70 and found read IOPS is 707 and write IOPS is 303 (see below). These values are much lower than the standalone random write and random read results: the 4K random write IOPS is 529 and the 4K randread IOPS is 11343. Apart from the rw type, all other parameters are the same. I do not understand why mixing writes and reads has such a huge impact on performance; all IOs are random. Thanks.

fio -filename=/dev/rbd2 -direct=1 -iodepth 64 -thread -rw=randrw -rwmixread=70 -ioengine=libaio -bs=4k -size=100G -numjobs=1 -runtime=1000 -group_reporting -name=mytest1

mytest1: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.8
time 7423 cycles_start=1062103697308843
Starting 1 thread
Jobs: 1 (f=1): [m(1)] [100.0% done] [2144KB/760KB/0KB /s] [536/190/0 iops] [eta 00m:00s]
mytest1: (groupid=0, jobs=1): err= 0: pid=7425: Sat Apr 30 08:55:14 2016
  read : io=2765.2MB, bw=2830.5KB/s, iops=707, runt=1000393msec
    slat (usec): min=2, max=268, avg= 8.93, stdev= 4.17
    clat (usec): min=203, max=1939.9K, avg=34039.43, stdev=93674.48
     lat (usec): min=207, max=1939.9K, avg=34048.93, stdev=93674.50
    clat percentiles (usec):
     |  1.00th=[   516],  5.00th=[   836], 10.00th=[  1112], 20.00th=[  1448],
     | 30.00th=[  1736], 40.00th=[  6944], 50.00th=[ 13376], 60.00th=[ 17280],
     | 70.00th=[ 21888], 80.00th=[ 30848], 90.00th=[ 49920], 95.00th=[103936],
     | 99.00th=[552960], 99.50th=[675840], 99.90th=[880640], 99.95th=[954368],
     | 99.99th=[1105920]
    bw (KB /s): min=  350, max= 5944, per=100.00%, avg=2837.77, stdev=1272.84
  write: io=1184.8MB, bw=1212.8KB/s, iops=303, runt=1000393msec
    slat (usec): min=2, max=310, avg= 9.35, stdev= 4.50
    clat (msec): min=5, max=2210, avg=131.60, stdev=226.47
     lat (msec): min=5, max=2210, avg=131.61, stdev=226.47
    clat percentiles (msec):
     |  1.00th=[    9],  5.00th=[   13], 10.00th=[   15], 20.00th=[   20],
     | 30.00th=[   25], 40.00th=[   34], 50.00th=[   44], 60.00th=[   61],
     | 70.00th=[   84], 80.00th=[  125], 90.00th=[  449], 95.00th=[  709],
     | 99.00th=[ 1037], 99.50th=[ 1139], 99.90th=[ 1369], 99.95th=[ 1450],
     | 99.99th=[ 1663]
    bw (KB /s): min=   40, max= 2562, per=100.00%, avg=1215.62, stdev=564.19
    lat (usec) : 250=0.01%, 500=0.60%, 750=1.94%, 1000=2.95%
    lat (msec) : 2=18.69%, 4=2.46%, 10=4.21%, 20=22.05%, 50=26.40%
    lat (msec) : 100=9.65%, 250=4.64%, 500=2.76%, 750=2.13%, 1000=1.11%
    lat (msec) : 2000=0.39%, >=2000=0.01%
  cpu : usr=0.83%, sys=1.47%, ctx=971080, majf=0, minf=1
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=707885/w=303294/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=2765.2MB, aggrb=2830KB/s, minb=2830KB/s, maxb=2830KB/s, mint=1000393msec, maxt=1000393msec
  WRITE: io=1184.8MB, aggrb=1212KB/s, minb=1212KB/s, maxb=1212KB/s, mint=1000393msec, maxt=1000393msec

Disk stats (read/write):
  rbd2: ios=707885/303293, merge=0/0, ticks=24085792/39904840, in_queue=64045864, util=100.00%
Re: [ceph-users] cache tier
Thanks Oliver. Does the journal need to be committed twice then? Once for the write IO into the cache tier, and again when the write IO is destaged to the SATA backend pool?

2016-04-21 19:38 GMT+08:00 Oliver Dzombic <i...@ip-interactive.de>:

> Hi,
>
> afaik the cache does not have anything to do with journals.
>
> So your OSDs need journals, and for performance, you will take SSDs.
>
> The cache should be something faster than your OSDs. Usually SSD or NVMe.
>
> The cache is extra space in front of your OSDs which is supposed to
> speed things up, because the cache operates faster than the OSDs, and the
> cache will flush its content to the slower OSDs.
>
> So, the cache is independent of the OSDs, and in this way, of their
> journals.
>
> And the cache >must< be on faster drives than the OSDs, otherwise you
> won't see any performance increase.
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:i...@ip-interactive.de
>
> Anschrift:
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
>
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
>
> Am 21.04.2016 um 13:27 schrieb min fang:
> > Hi, my ceph cluster has two pools, an SSD cache tier pool and a SATA
> > backend pool. For this configuration, do I need to use an SSD as the
> > journal device? I do not know whether the cache tier takes the journal
> > role. Thanks
[ceph-users] cache tier
Hi, my ceph cluster has two pools, an SSD cache tier pool and a SATA backend pool. For this configuration, do I need to use an SSD as the journal device? I do not know whether the cache tier takes the journal role. Thanks
Re: [ceph-users] ceph rbd object write is atomic?
Thanks Jason. Yes, I also do not think atomicity can be guaranteed at the extent level. But for a stripe unit within one object, can atomic writes be guaranteed? Thanks.

2016-04-06 19:53 GMT+08:00 Jason Dillaman <dilla...@redhat.com>:

> It's possible for a write to span one or more blocks -- it just depends on
> the write address/size and the RBD image layout (object size, "fancy"
> striping, etc). Regardless, however, RBD cannot provide any ordering
> guarantees when two clients are writing to the same image at the same
> extent. To safely use two or more clients concurrently on the same image
> you need a clustered filesystem on top of RBD (e.g. GFS2), or the
> application needs to provide its own coordination to avoid concurrent
> writes to the same image extents.
>
> --
>
> Jason Dillaman
>
> - Original Message -
> > From: "min fang" <louisfang2...@gmail.com>
> > To: "ceph-users" <ceph-users@lists.ceph.com>
> > Sent: Tuesday, April 5, 2016 10:11:10 PM
> > Subject: [ceph-users] ceph rbd object write is atomic?
>
> > Hi, as my understanding, a ceph rbd image will be divided into multiple
> > objects based on the LBA address.
> > My question here is:
> > if two clients write to the same LBA address, such as client A writing ""
> > to LBA 0x123456 and client B writing "" to the same LBA,
> > and the LBA address and data will only be in one object, not across two objects,
> > will ceph guarantee the object data must be "" or ""? "aabb", "bbaa" will
> > not happen even in a stripe data layout model?
> > thanks.
[ceph-users] ceph rbd object write is atomic?
Hi, as my understanding goes, a ceph rbd image is divided into multiple objects based on the LBA address.

My question here is: if two clients write to the same LBA address, such as client A writing "" to LBA 0x123456 and client B writing "" to the same LBA, and the LBA address and data fall within one object (not across two objects), will ceph guarantee the object data must be entirely "" or ""? Mixed data like "aabb" or "bbaa" will not happen, even in a striped data layout model?

Thanks.
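Whether a write stays inside one object is simple arithmetic once you assume the default layout (4 MiB objects, order 22, no fancy striping — both assumptions here, not from the thread); a sketch:

```python
OBJECT_SIZE = 4 << 20   # assumed default rbd object size: 4 MiB (order 22)

def objects_touched(offset, length):
    """Indices of the rbd objects a [offset, offset+length) write covers."""
    first = offset // OBJECT_SIZE
    last = (offset + length - 1) // OBJECT_SIZE
    return list(range(first, last + 1))

# A 4 KiB write at LBA 0x123456 (512-byte sectors -> byte offset) stays
# inside a single object:
print(objects_touched(0x123456 * 512, 4096))
# A write straddling an object boundary touches two objects:
print(objects_touched(OBJECT_SIZE - 2048, 4096))   # -> [0, 1]
```

Only the single-object case is even a candidate for atomicity; a straddling write becomes two separate RADOS ops with no cross-object guarantee, which matches Jason's point in the reply thread.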
[ceph-users] osd up_from, up_thru
Dear all, I used osd dump to extract the osd map, and found the up_from and up_thru fields. What is the difference between up_from and up_thru?

osd.0 up in weight 1 up_from 673 up_thru 673 down_at 670 last_clean_interval [637,669)

Thanks.
Re: [ceph-users] rbd cache did not help improve performance
Thanks; with your help I set the read-ahead parameter. What are the cache parameters for the kernel module rbd? For example:
1) what is the cache size?
2) does it support write-back?
3) will read-ahead be disabled after max bytes have been read into the cache (similar to the concept of "rbd_readahead_disable_after_bytes")?
Thanks again.

2016-03-01 21:31 GMT+08:00 Adrien Gillard <gillard.adr...@gmail.com>:

> As Tom stated, RBD cache only works if your client is using librbd (KVM
> clients for instance).
> Using the kernel RBD client, one of the parameters you can tune to optimize
> sequential read is increasing /sys/class/block/rbd4/queue/read_ahead_kb
>
> Adrien
>
> On Tue, Mar 1, 2016 at 12:48 PM, min fang <louisfang2...@gmail.com> wrote:
>
>> I can use the following command to change a parameter, for example as
>> follows, but I am not sure whether it will work.
>>
>> ceph --admin-daemon /var/run/ceph/ceph-mon.openpower-0.asok config set
>> rbd_readahead_disable_after_bytes 0
>>
>> 2016-03-01 15:07 GMT+08:00 Tom Christensen <pav...@gmail.com>:
>>
>>> If you are mapping the RBD with the kernel driver then you're not using
>>> librbd, so these settings will have no effect, I believe. The kernel
>>> driver does its own caching, but I don't believe there are any settings
>>> to change its default behavior.
>>>
>>> On Mon, Feb 29, 2016 at 9:36 PM, Shinobu Kinjo <ski...@redhat.com> wrote:
>>>
>>>> You may want to set "ioengine=rbd", I guess.
>>>>
>>>> Cheers,
>>>>
>>>> - Original Message -
>>>> From: "min fang" <louisfang2...@gmail.com>
>>>> To: "ceph-users" <ceph-users@lists.ceph.com>
>>>> Sent: Tuesday, March 1, 2016 1:28:54 PM
>>>> Subject: [ceph-users] rbd cache did not help improve performance
>>>>
>>>> Hi, I set the following parameters in ceph.conf
>>>>
>>>> [client]
>>>> rbd cache=true
>>>> rbd cache size= 25769803776
>>>> rbd readahead disable after byte=0
>>>>
>>>> I map an rbd image to an rbd device, then run fio testing on 4k reads
>>>> with the command:
>>>> ./fio -filename=/dev/rbd4 -direct=1 -iodepth 64 -thread -rw=read
>>>> -ioengine=aio -bs=4K -size=500G -numjobs=32 -runtime=300 -group_reporting
>>>> -name=mytest2
>>>>
>>>> Compared with the result with rbd cache=false, with the cache enabled
>>>> I did not see performance improved by the librbd cache.
>>>>
>>>> Is my setting not right, or is it true that the ceph librbd cache will
>>>> not benefit 4k sequential reads?
>>>>
>>>> thanks.
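The read_ahead_kb suggestion in the thread above can be sketched as shell commands (untested sketch; the rbd4 device name comes from Adrien's example, and the 4096 KB value is an arbitrary illustration — adjust both to your setup):

```
# check the current kernel read-ahead for the mapped device (in KB)
cat /sys/class/block/rbd4/queue/read_ahead_kb

# raise it, e.g. to 4 MB, to help large sequential reads through krbd
echo 4096 > /sys/class/block/rbd4/queue/read_ahead_kb
```

Note this is a kernel block-layer knob, not a librbd option, so it applies only to the krbd path and is reset when the device is remapped.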
Re: [ceph-users] rbd cache did not help improve performance
I can use the following command to change a parameter, for example as follows, but I am not sure whether it will work.

ceph --admin-daemon /var/run/ceph/ceph-mon.openpower-0.asok config set rbd_readahead_disable_after_bytes 0

2016-03-01 15:07 GMT+08:00 Tom Christensen <pav...@gmail.com>:

> If you are mapping the RBD with the kernel driver then you're not using
> librbd, so these settings will have no effect, I believe. The kernel
> driver does its own caching, but I don't believe there are any settings
> to change its default behavior.
>
> On Mon, Feb 29, 2016 at 9:36 PM, Shinobu Kinjo <ski...@redhat.com> wrote:
>
>> You may want to set "ioengine=rbd", I guess.
>>
>> Cheers,
>>
>> - Original Message -
>> From: "min fang" <louisfang2...@gmail.com>
>> To: "ceph-users" <ceph-users@lists.ceph.com>
>> Sent: Tuesday, March 1, 2016 1:28:54 PM
>> Subject: [ceph-users] rbd cache did not help improve performance
>>
>> Hi, I set the following parameters in ceph.conf
>>
>> [client]
>> rbd cache=true
>> rbd cache size= 25769803776
>> rbd readahead disable after byte=0
>>
>> I map an rbd image to an rbd device, then run fio testing on 4k reads
>> with the command:
>> ./fio -filename=/dev/rbd4 -direct=1 -iodepth 64 -thread -rw=read
>> -ioengine=aio -bs=4K -size=500G -numjobs=32 -runtime=300 -group_reporting
>> -name=mytest2
>>
>> Compared with the result with rbd cache=false, with the cache enabled
>> I did not see performance improved by the librbd cache.
>>
>> Is my setting not right, or is it true that the ceph librbd cache will
>> not benefit 4k sequential reads?
>>
>> thanks.
[ceph-users] rbd cache did not help improve performance
Hi, I set the following parameters in ceph.conf:

[client]
rbd cache=true
rbd cache size= 25769803776
rbd readahead disable after byte=0

I map an rbd image to an rbd device, then run fio testing on 4k reads with the command:

./fio -filename=/dev/rbd4 -direct=1 -iodepth 64 -thread -rw=read -ioengine=aio -bs=4K -size=500G -numjobs=32 -runtime=300 -group_reporting -name=mytest2

Compared with the result with rbd cache=false, with the cache enabled I did not see performance improved by the librbd cache.

Is my setting not right, or is it true that the ceph librbd cache will not benefit 4k sequential reads?

thanks.
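As suggested in the replies above (ioengine=rbd), the librbd cache can only matter if fio drives librbd directly instead of a kernel-mapped device. A hedged sketch of such a job file (the client, pool, and image names are placeholders, not from this thread):

```
; fio job file exercising librbd directly, so [client] rbd cache settings apply
[global]
ioengine=rbd
clientname=admin        ; placeholder cephx user
pool=rbd                ; placeholder pool name
rbdname=test-image      ; placeholder image name
bs=4k
iodepth=64
runtime=300
time_based

[seqread]
rw=read
```

Run with `fio rbd-test.fio` (fio must be built with rbd engine support); with this path there is no /dev/rbdX device involved, so kernel read-ahead no longer masks or replaces librbd behaviour.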
[ceph-users] ceph random read performance is better than sequential read?
Hi, I did fio testing on my ceph cluster and found that random read performance is better than sequential read. Does that match your experience? Thanks.
[ceph-users] can rbd block_name_prefix be changed?
Hi, can rbd block_name_prefix be changed? Is it constant for an rbd image? Thanks.
Re: [ceph-users] Read IO to object while new data still in journal
Thanks. So can ceph guarantee that after the write-commit callback, a read IO will get the newly written data?

2015-12-31 10:55 GMT+08:00 Zhi Zhang <zhang.david2...@gmail.com>:

> If the data has not been written to the filestore, as you mentioned (it is
> still in the journal), your following read op will be blocked until the
> data is written to the filestore.
>
> This is because when writing this data, the related object context
> holds ondisk_write_lock. This lock is released in a callback
> after the data is in the filestore. While ondisk_write_lock is held,
> read ops to this data are blocked.
>
> Regards,
> Zhi Zhang (David)
> Contact: zhang.david2...@gmail.com
>          zhangz.da...@outlook.com
>
> On Thu, Dec 31, 2015 at 10:33 AM, min fang <louisfang2...@gmail.com> wrote:
> > yes, the question here is: librbd uses the committed callback, and as my
> > understanding goes, when this callback returns, the librbd write is
> > considered complete. So I can issue a read IO even if the data is not yet
> > readable. In this case, I would like to know what data will be returned
> > for the read IO?
> >
> > 2015-12-31 10:29 GMT+08:00 Dong Wu <archer.wud...@gmail.com>:
> >>
> >> there are two callbacks: committed and applied. Committed means written
> >> to all replicas' journals; applied means written to all replicas' file
> >> systems. So when the applied callback returns to the client, it means
> >> the data can be read.
> >>
> >> 2015-12-31 10:15 GMT+08:00 min fang <louisfang2...@gmail.com>:
> >> > Hi, as my understanding, a write IO commits data to the journal
> >> > first, then gives a safe callback to the ceph client. So it is
> >> > possible that data is still in the journal when I send a read IO to
> >> > the same area. What data will be returned if the new data is still in
> >> > the journal?
> >> >
> >> > Thanks.
Re: [ceph-users] Read IO to object while new data still in journal
Yes, the question here is: librbd uses the committed callback, and as my understanding goes, when this callback returns, the librbd write is considered complete. So I can issue a read IO even if the data is not yet readable. In this case, I would like to know what data will be returned for the read IO?

2015-12-31 10:29 GMT+08:00 Dong Wu <archer.wud...@gmail.com>:

> there are two callbacks: committed and applied. Committed means written
> to all replicas' journals; applied means written to all replicas' file
> systems. So when the applied callback returns to the client, it means
> the data can be read.
>
> 2015-12-31 10:15 GMT+08:00 min fang <louisfang2...@gmail.com>:
> > Hi, as my understanding, a write IO commits data to the journal first,
> > then gives a safe callback to the ceph client. So it is possible that
> > data is still in the journal when I send a read IO to the same area.
> > What data will be returned if the new data is still in the journal?
> >
> > Thanks.
[ceph-users] Read IO to object while new data still in journal
Hi, as my understanding goes, a write IO commits data to the journal first, then gives a safe callback to the ceph client. So it is possible that the data is still in the journal when I send a read IO to the same area. What data will be returned if the new data is still in the journal?

Thanks.
[ceph-users] ubuntu 14.04 or centos 7
Hi, I am looking for an OS for my ceph cluster. According to http://docs.ceph.com/docs/master/start/os-recommendations/#infernalis-9-1-0, two OSes have been fully tested: CentOS 7 and Ubuntu 14.04. Which one is better? Thanks.
[ceph-users] Configure Ceph client network
Hi, I have a 2-port 10Gb NIC installed in the ceph client, but I just want to use one NIC port for ceph IO. The other port on the NIC will be reserved for other purposes. Does ceph currently support choosing which NIC port to use for IO? Thanks.
[ceph-users] How to configure ceph client network
Hi, I have a 2-port 10Gb NIC installed in the ceph client, but I just want to use one NIC port for ceph IO. How can I achieve this? Thanks.
[ceph-users] rados_aio_cancel
Is this function used to detach the rx buffer and complete the IO back to the caller? From the code, I think this function does not interact with the OSD or MON side, which means we just cancel the IO from the client side. Am I right? Thanks.
[ceph-users] Ceph object mining
Hi, I set up a Ceph cluster for storing pictures. I want to run a data-mining program on the Ceph OSD nodes to dig out objects with particular properties. I hope some kind of map-reduce framework can use the Ceph object interface directly, rather than going through a POSIX file system interface. Can somebody give me some suggestions? Thanks.
Re: [ceph-users] Ceph object mining
Thanks, Gregory. I found that Intel introduced a technology for running Hadoop on RGW directly (http://www.slideshare.net/zhouyuan/hadoop-over-rgw), but I think it is too complex for my usage. There is another technology, storlets (http://ibmresearchnews.blogspot.co.uk/2014/05/storlets-turning-object-storage-into.html), developed by IBM Research, which looks like what I expect, but it does not appear to be open source. So maybe I need to develop something similar to storlets on top of the RGW interface.

Thanks

2015-11-14 1:53 GMT+08:00 Gregory Farnum <gfar...@redhat.com>:
> I think I saw somebody working on a RADOS interface to Apache Hadoop once,
> maybe search for that?
> Your other option is to try and make use of object classes directly, but
> that's a bit primitive to build full map-reduce on top of without a lot of
> effort.
> -Greg
>
> On Friday, November 13, 2015, min fang <louisfang2...@gmail.com> wrote:
>
>> Hi, I set up a Ceph cluster for storing pictures. I want to run a
>> data-mining program on the Ceph OSD nodes to dig out objects with
>> particular properties.
>>
>> I hope some kind of map-reduce framework can use the Ceph object
>> interface directly, rather than going through a POSIX file system
>> interface.
>>
>> Can somebody give me some suggestions?
>>
>> Thanks.
[ceph-users] can not create rbd image
Hi cephers, I tried to use the following command to create an image, but unfortunately the command hung for a long time until I interrupted it with ctrl-z:

rbd -p hello create img-003 --size 512

So I checked the cluster status, which showed:

    cluster 0379cebd-b546-4954-b5d6-e13d08b7d2f1
     health HEALTH_WARN
            2 near full osd(s)
            too many PGs per OSD (320 > max 300)
     monmap e2: 1 mons at {vl=192.168.90.253:6789/0}
            election epoch 1, quorum 0 vl
     osdmap e37: 2 osds: 2 up, 2 in
      pgmap v19544: 320 pgs, 3 pools, 12054 MB data, 3588 objects
            657 GB used, 21867 MB / 714 GB avail
                 320 active+clean

I did not see an error message here that could cause rbd create to hang, so I enabled the client log and saw:

2015-11-12 22:52:44.687491 7f89eced9780 20 librbd: create 0x7fff8f7b7800 name = img-003 size = 536870912 old_format = 1 features = 0 order = 22 stripe_unit = 0 stripe_count = 0
2015-11-12 22:52:44.687653 7f89eced9780 1 -- 192.168.90.253:0/1006121 --> 192.168.90.253:6800/5472 -- osd_op(client.34321.0:1 img-003.rbd [stat] 2.8a047315 ack+read+known_if_redirected e37) v5 -- ?+0 0x28513d0 con 0x285
2015-11-12 22:52:44.688928 7f89e066b700 1 -- 192.168.90.253:0/1006121 <== osd.1 192.168.90.253:6800/5472 1 osd_op_reply(1 img-003.rbd [stat] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 178+0+0 (3550830125 0 0) 0x7f89cae0 con 0x285
2015-11-12 22:52:44.689090 7f89eced9780 1 -- 192.168.90.253:0/1006121 --> 192.168.90.253:6801/5344 -- osd_op(client.34321.0:2 rbd_id.img-003 [stat] 2.638c75a8 ack+read+known_if_redirected e37) v5 -- ?+0 0x2858330 con 0x2856f50
2015-11-12 22:52:44.690425 7f89e0469700 1 -- 192.168.90.253:0/1006121 <== osd.0 192.168.90.253:6801/5344 1 osd_op_reply(2 rbd_id.img-003 [stat] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 181+0+0 (1202435393 0 0) 0x7f89b8000ae0 con 0x2856f50
2015-11-12 22:52:44.690494 7f89eced9780 2 librbd: adding rbd image to directory...
2015-11-12 22:52:44.690544 7f89eced9780 1 -- 192.168.90.253:0/1006121 --> 192.168.90.253:6801/5344 -- osd_op(client.34321.0:3 rbd_directory [tmapup 0~0] 2.30a98c1c ondisk+write+known_if_redirected e37) v5 -- ?+0 0x2858920 con 0x2856f50
2015-11-12 22:52:59.687447 7f89e4074700 1 -- 192.168.90.253:0/1006121 --> 192.168.90.253:6789/0 -- mon_subscribe({monmap=3+,osdmap=38}) v2 -- ?+0 0x7f89bab0 con 0x2843b90
2015-11-12 22:52:59.687472 7f89e4074700 1 -- 192.168.90.253:0/1006121 --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bf40 con 0x2856f50
2015-11-12 22:52:59.687887 7f89e3873700 1 -- 192.168.90.253:0/1006121 <== mon.0 192.168.90.253:6789/0 11 mon_subscribe_ack(300s) v1 20+0+0 (2867606018 0 0) 0x7f89d8001160 con 0x2843b90
2015-11-12 22:53:04.687593 7f89e4074700 1 -- 192.168.90.253:0/1006121 --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0 con 0x2856f50
2015-11-12 22:53:09.687731 7f89e4074700 1 -- 192.168.90.253:0/1006121 --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0 con 0x2856f50
2015-11-12 22:53:14.687844 7f89e4074700 1 -- 192.168.90.253:0/1006121 --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0 con 0x2856f50
2015-11-12 22:53:19.687978 7f89e4074700 1 -- 192.168.90.253:0/1006121 --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0 con 0x2856f50
2015-11-12 22:53:24.688116 7f89e4074700 1 -- 192.168.90.253:0/1006121 --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0 con 0x2856f50
2015-11-12 22:53:29.688253 7f89e4074700 1 -- 192.168.90.253:0/1006121 --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0 con 0x2856f50
2015-11-12 22:53:34.688389 7f89e4074700 1 -- 192.168.90.253:0/1006121 --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0 con 0x2856f50
2015-11-12 22:53:39.688512 7f89e4074700 1 -- 192.168.90.253:0/1006121 --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0 con 0x2856f50
2015-11-12 22:53:44.688636 7f89e4074700 1 --
192.168.90.253:0/1006121 --> 192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0 con 0x2856f50

It looks to me as if the client just keeps this "ping magic" exchange going and the create never completes. My ceph version is "ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)". Can somebody help me with this, or tell me what further debug information to collect for analysis? Thanks.
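One thing worth ruling out before digging deeper into the messenger log: the cluster status above reports "2 near full osd(s)" at roughly 92% raw usage, and once an OSD crosses the full ratio (default 0.95; the near-full threshold of 0.85 only warns) the cluster blocks writes, which would leave exactly this kind of write to rbd_directory hanging. Some untested diagnostics to run against the cluster in question (osd.0 is just an example name):

```sh
ceph health detail                     # which OSDs are near/over the ratio, and how full
ceph df                                # raw and per-pool usage
ceph pg dump | head                    # the header shows full_ratio / nearfull_ratio
ceph daemon osd.0 dump_ops_in_flight   # on the OSD host: is the tmapup op stuck here?
```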
[ceph-users] Fwd: segmentation fault when using librbd interface
Hi, my code gets a segmentation fault when using librbd to do synchronous read I/O. From the trace I can see that several read I/Os completed successfully, but the last read I/O (2015-10-31 08:56:34.804383) never returned and my code hit the segmentation fault. I used the rbd_read interface and malloc'd a buffer for the read data. Can anybody help with this? Thanks.

2015-10-31 08:56:34.750411 7f04bcbdc7c0 20 librbd: read 0x17896d0 off = 0 len = 4096
2015-10-31 08:56:34.750436 7f04bcbdc7c0 20 librbd: aio_read 0x17896d0 completion 0x1799440 [0,4096]
2015-10-31 08:56:34.750442 7f04bcbdc7c0 20 librbd: ictx_check 0x17896d0
2015-10-31 08:56:34.750451 7f04bcbdc7c0 20 librbd::AsyncOperation: 0x1799570 start_op
2015-10-31 08:56:34.750453 7f04bcbdc7c0 20 librbd: oid rb.0.8597.2ae8944a. 0~4096 from [0,4096]
2015-10-31 08:56:34.750457 7f04bcbdc7c0 10 librbd::ImageCtx: prune_parent_extents image overlap 0, object overlap 0 from image extents []
2015-10-31 08:56:34.750462 7f04bcbdc7c0 20 librbd::AioRequest: send 0x1799c60 rb.0.8597.2ae8944a. 0~4096
2015-10-31 08:56:34.750498 7f04bcbdc7c0 1 -- 192.168.90.240:0/1006544 --> 192.168.90.253:6801/2041 -- osd_op(client.34253.0:92 rb.0.8597.2ae8944a. [sparse-read 0~4096] 2.7cf90552 ack+read+known_if_redirected e30) v5 -- ?+0 0x179b890 con 0x17877b0
2015-10-31 08:56:34.750526 7f04bcbdc7c0 20 librbd::AioCompletion: AioCompletion::finish_adding_requests 0x1799440 pending 1
2015-10-31 08:56:34.780308 7f04b0bb5700 1 -- 192.168.90.240:0/1006544 <== osd.0 192.168.90.253:6801/2041 5 osd_op_reply(92 rb.0.8597.2ae8944a. [sparse-read 0~4096] v0'0 uv8 ondisk = 0) v6 198+0+4120 (3153096351 0 1287205638) 0x7f0494001ce0 con 0x17877b0
2015-10-31 08:56:34.780408 7f04b14b7700 20 librbd::AioRequest: should_complete 0x1799c60 rb.0.8597.2ae8944a.
0~4096 r = 0
2015-10-31 08:56:34.780418 7f04b14b7700 20 librbd::AioRequest: should_complete 0x1799c60 READ_FLAT
2015-10-31 08:56:34.780420 7f04b14b7700 20 librbd::AioRequest: complete 0x1799c60
2015-10-31 08:56:34.780421 7f04b14b7700 10 librbd::AioCompletion: C_AioRead::finish() 0x1793710 r = 0
2015-10-31 08:56:34.780422 7f04b14b7700 10 librbd::AioCompletion: got {0=4096} for [0,4096] bl 4096
2015-10-31 08:56:34.780432 7f04b14b7700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x1799440 complete_cb=0x7f04ba2b1240 pending 1
2015-10-31 08:56:34.780434 7f04b14b7700 20 librbd::AioCompletion: AioCompletion::finalize() 0x1799440 rval 4096 read_buf 0x179a5e0 read_bl 0
2015-10-31 08:56:34.780440 7f04b14b7700 20 librbd::AioCompletion: AioCompletion::finalize() copied resulting 4096 bytes to 0x179a5e0
2015-10-31 08:56:34.780442 7f04b14b7700 20 librbd::AsyncOperation: 0x1799570 finish_op
2015-10-31 08:56:34.780766 7f04bcbdc7c0 20 librbd: read 0x17896d0 off = 4096 len = 4096
2015-10-31 08:56:34.780778 7f04bcbdc7c0 20 librbd: aio_read 0x17896d0 completion 0x1799440 [4096,4096]
2015-10-31 08:56:34.780781 7f04bcbdc7c0 20 librbd: ictx_check 0x17896d0
2015-10-31 08:56:34.780786 7f04bcbdc7c0 20 librbd::AsyncOperation: 0x1799570 start_op
2015-10-31 08:56:34.780788 7f04bcbdc7c0 20 librbd: oid rb.0.8597.2ae8944a. 4096~4096 from [0,4096]
2015-10-31 08:56:34.780790 7f04bcbdc7c0 10 librbd::ImageCtx: prune_parent_extents image overlap 0, object overlap 0 from image extents []
2015-10-31 08:56:34.780793 7f04bcbdc7c0 20 librbd::AioRequest: send 0x179bcc0 rb.0.8597.2ae8944a. 4096~4096
2015-10-31 08:56:34.780813 7f04bcbdc7c0 1 -- 192.168.90.240:0/1006544 --> 192.168.90.253:6801/2041 -- osd_op(client.34253.0:93 rb.0.8597.2ae8944a.
[sparse-read 4096~4096] 2.7cf90552 ack+read+known_if_redirected e30) v5 -- ?+0 0x179b5f0 con 0x17877b0
2015-10-31 08:56:34.780833 7f04bcbdc7c0 20 librbd::AioCompletion: AioCompletion::finish_adding_requests 0x1799440 pending 1
2015-10-31 08:56:34.800847 7f04b0bb5700 1 -- 192.168.90.240:0/1006544 <== osd.0 192.168.90.253:6801/2041 6 osd_op_reply(93 rb.0.8597.2ae8944a. [sparse-read 4096~4096] v0'0 uv8 ondisk = 0) v6 198+0+4120 (2253638743 0 3057087703) 0x7f0494001ce0 con 0x17877b0
2015-10-31 08:56:34.800947 7f04b14b7700 20 librbd::AioRequest: should_complete 0x179bcc0 rb.0.8597.2ae8944a. 4096~4096 r = 0
2015-10-31 08:56:34.800956 7f04b14b7700 20 librbd::AioRequest: should_complete 0x179bcc0 READ_FLAT
2015-10-31 08:56:34.800957 7f04b14b7700 20 librbd::AioRequest: complete 0x179bcc0
2015-10-31 08:56:34.800958 7f04b14b7700 10 librbd::AioCompletion: C_AioRead::finish() 0x1796c90 r = 0
2015-10-31 08:56:34.800959 7f04b14b7700 10 librbd::AioCompletion: got {4096=4096} for [0,4096] bl 4096
2015-10-31 08:56:34.800963 7f04b14b7700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x1799440 complete_cb=0x7f04ba2b1240 pending 1
2015-10-31 08:56:34.800965 7f04b14b7700 20 librbd::AioCompletion: AioCompletion::finalize() 0x1799440 rval 4096 read_buf 0x179a5e0 read_bl 0
2015-10-31 08:56:34.800969 7f04b14b7700 20
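The trace shows librbd copying the full result into the caller's buffer ("copied resulting 4096 bytes to 0x179a5e0"), so a common cause of a crash like this is a destination buffer smaller than the requested length, or a buffer freed or reused while a read is still in flight. An untested sketch of the read path against the librbd C API, with the buffer handling made explicit:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <rbd/librbd.h>

int read_range(rbd_image_t image, uint64_t off, size_t len)
{
    char *buf = malloc(len);   /* must be at least `len` bytes */
    if (!buf)
        return -1;
    /* rbd_read() returns the number of bytes read, or a negative errno;
     * librbd will copy up to `len` bytes into `buf`. */
    ssize_t r = rbd_read(image, off, len, buf);
    if (r < 0)
        fprintf(stderr, "rbd_read failed: %zd\n", r);
    else
        printf("read %zd bytes at offset %llu\n", r,
               (unsigned long long)off);
    free(buf);                 /* only safe after rbd_read has returned */
    return r < 0 ? -1 : 0;
}
```

If the buffer really is `len` bytes, the next suspects are the lifetime of the `rbd_image_t` handle and concurrent use of the same handle from multiple threads.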
Re: [ceph-users] segmentation fault when using librbd interface
The segmentation fault happens in the rbd_read function itself: I can see my code call this function and then get the segmentation fault, which means rbd_read had not completed successfully when the fault occurred.

2015-11-01 10:34 GMT+08:00 min fang <louisfang2...@gmail.com>:
>
> Hi, my code gets a segmentation fault when using librbd to do synchronous
> read I/O. From the trace I can see that several read I/Os completed
> successfully, but the last read I/O (2015-10-31 08:56:34.804383) never
> returned and my code hit the segmentation fault. I used the rbd_read
> interface and malloc'd a buffer for the read data.
>
> Can anybody help with this? Thanks.
>
> 2015-10-31 08:56:34.750411 7f04bcbdc7c0 20 librbd: read 0x17896d0 off = 0 len = 4096
> 2015-10-31 08:56:34.750436 7f04bcbdc7c0 20 librbd: aio_read 0x17896d0 completion 0x1799440 [0,4096]
> 2015-10-31 08:56:34.750442 7f04bcbdc7c0 20 librbd: ictx_check 0x17896d0
> 2015-10-31 08:56:34.750451 7f04bcbdc7c0 20 librbd::AsyncOperation: 0x1799570 start_op
> 2015-10-31 08:56:34.750453 7f04bcbdc7c0 20 librbd: oid rb.0.8597.2ae8944a. 0~4096 from [0,4096]
> 2015-10-31 08:56:34.750457 7f04bcbdc7c0 10 librbd::ImageCtx: prune_parent_extents image overlap 0, object overlap 0 from image extents []
> 2015-10-31 08:56:34.750462 7f04bcbdc7c0 20 librbd::AioRequest: send 0x1799c60 rb.0.8597.2ae8944a. 0~4096
> 2015-10-31 08:56:34.750498 7f04bcbdc7c0 1 -- 192.168.90.240:0/1006544 --> 192.168.90.253:6801/2041 -- osd_op(client.34253.0:92 rb.0.8597.2ae8944a. [sparse-read 0~4096] 2.7cf90552 ack+read+known_if_redirected e30) v5 -- ?+0 0x179b890 con 0x17877b0
> 2015-10-31 08:56:34.750526 7f04bcbdc7c0 20 librbd::AioCompletion: AioCompletion::finish_adding_requests 0x1799440 pending 1
> 2015-10-31 08:56:34.780308 7f04b0bb5700 1 -- 192.168.90.240:0/1006544 <== osd.0 192.168.90.253:6801/2041 5 osd_op_reply(92 rb.0.8597.2ae8944a.
[sparse-read 0~4096] v0'0 uv8 ondisk = 0) v6 198+0+4120 (3153096351 0 1287205638) 0x7f0494001ce0 con 0x17877b0
> 2015-10-31 08:56:34.780408 7f04b14b7700 20 librbd::AioRequest: should_complete 0x1799c60 rb.0.8597.2ae8944a. 0~4096 r = 0
> 2015-10-31 08:56:34.780418 7f04b14b7700 20 librbd::AioRequest: should_complete 0x1799c60 READ_FLAT
> 2015-10-31 08:56:34.780420 7f04b14b7700 20 librbd::AioRequest: complete 0x1799c60
> 2015-10-31 08:56:34.780421 7f04b14b7700 10 librbd::AioCompletion: C_AioRead::finish() 0x1793710 r = 0
> 2015-10-31 08:56:34.780422 7f04b14b7700 10 librbd::AioCompletion: got {0=4096} for [0,4096] bl 4096
> 2015-10-31 08:56:34.780432 7f04b14b7700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x1799440 complete_cb=0x7f04ba2b1240 pending 1
> 2015-10-31 08:56:34.780434 7f04b14b7700 20 librbd::AioCompletion: AioCompletion::finalize() 0x1799440 rval 4096 read_buf 0x179a5e0 read_bl 0
> 2015-10-31 08:56:34.780440 7f04b14b7700 20 librbd::AioCompletion: AioCompletion::finalize() copied resulting 4096 bytes to 0x179a5e0
> 2015-10-31 08:56:34.780442 7f04b14b7700 20 librbd::AsyncOperation: 0x1799570 finish_op
> 2015-10-31 08:56:34.780766 7f04bcbdc7c0 20 librbd: read 0x17896d0 off = 4096 len = 4096
> 2015-10-31 08:56:34.780778 7f04bcbdc7c0 20 librbd: aio_read 0x17896d0 completion 0x1799440 [4096,4096]
> 2015-10-31 08:56:34.780781 7f04bcbdc7c0 20 librbd: ictx_check 0x17896d0
> 2015-10-31 08:56:34.780786 7f04bcbdc7c0 20 librbd::AsyncOperation: 0x1799570 start_op
> 2015-10-31 08:56:34.780788 7f04bcbdc7c0 20 librbd: oid rb.0.8597.2ae8944a. 4096~4096 from [0,4096]
> 2015-10-31 08:56:34.780790 7f04bcbdc7c0 10 librbd::ImageCtx: prune_parent_extents image overlap 0, object overlap 0 from image extents []
> 2015-10-31 08:56:34.780793 7f04bcbdc7c0 20 librbd::AioRequest: send 0x179bcc0 rb.0.8597.2ae8944a.
4096~4096
> 2015-10-31 08:56:34.780813 7f04bcbdc7c0 1 -- 192.168.90.240:0/1006544 --> 192.168.90.253:6801/2041 -- osd_op(client.34253.0:93 rb.0.8597.2ae8944a. [sparse-read 4096~4096] 2.7cf90552 ack+read+known_if_redirected e30) v5 -- ?+0 0x179b5f0 con 0x17877b0
> 2015-10-31 08:56:34.780833 7f04bcbdc7c0 20 librbd::AioCompletion: AioCompletion::finish_adding_requests 0x1799440 pending 1
> 2015-10-31 08:56:34.800847 7f04b0bb5700 1 -- 192.168.90.240:0/1006544 <== osd.0 192.168.90.253:6801/2041 6 osd_op_reply(93 rb.0.8597.2ae8944a. [sparse-read 4096~4096] v0'0 uv8 ondisk = 0) v6 198+0+4120 (2253638743 0 3057087703) 0x7f0494001ce0 con 0x17877b0
> 2015-10-31 08:56:34.800947 7f04b14b7700 20 librbd::AioRequest: should_complete 0x179bcc0 rb.0.8597.2ae8944a. 4096~4096 r = 0
> 2015-10-31 08:56:34.800956 7f04
Re: [ceph-users] How ceph client abort IO
I want to abort and retry an I/O if it takes too long to complete. Does this make sense in Ceph? How does a Ceph client handle I/Os with a long timeout? Just wait until they return, or is there some error-recovery method for I/Os that cannot be answered in time? Thanks.

2015-10-20 21:00 GMT+08:00 Jason Dillaman <dilla...@redhat.com>:
> There is no such interface currently on the librados / OSD side to abort
> IO operations. Can you provide some background on your use-case for
> aborting in-flight IOs?
>
> --
>
> Jason Dillaman
>
> - Original Message -
>
> > From: "min fang" <louisfang2...@gmail.com>
> > To: ceph-users@lists.ceph.com
> > Sent: Monday, October 19, 2015 6:41:40 PM
> > Subject: [ceph-users] How ceph client abort IO
>
> > Can the librbd interface provide an abort API for aborting I/O? If yes,
> > can the abort interface detach the write buffer immediately? I hope to
> > reuse the write buffer soon after issuing the abort request, rather
> > than waiting for the I/O to be aborted on the OSD side.
>
> > Thanks.
[ceph-users] How ceph client abort IO
Can the librbd interface provide an abort API for aborting I/O? If yes, can the abort interface detach the write buffer immediately? I hope to reuse the write buffer soon after issuing the abort request, rather than waiting for the I/O to be aborted on the OSD side. Thanks.