Re: [ceph-users] ceph write performance issue

2016-09-29 Thread min fang
I used 2 copies, not 3, so it should be about 1000MB/s in theory. Thanks.
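
For reference, with 2 copies the same estimate below becomes (40 x 50MB/s) / 2 = 1000MB/s. The pool's actual replica count can be confirmed with something like the following (assuming the libvirt-pool used in the benchmark):

ceph osd pool get libvirt-pool size
ceph osd pool get libvirt-pool min_size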

2016-09-29 17:54 GMT+08:00 Nick Fisk <n...@fisk.me.uk>:

> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of min fang
> Sent: 29 September 2016 10:34
> To: ceph-users <ceph-users@lists.ceph.com>
> Subject: [ceph-users] ceph write performance issue
>
>
>
> Hi, I created a 40-OSD ceph cluster with 8 PM863 960G SSDs as journals. Each
> SSD is used as the journal for 5 OSD drives. The SSD 512 random write
> performance is about 450MB/s, but the whole cluster's sequential write
> throughput is only 800MB/s. Any suggestions for improving sequential write
> performance? Thanks.
>
>
>
> Take a conservative figure of 50MB/s for each disk, as writing in Ceph is
> not just straight sequential writes; there is a slight random nature to it.
>
> (40x50MB/s)/3 = 666MB/s. Seems fine to me.
>
>
>
>
> Testing result is here:
> rados bench -p libvirt-pool 10 write --no-cleanup
> Maintaining 16 concurrent writes of 4194304 bytes to objects of size
> 4194304 for up to 10 seconds or 0 objects
> Object prefix: benchmark_data_redpower-sh-04_16462
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)   avg lat(s)
>     0       0         0         0         0         0            -            0
>     1      15       189       174   695.968       696    0.0359122     0.082477
>     2      16       395       379   757.938       820    0.0634079    0.0826266
>     3      16       582       566   754.601       748    0.0401129    0.0830207
>     4      16       796       780   779.934       856    0.0374938    0.0816794
>     5      16       977       961   768.735       724    0.0489886    0.0827479
>     6      16      1172      1156   770.601       780    0.0428639    0.0812062
>     7      16      1387      1371   783.362       860    0.0461826    0.0811803
>     8      16      1545      1529   764.433       632     0.238497    0.0831018
>     9      16      1765      1749   777.265       880    0.0557358    0.0814399
>    10      16      1971      1955   781.931       824    0.0321333    0.0814144
> Total time run: 10.044813
> Total writes made:  1972
> Write size: 4194304
> Object size:4194304
> Bandwidth (MB/sec): 785.281
> Stddev Bandwidth:   80.8235
> Max bandwidth (MB/sec): 880
> Min bandwidth (MB/sec): 632
> Average IOPS:   196
> Stddev IOPS:20
> Max IOPS:   220
> Min IOPS:   158
> Average Latency(s): 0.081415
> Stddev Latency(s):  0.0554568
> Max latency(s): 0.345111
> Min latency(s): 0.0230153
>
> my ceph osd configuration:
> osd_mkfs_type = xfs
> osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k
> osd_mkfs_options_xfs = -f -i size=2048
> filestore_max_inline_xattr_size = 254
> filestore_max_inline_xattrs = 6
> osd_op_threads = 20
> filestore_queue_max_ops = 25000
> journal_max_write_entries=1
> journal_queue_max_ops=5
> objecter_inflight_ops=10240
> filestore_queue_max_bytes=1048576000
> filestore_queue_committing_max_bytes =1048576000
> journal_max_write_bytes=1073714824
> journal_queue_max_bytes=1048576
> ms_dispatch_throttle_bytes=1048576000
> objecter_inflight_op_bytes=1048576000
> filestore_max_sync_interval=20
> filestore_flusher=false
> filestore_flush_min=0
> filestore_sync_flush=true
> journal_block_align = true
> journal_dio = true
> journal_aio = true
> journal_force_aio = true
> osd_op_num_shards=8
> osd_op_num_threads_per_shard=2
> filestore_wbthrottle_enable=false
> filestore_fd_cache_size=1024
> filestore_omap_header_cache_size=1024
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph write performance issue

2016-09-29 Thread min fang
Hi, I created a 40-OSD ceph cluster with 8 PM863 960G SSDs as journals. Each
SSD is used as the journal for 5 OSD drives. The SSD 512 random write
performance is about 450MB/s, but the whole cluster's sequential write
throughput is only 800MB/s. Any suggestions for improving sequential write
performance? Thanks.

Testing result is here:
rados bench -p libvirt-pool 10 write --no-cleanup
Maintaining 16 concurrent writes of 4194304 bytes to objects of size
4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_redpower-sh-04_16462
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)   avg lat(s)
    0       0         0         0         0         0            -            0
    1      15       189       174   695.968       696    0.0359122     0.082477
    2      16       395       379   757.938       820    0.0634079    0.0826266
    3      16       582       566   754.601       748    0.0401129    0.0830207
    4      16       796       780   779.934       856    0.0374938    0.0816794
    5      16       977       961   768.735       724    0.0489886    0.0827479
    6      16      1172      1156   770.601       780    0.0428639    0.0812062
    7      16      1387      1371   783.362       860    0.0461826    0.0811803
    8      16      1545      1529   764.433       632     0.238497    0.0831018
    9      16      1765      1749   777.265       880    0.0557358    0.0814399
   10      16      1971      1955   781.931       824    0.0321333    0.0814144
Total time run: 10.044813
Total writes made:  1972
Write size: 4194304
Object size:4194304
Bandwidth (MB/sec): 785.281
Stddev Bandwidth:   80.8235
Max bandwidth (MB/sec): 880
Min bandwidth (MB/sec): 632
Average IOPS:   196
Stddev IOPS:20
Max IOPS:   220
Min IOPS:   158
Average Latency(s): 0.081415
Stddev Latency(s):  0.0554568
Max latency(s): 0.345111
Min latency(s): 0.0230153
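
To rule out the client side as the limit, the same benchmark can be pushed a bit harder; a sketch (the default is 16 concurrent ops, -t raises it, and the run can optionally be repeated from a second client host in parallel, --run-name keeping their objects apart if that option is available):

rados bench -p libvirt-pool 10 write -t 32 --no-cleanup
rados bench -p libvirt-pool 10 write -t 32 --run-name client2 --no-cleanup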

my ceph osd configuration:
osd_mkfs_type = xfs
osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k
osd_mkfs_options_xfs = -f -i size=2048
filestore_max_inline_xattr_size = 254
filestore_max_inline_xattrs = 6
osd_op_threads = 20
filestore_queue_max_ops = 25000
journal_max_write_entries=1
journal_queue_max_ops=5
objecter_inflight_ops=10240
filestore_queue_max_bytes=1048576000
filestore_queue_committing_max_bytes =1048576000
journal_max_write_bytes=1073714824
journal_queue_max_bytes=1048576
ms_dispatch_throttle_bytes=1048576000
objecter_inflight_op_bytes=1048576000
filestore_max_sync_interval=20
filestore_flusher=false
filestore_flush_min=0
filestore_sync_flush=true
journal_block_align = true
journal_dio = true
journal_aio = true
journal_force_aio = true
osd_op_num_shards=8
osd_op_num_threads_per_shard=2
filestore_wbthrottle_enable=false
filestore_fd_cache_size=1024
filestore_omap_header_cache_size=1024
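
As a side note, whether these values actually took effect on a running OSD can be checked through the admin socket; a sketch, assuming osd.0 and the default socket path:

ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep filestore_queue
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config get osd_op_num_shards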
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph pg level IO sequence

2016-06-23 Thread min fang
Hi, as I understand it, IOs within a PG are executed sequentially, as in the
following cases:

Case 1:
Write A, Write B, Write C to the same data area in a PG --> A is committed,
then B, then C. The final data will be from write C; it is impossible for
mixed (A, B, C) data to end up in the data area.

Case 2:
Write A, Write B, Read C to the same data area in a PG --> Read C will return
the data from Write B, not Write A.

Are the above cases true?

thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] stuck unclean since forever

2016-06-22 Thread min fang
Thanks. Actually, creating a pool with more PGs hits the same problem.
Following is my crush map; please help point out how to change the crush
ruleset (a possible edit is sketched after the map below). Thanks.

#begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable straw_calc_version 1

# devices
device 0 device0
device 1 device1
device 2 osd.2
device 3 osd.3
device 4 osd.4

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host redpower-ceph-01 {
id -2   # do not change unnecessarily
# weight 3.000
alg straw
hash 0  # rjenkins1
item osd.2 weight 1.000
item osd.3 weight 1.000
item osd.4 weight 1.000
}
root default {
id -1   # do not change unnecessarily
# weight 3.000
alg straw
hash 0  # rjenkins1
item redpower-ceph-01 weight 3.000
}

# rules
rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}

# end crush map
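
A minimal sketch of that change for a single-host cluster, using the standard decompile/edit/recompile cycle (the rule edit below is the one Burkhard's reply points at):

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# in crush.txt, inside "rule replicated_ruleset", change
#   step chooseleaf firstn 0 type host
# to
#   step chooseleaf firstn 0 type osd
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new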


2016-06-22 18:27 GMT+08:00 Burkhard Linke <
burkhard.li...@computational.bio.uni-giessen.de>:

> Hi,
>
> On 06/22/2016 12:10 PM, min fang wrote:
>
> Hi, I created a new ceph cluster and created a pool, but I see "stuck
> unclean since forever" errors (as shown below). Can you help point out
> the possible reasons for this? Thanks.
>
> ceph -s
> cluster 602176c1-4937-45fc-a246-cc16f1066f65
>  health HEALTH_WARN
> 8 pgs degraded
> 8 pgs stuck unclean
> 8 pgs undersized
> too few PGs per OSD (2 < min 30)
>  monmap e1: 1 mons at {ceph-01=172.0.0.11:6789/0}
> election epoch 14, quorum 0 ceph-01
>  osdmap e89: 3 osds: 3 up, 3 in
> flags
>   pgmap v310: 8 pgs, 1 pools, 0 bytes data, 0 objects
> 60112 MB used, 5527 GB / 5586 GB avail
>8 active+undersized+degraded
>
>
> *snipsnap*
>
> With three OSDs and a single host you need to change the crush ruleset for
> the pool, since it tries to distribute the data across 3 different _hosts_
> by default.
>
> Regards,
> Burkhard
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] stuck unclean since forever

2016-06-22 Thread min fang
Hi, I created a new ceph cluster and created a pool, but I see "stuck unclean
since forever" errors (as shown below). Can you help point out the
possible reasons for this? Thanks.

ceph -s
cluster 602176c1-4937-45fc-a246-cc16f1066f65
 health HEALTH_WARN
8 pgs degraded
8 pgs stuck unclean
8 pgs undersized
too few PGs per OSD (2 < min 30)
 monmap e1: 1 mons at {ceph-01=172.0.0.11:6789/0}
election epoch 14, quorum 0 ceph-01
 osdmap e89: 3 osds: 3 up, 3 in
flags
  pgmap v310: 8 pgs, 1 pools, 0 bytes data, 0 objects
60112 MB used, 5527 GB / 5586 GB avail
   8 active+undersized+degraded

ceph health detail
HEALTH_WARN 8 pgs degraded; 8 pgs stuck unclean; 8 pgs undersized; too few
PGs per OSD (2 < min 30)
pg 5.0 is stuck unclean since forever, current state
active+undersized+degraded, last acting [3]
pg 5.1 is stuck unclean since forever, current state
active+undersized+degraded, last acting [3]
pg 5.2 is stuck unclean since forever, current state
active+undersized+degraded, last acting [3]
pg 5.3 is stuck unclean since forever, current state
active+undersized+degraded, last acting [4]
pg 5.7 is stuck unclean since forever, current state
active+undersized+degraded, last acting [3]
pg 5.6 is stuck unclean since forever, current state
active+undersized+degraded, last acting [2]
pg 5.5 is stuck unclean since forever, current state
active+undersized+degraded, last acting [4]
pg 5.4 is stuck unclean since forever, current state
active+undersized+degraded, last acting [4]
pg 5.7 is active+undersized+degraded, acting [3]
pg 5.6 is active+undersized+degraded, acting [2]
pg 5.5 is active+undersized+degraded, acting [4]
pg 5.4 is active+undersized+degraded, acting [4]
pg 5.3 is active+undersized+degraded, acting [4]
pg 5.2 is active+undersized+degraded, acting [3]
pg 5.1 is active+undersized+degraded, acting [3]
pg 5.0 is active+undersized+degraded, acting [3]
too few PGs per OSD (2 < min 30)

ceph osd tree
ID WEIGHT  TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 3.0 root default
-2 3.0 host ceph-01
 2 1.0 osd.2  up  1.0  1.0
 3 1.0 osd.3  up  1.0  1.0
 4 1.0 osd.4  up  1.0  1.0

 ceph osd crush tree
[
{
"id": -1,
"name": "default",
"type": "root",
"type_id": 10,
"items": [
{
"id": -2,
"name": "ceph-01",
"type": "host",
"type_id": 1,
"items": [
{
"id": 2,
"name": "osd.2",
"type": "osd",
"type_id": 0,
"crush_weight": 1.00,
"depth": 2
},
{
"id": 3,
"name": "osd.3",
"type": "osd",
"type_id": 0,
"crush_weight": 1.00,
"depth": 2
},
{
"id": 4,
"name": "osd.4",
"type": "osd",
"type_id": 0,
"crush_weight": 1.00,
"depth": 2
}
]
}
]
}
]
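
The health output above also warns about too few PGs per OSD; a hedged sketch of the related checks and the pg_num increase ("mypool" is a placeholder for the affected pool, the one holding the 5.x PGs):

ceph osd pool get mypool crush_ruleset   # confirm which rule the pool uses
ceph osd pool set mypool pg_num 128
ceph osd pool set mypool pgp_num 128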
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] librbd compatibility

2016-06-20 Thread min fang
Hi, is there a document describing librbd compatibility? For example,
something like this: librbd from Ceph 0.88 can also be used with
0.90, 0.91, and so on.

I hope librbd can be kept relatively stable, so I can avoid more code
iteration and testing.

Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] performance drop a lot when running fio mix read/write

2016-05-02 Thread min fang
Hi, I ran a random fio job with rwmixread=70 and found the read IOPS is 707
and the write IOPS is 303 (see the output below). These values are lower than
the standalone random write and random read results: the 4K random write IOPS
is 529 and the 4K random read IOPS is 11343. Apart from the rw type, all other
parameters are the same.

I do not understand why mixing writes and reads has such a huge impact on
performance; they are all random IOs. Thanks.


fio -filename=/dev/rbd2 -direct=1 -iodepth 64 -thread -rw=randrw
-rwmixread=70 -ioengine=libaio -bs=4k -size=100G -numjobs=1 -runtime=1000
-group_reporting -name=mytest1
mytest1: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.8
time 7423  cycles_start=1062103697308843
Starting 1 thread
Jobs: 1 (f=1): [m(1)] [100.0% done] [2144KB/760KB/0KB /s] [536/190/0 iops]
[eta 00m:00s]
mytest1: (groupid=0, jobs=1): err= 0: pid=7425: Sat Apr 30 08:55:14 2016
  read : io=2765.2MB, bw=2830.5KB/s, iops=707, runt=1000393msec
slat (usec): min=2, max=268, avg= 8.93, stdev= 4.17
clat (usec): min=203, max=1939.9K, avg=34039.43, stdev=93674.48
 lat (usec): min=207, max=1939.9K, avg=34048.93, stdev=93674.50
clat percentiles (usec):
 |  1.00th=[  516],  5.00th=[  836], 10.00th=[ 1112], 20.00th=[ 1448],
 | 30.00th=[ 1736], 40.00th=[ 6944], 50.00th=[13376], 60.00th=[17280],
 | 70.00th=[21888], 80.00th=[30848], 90.00th=[49920], 95.00th=[103936],
 | 99.00th=[552960], 99.50th=[675840], 99.90th=[880640],
99.95th=[954368],
 | 99.99th=[1105920]
bw (KB  /s): min=  350, max= 5944, per=100.00%, avg=2837.77,
stdev=1272.84
  write: io=1184.8MB, bw=1212.8KB/s, iops=303, runt=1000393msec
slat (usec): min=2, max=310, avg= 9.35, stdev= 4.50
clat (msec): min=5, max=2210, avg=131.60, stdev=226.47
 lat (msec): min=5, max=2210, avg=131.61, stdev=226.47
clat percentiles (msec):
 |  1.00th=[9],  5.00th=[   13], 10.00th=[   15], 20.00th=[   20],
 | 30.00th=[   25], 40.00th=[   34], 50.00th=[   44], 60.00th=[   61],
 | 70.00th=[   84], 80.00th=[  125], 90.00th=[  449], 95.00th=[  709],
 | 99.00th=[ 1037], 99.50th=[ 1139], 99.90th=[ 1369], 99.95th=[ 1450],
 | 99.99th=[ 1663]
bw (KB  /s): min=   40, max= 2562, per=100.00%, avg=1215.62,
stdev=564.19
lat (usec) : 250=0.01%, 500=0.60%, 750=1.94%, 1000=2.95%
lat (msec) : 2=18.69%, 4=2.46%, 10=4.21%, 20=22.05%, 50=26.40%
lat (msec) : 100=9.65%, 250=4.64%, 500=2.76%, 750=2.13%, 1000=1.11%
lat (msec) : 2000=0.39%, >=2000=0.01%
  cpu  : usr=0.83%, sys=1.47%, ctx=971080, majf=0, minf=1
  IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%,
>=64=100.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%,
>=64=0.0%
 issued: total=r=707885/w=303294/d=0, short=r=0/w=0/d=0,
drop=r=0/w=0/d=0
 latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=2765.2MB, aggrb=2830KB/s, minb=2830KB/s, maxb=2830KB/s,
mint=1000393msec, maxt=1000393msec
  WRITE: io=1184.8MB, aggrb=1212KB/s, minb=1212KB/s, maxb=1212KB/s,
mint=1000393msec, maxt=1000393msec

Disk stats (read/write):
  rbd2: ios=707885/303293, merge=0/0, ticks=24085792/39904840,
in_queue=64045864, util=100.00%
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cache tier

2016-04-21 Thread min fang
Thanks Oliver. Does the journal need to be committed twice: once for the write
IO to the cache tier, and again for the write IO destaged to the SATA backend
pool?

2016-04-21 19:38 GMT+08:00 Oliver Dzombic <i...@ip-interactive.de>:

> Hi,
>
> afaik the cache does not have anything to do with journals.
>
> So your OSDs need journals, and for performance you will use SSDs.
>
> The cache should be something faster than your OSDs. Usually SSD or NVMe.
>
> The cache is an extra space in front of your OSDs which is supposed to
> speed up things because the cache operates faster than the OSDs and the
> cache will flush its content to the slower OSDs.
>
> So, the cache is independent from the OSDs. And this way, from their
> journals.
>
> And, the cache >must< be on faster drives than the OSDs, otherwise you
> won't see any performance increase.
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:i...@ip-interactive.de
>
> Anschrift:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
>
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
>
>
> On 21.04.2016 at 13:27, min fang wrote:
> > Hi, my ceph cluster has two pools: an ssd cache tier pool and a SATA backend
> > pool. For this configuration, do I need to use an SSD as the journal device?
> > I do not know whether the cache tier takes over the journal role. Thanks
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cache tier

2016-04-21 Thread min fang
Hi, my ceph cluster has two pools: an ssd cache tier pool and a SATA backend
pool. For this configuration, do I need to use an SSD as the journal device? I
do not know whether the cache tier takes over the journal role. Thanks
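
For context, a cache tier of this shape is wired to its backend pool roughly as sketched below; the pool names ssd-cache and sata-pool are placeholders, not taken from this thread:

ceph osd tier add sata-pool ssd-cache
ceph osd tier cache-mode ssd-cache writeback
ceph osd tier set-overlay sata-pool ssd-cache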
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph rbd object write is atomic?

2016-04-06 Thread min fang
Thanks Jason. Yes, I also do not think atomicity can be guaranteed at the
extent level. But for a stripe unit within a single object, can the write be
guaranteed to be atomic? Thanks.

2016-04-06 19:53 GMT+08:00 Jason Dillaman <dilla...@redhat.com>:

> It's possible for a write to span one or more blocks -- it just depends on
> the write address/size and the RBD image layout (object size, "fancy"
> striping, etc).  Regardless, however, RBD cannot provide any ordering
> guarantees when two clients are writing to the same image at the same
> extent.  To safely use two or more clients concurrently on the same image
> you need a clustering filesystem on top of RBD (e.g. GFS2) or the
> application needs to provide its own coordination to avoid concurrent
> writes to the same image extents.
>
> --
>
> Jason Dillaman
>
>
> - Original Message -
>
> > From: "min fang" <louisfang2...@gmail.com>
> > To: "ceph-users" <ceph-users@lists.ceph.com>
> > Sent: Tuesday, April 5, 2016 10:11:10 PM
> > Subject: [ceph-users] ceph rbd object write is atomic?
>
> > Hi, as my understanding, ceph rbd image will be divided into multiple
> objects
> > based on LBA address.
>
> > My question here is:
>
> > if two clients write to the same LBA address, such as client A writing a
> > pattern of "a"s to LBA 0x123456 and client B writing a pattern of "b"s to
> > the same LBA.
>
> > The LBA address and data will only be in one object, not across two objects.
>
> > Will ceph guarantee that the object data must be all "a"s or all "b"s, and
> > that mixed data such as "aabb" or "bbaa" will not happen, even in a striped
> > data layout model?
>
> > thanks.
>
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph rbd object write is atomic?

2016-04-05 Thread min fang
Hi, as I understand it, a ceph rbd image is divided into multiple objects
based on the LBA address.

My question here is:

if two clients write to the same LBA address, such as client A writing a
pattern of "a"s to LBA 0x123456 and client B writing a pattern of "b"s to the
same LBA.

The LBA address and data will only be in one object, not across two objects.

Will ceph guarantee that the object data must be all "a"s or all "b"s, and
that mixed data such as "aabb" or "bbaa" will not happen, even in a striped
data layout model?

thanks.
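
As background for the question, an image with the default order of 22 is cut into 4194304-byte (4 MiB) objects, so the object an LBA falls into is simply the offset divided by the object size; a small worked example with assumed numbers:

object size       = 2^22 = 4194304 bytes
offset 0x123456   = 1193046 bytes -> 1193046 / 4194304 = object 0
offset 5 MiB      = 5242880 bytes -> 5242880 / 4194304 = object 1
a 4096-byte write at offset 0x123456 therefore stays entirely inside object 0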
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osd up_from, up_thru

2016-03-06 Thread min fang
Dear all, I used osd dump to extract the osd map and found up_from and up_thru
listed. What is the difference between up_from and up_thru?

osd.0 up   in  weight 1 up_from 673 up_thru 673 down_at 670
last_clean_interval [637,669)

Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd cache did not help improve performance

2016-03-01 Thread min fang
Thanks, with your help I set the read-ahead parameter. What are the cache
parameters for the kernel module rbd?
Such as:
1) What is the cache size?
2) Does it support writeback?
3) Will read-ahead be disabled after a maximum number of bytes has been read
into the cache (similar to the concept of "rbd_readahead_disable_after_bytes")?

thanks again.
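
For the krbd read-ahead mentioned in the reply below, the sysfs knob can be inspected and raised per device; the 4096 value is only an example:

cat /sys/class/block/rbd4/queue/read_ahead_kb
echo 4096 > /sys/class/block/rbd4/queue/read_ahead_kb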

2016-03-01 21:31 GMT+08:00 Adrien Gillard <gillard.adr...@gmail.com>:

> As Tom stated, RBD cache only works if your client is using librbd (KVM
> clients for instance).
> Using the kernel RBD client, one of the parameters you can tune to optimize
> sequential reads is increasing /sys/class/block/rbd4/queue/read_ahead_kb
>
> Adrien
>
>
>
> On Tue, Mar 1, 2016 at 12:48 PM, min fang <louisfang2...@gmail.com> wrote:
>
>> I can use the following command to change a parameter, for example as
>> follows, but I am not sure whether it will work.
>>
>>  ceph --admin-daemon /var/run/ceph/ceph-mon.openpower-0.asok config set
>> rbd_readahead_disable_after_bytes 0
>>
>> 2016-03-01 15:07 GMT+08:00 Tom Christensen <pav...@gmail.com>:
>>
>>> If you are mapping the RBD with the kernel driver then you're not using
>>> librbd so these settings will have no effect I believe.  The kernel driver
>>> does its own caching but I don't believe there are any settings to change
>>> its default behavior.
>>>
>>>
>>> On Mon, Feb 29, 2016 at 9:36 PM, Shinobu Kinjo <ski...@redhat.com>
>>> wrote:
>>>
>>>> You may want to set "ioengine=rbd", I guess.
>>>>
>>>> Cheers,
>>>>
>>>> - Original Message -
>>>> From: "min fang" <louisfang2...@gmail.com>
>>>> To: "ceph-users" <ceph-users@lists.ceph.com>
>>>> Sent: Tuesday, March 1, 2016 1:28:54 PM
>>>> Subject: [ceph-users]  rbd cache did not help improve performance
>>>>
>>>> Hi, I set the following parameters in ceph.conf
>>>>
>>>> [client]
>>>> rbd cache=true
>>>> rbd cache size= 25769803776
>>>> rbd readahead disable after byte=0
>>>>
>>>>
>>>> map a rbd image to a rbd device then run fio testing on 4k read as the
>>>> command
>>>> ./fio -filename=/dev/rbd4 -direct=1 -iodepth 64 -thread -rw=read
>>>> -ioengine=aio -bs=4K -size=500G -numjobs=32 -runtime=300 -group_reporting
>>>> -name=mytest2
>>>>
>>>> Compared the result with setting rbd cache=false and enable cache
>>>> model, I did not see performance improved by librbd cache.
>>>>
>>>> Is my setting not right, or it is true that ceph librbd cache will not
>>>> have benefit on 4k seq read?
>>>>
>>>> thanks.
>>>>
>>>>
>>>> ___
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>> ___
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd cache did not help improve performance

2016-03-01 Thread min fang
I can use the following command to change a parameter, for example as follows,
but I am not sure whether it will work.

 ceph --admin-daemon /var/run/ceph/ceph-mon.openpower-0.asok config set
rbd_readahead_disable_after_bytes 0

2016-03-01 15:07 GMT+08:00 Tom Christensen <pav...@gmail.com>:

> If you are mapping the RBD with the kernel driver then you're not using
> librbd so these settings will have no effect I believe.  The kernel driver
> does its own caching but I don't believe there are any settings to change
> its default behavior.
>
>
> On Mon, Feb 29, 2016 at 9:36 PM, Shinobu Kinjo <ski...@redhat.com> wrote:
>
>> You may want to set "ioengine=rbd", I guess.
>>
>> Cheers,
>>
>> - Original Message -
>> From: "min fang" <louisfang2...@gmail.com>
>> To: "ceph-users" <ceph-users@lists.ceph.com>
>> Sent: Tuesday, March 1, 2016 1:28:54 PM
>> Subject: [ceph-users]  rbd cache did not help improve performance
>>
>> Hi, I set the following parameters in ceph.conf
>>
>> [client]
>> rbd cache=true
>> rbd cache size= 25769803776
>> rbd readahead disable after byte=0
>>
>>
>> I mapped an rbd image to an rbd device and then ran fio 4k read testing with
>> the command
>> ./fio -filename=/dev/rbd4 -direct=1 -iodepth 64 -thread -rw=read
>> -ioengine=aio -bs=4K -size=500G -numjobs=32 -runtime=300 -group_reporting
>> -name=mytest2
>>
>> Comparing the results between rbd cache=false and the enabled-cache
>> configuration, I did not see any performance improvement from the librbd cache.
>>
>> Is my setting not right, or is it true that the ceph librbd cache will not
>> benefit 4k sequential reads?
>>
>> thanks.
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd cache did not help improve performance

2016-02-29 Thread min fang
Hi, I set the following parameters in ceph.conf

[client]
rbd cache=true
rbd cache size= 25769803776
rbd readahead disable after byte=0


I mapped an rbd image to an rbd device and then ran fio 4k read testing with
the command
./fio -filename=/dev/rbd4 -direct=1 -iodepth 64 -thread -rw=read
-ioengine=aio -bs=4K -size=500G -numjobs=32 -runtime=300 -group_reporting
-name=mytest2

Comparing the results between rbd cache=false and the enabled-cache
configuration, I did not see any performance improvement from the librbd cache.

Is my setting not right, or is it true that the ceph librbd cache will not
benefit 4k sequential reads?

thanks.
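
One way to exercise librbd (and therefore the librbd cache) directly, instead of going through a mapped /dev/rbdX device, is fio's rbd engine, as suggested elsewhere in this thread; a sketch with placeholder pool/image/client names:

fio -name=mytest3 -ioengine=rbd -clientname=admin -pool=rbd -rbdname=img-001 \
    -rw=read -bs=4K -iodepth=64 -runtime=300 -group_reporting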
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph random read performance is better than sequential read?

2016-02-02 Thread min fang
Hi, I did some fio testing on my ceph cluster and found that ceph random read
performance is better than sequential read. Is that true in your experience?

Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] can rbd block_name_prefix be changed?

2016-01-08 Thread min fang
Hi, can rbd block_name_prefix be changed?  Is it constant for a rbd image?

thanks.
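
For reference, the prefix is shown by rbd info; a sketch with a placeholder pool/image name (the value in the comment is only an example):

rbd -p rbd info img-001 | grep block_name_prefix
# prints e.g.:  block_name_prefix: rb.0.8597.2ae8944a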
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Read IO to object while new data still in journal

2015-12-30 Thread min fang
Thanks. So ceph can guarantee that after the write-commit callback, a read IO
will get the newly written data, right?

2015-12-31 10:55 GMT+08:00 Zhi Zhang <zhang.david2...@gmail.com>:

> If the data has not been written to filestore, as you mentioned, it is
> still in journal, your following read op will be blocked until the
> data is written to filestore.
>
> This is because when writing this data, the related object context
> will hold ondisk_write_lock. This lock will be released in a callback
> after data is in filestore. When ondisk_write_lock is held, read op to
> this data will be blocked.
>
>
> Regards,
> Zhi Zhang (David)
> Contact: zhang.david2...@gmail.com
>   zhangz.da...@outlook.com
>
>
> On Thu, Dec 31, 2015 at 10:33 AM, min fang <louisfang2...@gmail.com>
> wrote:
> > yes, the question here is, librbd use the committed callback, as my
> > understanding, when this callback returned, librbd write will be looked
> as
> > completed. So I can issue a read IO even if the data is not readable. In
> > this case, i would like to know what data will be returned for the read
> IO?
> >
> > 2015-12-31 10:29 GMT+08:00 Dong Wu <archer.wud...@gmail.com>:
> >>
> >> there are two callbacks: committed and applied, committed means write
> >> to all replica's journal, applied means write to all replica's file
> >> system. so when applied callback return to client, it means data can
> >> be read.
> >>
> >> 2015-12-31 10:15 GMT+08:00 min fang <louisfang2...@gmail.com>:
> >> > Hi, as my understanding, write IO will committed data to journal
> >> > firstly,
> >> > then give a safe callback to ceph client. So it is possible that data
> >> > still
> >> > in journal when I send a read IO to the same area. So what data will
> be
> >> > returned if the new data still in journal?
> >> >
> >> > Thanks.
> >> >
> >> > ___
> >> > ceph-users mailing list
> >> > ceph-users@lists.ceph.com
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Read IO to object while new data still in journal

2015-12-30 Thread min fang
Yes, the question here is that librbd uses the committed callback, and as I
understand it, when this callback returns the librbd write is regarded as
completed. So I can issue a read IO even if the data is not yet readable. In
this case, I would like to know what data will be returned for the read IO.

2015-12-31 10:29 GMT+08:00 Dong Wu <archer.wud...@gmail.com>:

> There are two callbacks: committed and applied. Committed means the write is
> in all replicas' journals; applied means the write is in all replicas' file
> systems. So when the applied callback returns to the client, it means the
> data can be read.
>
> 2015-12-31 10:15 GMT+08:00 min fang <louisfang2...@gmail.com>:
> > Hi, as my understanding, write IO will committed data to journal firstly,
> > then give a safe callback to ceph client. So it is possible that data
> still
> > in journal when I send a read IO to the same area. So what data will be
> > returned if the new data still in journal?
> >
> > Thanks.
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Read IO to object while new data still in journal

2015-12-30 Thread min fang
Hi, as I understand it, a write IO commits data to the journal first and then
gives a safe callback to the ceph client. So it is possible that the data is
still in the journal when I send a read IO to the same area. What data will be
returned if the new data is still in the journal?

Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ubuntu 14.04 or centos 7

2015-12-28 Thread min fang
Hi, I am looking for an OS for my ceph cluster. According to
http://docs.ceph.com/docs/master/start/os-recommendations/#infernalis-9-1-0,
two OSes have been fully tested: centos 7 and ubuntu 14.04. Which one is
better? Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Configure Ceph client network

2015-12-24 Thread min fang
Hi, I have a 2-port 10Gb NIC installed in the ceph client, but I want to use
only one NIC port for ceph IO. The other port on the NIC will be reserved for
other purposes.

Does ceph currently support choosing which NIC port to use for IO?

Thanks.
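
Ceph clients do not pin a NIC themselves; the outgoing port follows normal IP routing, so one hedged approach is simply to put the cluster's public network only on the port you want used, or pin a route to it. The interface name and subnet below are placeholders:

ip addr add 192.168.90.240/24 dev enp4s0f0    # ceph public network on the chosen port
ip route add 192.168.90.0/24 dev enp4s0f0     # traffic to mons/OSDs leaves via that port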
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to configure ceph client network

2015-12-20 Thread min fang
Hi, I have a 2-port 10Gb NIC installed in the ceph client, but I want to use
only one NIC port for ceph IO. How can I achieve this?

Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rados_aio_cancel

2015-11-15 Thread min fang
Is this function used to detach the rx buffer and complete the IO back to the
caller? From the code, I think this function does not interact with the OSD or
MON side, which means we just cancel the IO on the client side. Am I right?

Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph object mining

2015-11-13 Thread min fang
Hi, I set up a ceph cluster for storing pictures. I want to introduce a data
mining program on the ceph osd nodes to dig out objects with specific properties.

I hope some kind of map-reduce framework can use the ceph object interface
directly, rather than a POSIX file system interface.

Can somebody give some suggestions?

Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph object mining

2015-11-13 Thread min fang
Thanks Gregory. I found that Intel introduced a technology for running hadoop
on rgw directly (http://www.slideshare.net/zhouyuan/hadoop-over-rgw), but I
think it is too complex for my usage. There is another technology, storlets (
http://ibmresearchnews.blogspot.co.uk/2014/05/storlets-turning-object-storage-into.html),
developed by the IBM research lab; I think that is what I expected, but it does
not look open source. So maybe I need to develop something similar to storlets
against the rgw interface. Thanks


2015-11-14 1:53 GMT+08:00 Gregory Farnum <gfar...@redhat.com>:

> I think I saw somebody working on a RADOS interface to Apache Hadoop once,
> maybe search for that?
> Your other option is to try and make use of object classes directly, but
> that's a bit primitive to build full map-reduce on top of without a lot of
> effort.
> -Greg
>
>
> On Friday, November 13, 2015, min fang <louisfang2...@gmail.com> wrote:
>
>> Hi, I setup ceph cluster for storing pictures. I want to introduce a data
>> mining program in ceph osd nodes to dig objects with concrete properties.
>>
>> I hope some kind of map-reduce framework can use ceph object interface
>> directly,while not using posix file system interface.
>>
>> Can somebody help give some suggestion?
>>
>> Thanks.
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] can not create rbd image

2015-11-12 Thread min fang
Hi cephers, I tried to use the following command to create an image, but
unfortunately the command hung for a long time until I broke it with
ctrl-z.

rbd -p hello create img-003 --size 512

so I checked the cluster status, and showed:

cluster 0379cebd-b546-4954-b5d6-e13d08b7d2f1
 health HEALTH_WARN
2 near full osd(s)
too many PGs per OSD (320 > max 300)
 monmap e2: 1 mons at {vl=192.168.90.253:6789/0}
election epoch 1, quorum 0 vl
 osdmap e37: 2 osds: 2 up, 2 in
  pgmap v19544: 320 pgs, 3 pools, 12054 MB data, 3588 objects
657 GB used, 21867 MB / 714 GB avail
 320 active+clean

I did not see an error message here that could cause the rbd create to hang.
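
One thing that does stand out in the status above: 657 GB used out of 714 GB is roughly 92%, and the cluster already reports 2 near full osd(s); writes are blocked once an OSD crosses the full ratio (default 0.95), so this is worth checking first. A few hedged checks:

ceph health detail    # which OSDs are near full, and at what percentage
ceph df               # global and per-pool usage
ceph osd df           # per-OSD utilisation, if available in this release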

I opened the client log and see:

2015-11-12 22:52:44.687491 7f89eced9780 20 librbd: create 0x7fff8f7b7800
name = img-003 size = 536870912 old_format = 1 features = 0 order = 22
stripe_unit = 0 stripe_count = 0
2015-11-12 22:52:44.687653 7f89eced9780  1 -- 192.168.90.253:0/1006121 -->
192.168.90.253:6800/5472 -- osd_op(client.34321.0:1 img-003.rbd [stat]
2.8a047315 ack+read+known_if_redirected e37) v5 -- ?+0 0x28513d0 con
0x285
2015-11-12 22:52:44.688928 7f89e066b700  1 -- 192.168.90.253:0/1006121 <==
osd.1 192.168.90.253:6800/5472 1  osd_op_reply(1 img-003.rbd [stat]
v0'0 uv0 ack = -2 ((2) No such file or directory)) v6  178+0+0
(3550830125 0 0) 0x7f89cae0 con 0x285
2015-11-12 22:52:44.689090 7f89eced9780  1 -- 192.168.90.253:0/1006121 -->
192.168.90.253:6801/5344 -- osd_op(client.34321.0:2 rbd_id.img-003 [stat]
2.638c75a8 ack+read+known_if_redirected e37) v5 -- ?+0 0x2858330 con
0x2856f50
2015-11-12 22:52:44.690425 7f89e0469700  1 -- 192.168.90.253:0/1006121 <==
osd.0 192.168.90.253:6801/5344 1  osd_op_reply(2 rbd_id.img-003 [stat]
v0'0 uv0 ack = -2 ((2) No such file or directory)) v6  181+0+0
(1202435393 0 0) 0x7f89b8000ae0 con 0x2856f50
2015-11-12 22:52:44.690494 7f89eced9780  2 librbd: adding rbd image to
directory...
2015-11-12 22:52:44.690544 7f89eced9780  1 -- 192.168.90.253:0/1006121 -->
192.168.90.253:6801/5344 -- osd_op(client.34321.0:3 rbd_directory [tmapup
0~0] 2.30a98c1c ondisk+write+known_if_redirected e37) v5 -- ?+0 0x2858920
con 0x2856f50
2015-11-12 22:52:59.687447 7f89e4074700  1 -- 192.168.90.253:0/1006121 -->
192.168.90.253:6789/0 -- mon_subscribe({monmap=3+,osdmap=38}) v2 -- ?+0
0x7f89bab0 con 0x2843b90
2015-11-12 22:52:59.687472 7f89e4074700  1 -- 192.168.90.253:0/1006121 -->
192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bf40 con
0x2856f50
2015-11-12 22:52:59.687887 7f89e3873700  1 -- 192.168.90.253:0/1006121 <==
mon.0 192.168.90.253:6789/0 11  mon_subscribe_ack(300s) v1  20+0+0
(2867606018 0 0) 0x7f89d8001160 con 0x2843b90
2015-11-12 22:53:04.687593 7f89e4074700  1 -- 192.168.90.253:0/1006121 -->
192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0 con
0x2856f50
2015-11-12 22:53:09.687731 7f89e4074700  1 -- 192.168.90.253:0/1006121 -->
192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0 con
0x2856f50
2015-11-12 22:53:14.687844 7f89e4074700  1 -- 192.168.90.253:0/1006121 -->
192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0 con
0x2856f50
2015-11-12 22:53:19.687978 7f89e4074700  1 -- 192.168.90.253:0/1006121 -->
192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0 con
0x2856f50
2015-11-12 22:53:24.688116 7f89e4074700  1 -- 192.168.90.253:0/1006121 -->
192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0 con
0x2856f50
2015-11-12 22:53:29.688253 7f89e4074700  1 -- 192.168.90.253:0/1006121 -->
192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0 con
0x2856f50
2015-11-12 22:53:34.688389 7f89e4074700  1 -- 192.168.90.253:0/1006121 -->
192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0 con
0x2856f50
2015-11-12 22:53:39.688512 7f89e4074700  1 -- 192.168.90.253:0/1006121 -->
192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0 con
0x2856f50
2015-11-12 22:53:44.688636 7f89e4074700  1 -- 192.168.90.253:0/1006121 -->
192.168.90.253:6801/5344 -- ping magic: 0 v1 -- ?+0 0x7f89bab0 con
0x2856f50


It looks to me like we just keep sending "ping magic" messages and the
operation never completes.

my ceph version is "ceph version 0.94.5
(9764da52395923e0b32908d83a9f7304401fee43)"

Can somebody help me with this? Or tell me what additional debug information to
collect for analysis.

Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: segmentation fault when using librbd interface

2015-10-31 Thread min fang
Hi, my code gets a segmentation fault when using librbd to do sync read IO.
From the trace, I can see that several read IOs complete successfully, but the
last read IO (2015-10-31 08:56:34.804383) is never returned and my code gets a
segmentation fault. I used the rbd_read interface and malloc'd a buffer for the
read data.

Anybody can help this?  Thanks.


2015-10-31 08:56:34.750411 7f04bcbdc7c0 20 librbd: read 0x17896d0 off = 0
len = 4096
2015-10-31 08:56:34.750436 7f04bcbdc7c0 20 librbd: aio_read 0x17896d0
completion 0x1799440 [0,4096]
2015-10-31 08:56:34.750442 7f04bcbdc7c0 20 librbd: ictx_check 0x17896d0
2015-10-31 08:56:34.750451 7f04bcbdc7c0 20 librbd::AsyncOperation:
0x1799570 start_op
2015-10-31 08:56:34.750453 7f04bcbdc7c0 20 librbd:  oid
rb.0.8597.2ae8944a. 0~4096 from [0,4096]
2015-10-31 08:56:34.750457 7f04bcbdc7c0 10 librbd::ImageCtx:
prune_parent_extents image overlap 0, object overlap 0 from image extents []
2015-10-31 08:56:34.750462 7f04bcbdc7c0 20 librbd::AioRequest: send
0x1799c60 rb.0.8597.2ae8944a. 0~4096
2015-10-31 08:56:34.750498 7f04bcbdc7c0  1 -- 192.168.90.240:0/1006544 -->
192.168.90.253:6801/2041 -- osd_op(client.34253.0:92
rb.0.8597.2ae8944a. [sparse-read 0~4096] 2.7cf90552
ack+read+known_if_redirected e30) v5 -- ?+0 0x179b890 con 0x17877b0
2015-10-31 08:56:34.750526 7f04bcbdc7c0 20 librbd::AioCompletion:
AioCompletion::finish_adding_requests 0x1799440 pending 1
2015-10-31 08:56:34.780308 7f04b0bb5700  1 -- 192.168.90.240:0/1006544 <==
osd.0 192.168.90.253:6801/2041 5  osd_op_reply(92
rb.0.8597.2ae8944a. [sparse-read 0~4096] v0'0 uv8 ondisk = 0)
v6  198+0+4120 (3153096351 0 1287205638) 0x7f0494001ce0 con 0x17877b0
2015-10-31 08:56:34.780408 7f04b14b7700 20 librbd::AioRequest:
should_complete 0x1799c60 rb.0.8597.2ae8944a. 0~4096 r = 0
2015-10-31 08:56:34.780418 7f04b14b7700 20 librbd::AioRequest:
should_complete 0x1799c60 READ_FLAT
2015-10-31 08:56:34.780420 7f04b14b7700 20 librbd::AioRequest: complete
0x1799c60
2015-10-31 08:56:34.780421 7f04b14b7700 10 librbd::AioCompletion:
C_AioRead::finish() 0x1793710 r = 0
2015-10-31 08:56:34.780422 7f04b14b7700 10 librbd::AioCompletion:  got
{0=4096} for [0,4096] bl 4096
2015-10-31 08:56:34.780432 7f04b14b7700 20 librbd::AioCompletion:
AioCompletion::complete_request() 0x1799440 complete_cb=0x7f04ba2b1240
pending 1
2015-10-31 08:56:34.780434 7f04b14b7700 20 librbd::AioCompletion:
AioCompletion::finalize() 0x1799440 rval 4096 read_buf 0x179a5e0 read_bl 0
2015-10-31 08:56:34.780440 7f04b14b7700 20 librbd::AioCompletion:
AioCompletion::finalize() copied resulting 4096 bytes to 0x179a5e0
2015-10-31 08:56:34.780442 7f04b14b7700 20 librbd::AsyncOperation:
0x1799570 finish_op
2015-10-31 08:56:34.780766 7f04bcbdc7c0 20 librbd: read 0x17896d0 off =
4096 len = 4096
2015-10-31 08:56:34.780778 7f04bcbdc7c0 20 librbd: aio_read 0x17896d0
completion 0x1799440 [4096,4096]
2015-10-31 08:56:34.780781 7f04bcbdc7c0 20 librbd: ictx_check 0x17896d0
2015-10-31 08:56:34.780786 7f04bcbdc7c0 20 librbd::AsyncOperation:
0x1799570 start_op
2015-10-31 08:56:34.780788 7f04bcbdc7c0 20 librbd:  oid
rb.0.8597.2ae8944a. 4096~4096 from [0,4096]
2015-10-31 08:56:34.780790 7f04bcbdc7c0 10 librbd::ImageCtx:
prune_parent_extents image overlap 0, object overlap 0 from image extents []
2015-10-31 08:56:34.780793 7f04bcbdc7c0 20 librbd::AioRequest: send
0x179bcc0 rb.0.8597.2ae8944a. 4096~4096
2015-10-31 08:56:34.780813 7f04bcbdc7c0  1 -- 192.168.90.240:0/1006544 -->
192.168.90.253:6801/2041 -- osd_op(client.34253.0:93
rb.0.8597.2ae8944a. [sparse-read 4096~4096] 2.7cf90552
ack+read+known_if_redirected e30) v5 -- ?+0 0x179b5f0 con 0x17877b0
2015-10-31 08:56:34.780833 7f04bcbdc7c0 20 librbd::AioCompletion:
AioCompletion::finish_adding_requests 0x1799440 pending 1
2015-10-31 08:56:34.800847 7f04b0bb5700  1 -- 192.168.90.240:0/1006544 <==
osd.0 192.168.90.253:6801/2041 6  osd_op_reply(93
rb.0.8597.2ae8944a. [sparse-read 4096~4096] v0'0 uv8 ondisk =
0) v6  198+0+4120 (2253638743 0 3057087703) 0x7f0494001ce0 con 0x17877b0
2015-10-31 08:56:34.800947 7f04b14b7700 20 librbd::AioRequest:
should_complete 0x179bcc0 rb.0.8597.2ae8944a. 4096~4096 r = 0
2015-10-31 08:56:34.800956 7f04b14b7700 20 librbd::AioRequest:
should_complete 0x179bcc0 READ_FLAT
2015-10-31 08:56:34.800957 7f04b14b7700 20 librbd::AioRequest: complete
0x179bcc0
2015-10-31 08:56:34.800958 7f04b14b7700 10 librbd::AioCompletion:
C_AioRead::finish() 0x1796c90 r = 0
2015-10-31 08:56:34.800959 7f04b14b7700 10 librbd::AioCompletion:  got
{4096=4096} for [0,4096] bl 4096
2015-10-31 08:56:34.800963 7f04b14b7700 20 librbd::AioCompletion:
AioCompletion::complete_request() 0x1799440 complete_cb=0x7f04ba2b1240
pending 1
2015-10-31 08:56:34.800965 7f04b14b7700 20 librbd::AioCompletion:
AioCompletion::finalize() 0x1799440 rval 4096 read_buf 0x179a5e0 read_bl 0
2015-10-31 08:56:34.800969 7f04b14b7700 20 

Re: [ceph-users] segmentation fault when using librbd interface

2015-10-31 Thread min fang
This segmentation fault should happen inside the rbd_read function: I can see
the code call this function and then get the segmentation fault, which means
rbd_read had not completed successfully when the segmentation fault happened.

2015-11-01 10:34 GMT+08:00 min fang <louisfang2...@gmail.com>:

>
>
> Hi,my code get Segmentation fault when using librbd to do sync read  IO.
> From the trace, I can say there are several read IOs get successfully, but
> the last read IO (2015-10-31 08:56:34.804383) can not be returned and my
> code got segmentation fault.  I used rbd_read interface and malloc a buffer
> for read data buffer.
>
> Anybody can help this?  Thanks.
>
>
> 2015-10-31 08:56:34.750411 7f04bcbdc7c0 20 librbd: read 0x17896d0 off = 0
> len = 4096
> 2015-10-31 08:56:34.750436 7f04bcbdc7c0 20 librbd: aio_read 0x17896d0
> completion 0x1799440 [0,4096]
> 2015-10-31 08:56:34.750442 7f04bcbdc7c0 20 librbd: ictx_check 0x17896d0
> 2015-10-31 08:56:34.750451 7f04bcbdc7c0 20 librbd::AsyncOperation:
> 0x1799570 start_op
> 2015-10-31 08:56:34.750453 7f04bcbdc7c0 20 librbd:  oid
> rb.0.8597.2ae8944a. 0~4096 from [0,4096]
> 2015-10-31 08:56:34.750457 7f04bcbdc7c0 10 librbd::ImageCtx:
> prune_parent_extents image overlap 0, object overlap 0 from image extents []
> 2015-10-31 08:56:34.750462 7f04bcbdc7c0 20 librbd::AioRequest: send
> 0x1799c60 rb.0.8597.2ae8944a. 0~4096
> 2015-10-31 08:56:34.750498 7f04bcbdc7c0  1 -- 192.168.90.240:0/1006544
> --> 192.168.90.253:6801/2041 -- osd_op(client.34253.0:92
> rb.0.8597.2ae8944a. [sparse-read 0~4096] 2.7cf90552
> ack+read+known_if_redirected e30) v5 -- ?+0 0x179b890 con 0x17877b0
> 2015-10-31 08:56:34.750526 7f04bcbdc7c0 20 librbd::AioCompletion:
> AioCompletion::finish_adding_requests 0x1799440 pending 1
> 2015-10-31 08:56:34.780308 7f04b0bb5700  1 -- 192.168.90.240:0/1006544
> <== osd.0 192.168.90.253:6801/2041 5  osd_op_reply(92
> rb.0.8597.2ae8944a. [sparse-read 0~4096] v0'0 uv8 ondisk = 0)
> v6  198+0+4120 (3153096351 0 1287205638) 0x7f0494001ce0 con 0x17877b0
> 2015-10-31 08:56:34.780408 7f04b14b7700 20 librbd::AioRequest:
> should_complete 0x1799c60 rb.0.8597.2ae8944a. 0~4096 r = 0
> 2015-10-31 08:56:34.780418 7f04b14b7700 20 librbd::AioRequest:
> should_complete 0x1799c60 READ_FLAT
> 2015-10-31 08:56:34.780420 7f04b14b7700 20 librbd::AioRequest: complete
> 0x1799c60
> 2015-10-31 08:56:34.780421 7f04b14b7700 10 librbd::AioCompletion:
> C_AioRead::finish() 0x1793710 r = 0
> 2015-10-31 08:56:34.780422 7f04b14b7700 10 librbd::AioCompletion:  got
> {0=4096} for [0,4096] bl 4096
> 2015-10-31 08:56:34.780432 7f04b14b7700 20 librbd::AioCompletion:
> AioCompletion::complete_request() 0x1799440 complete_cb=0x7f04ba2b1240
> pending 1
> 2015-10-31 08:56:34.780434 7f04b14b7700 20 librbd::AioCompletion:
> AioCompletion::finalize() 0x1799440 rval 4096 read_buf 0x179a5e0 read_bl 0
> 2015-10-31 08:56:34.780440 7f04b14b7700 20 librbd::AioCompletion:
> AioCompletion::finalize() copied resulting 4096 bytes to 0x179a5e0
> 2015-10-31 08:56:34.780442 7f04b14b7700 20 librbd::AsyncOperation:
> 0x1799570 finish_op
> 2015-10-31 08:56:34.780766 7f04bcbdc7c0 20 librbd: read 0x17896d0 off =
> 4096 len = 4096
> 2015-10-31 08:56:34.780778 7f04bcbdc7c0 20 librbd: aio_read 0x17896d0
> completion 0x1799440 [4096,4096]
> 2015-10-31 08:56:34.780781 7f04bcbdc7c0 20 librbd: ictx_check 0x17896d0
> 2015-10-31 08:56:34.780786 7f04bcbdc7c0 20 librbd::AsyncOperation:
> 0x1799570 start_op
> 2015-10-31 08:56:34.780788 7f04bcbdc7c0 20 librbd:  oid
> rb.0.8597.2ae8944a. 4096~4096 from [0,4096]
> 2015-10-31 08:56:34.780790 7f04bcbdc7c0 10 librbd::ImageCtx:
> prune_parent_extents image overlap 0, object overlap 0 from image extents []
> 2015-10-31 08:56:34.780793 7f04bcbdc7c0 20 librbd::AioRequest: send
> 0x179bcc0 rb.0.8597.2ae8944a. 4096~4096
> 2015-10-31 08:56:34.780813 7f04bcbdc7c0  1 -- 192.168.90.240:0/1006544
> --> 192.168.90.253:6801/2041 -- osd_op(client.34253.0:93
> rb.0.8597.2ae8944a. [sparse-read 4096~4096] 2.7cf90552
> ack+read+known_if_redirected e30) v5 -- ?+0 0x179b5f0 con 0x17877b0
> 2015-10-31 08:56:34.780833 7f04bcbdc7c0 20 librbd::AioCompletion:
> AioCompletion::finish_adding_requests 0x1799440 pending 1
> 2015-10-31 08:56:34.800847 7f04b0bb5700  1 -- 192.168.90.240:0/1006544
> <== osd.0 192.168.90.253:6801/2041 6  osd_op_reply(93
> rb.0.8597.2ae8944a. [sparse-read 4096~4096] v0'0 uv8 ondisk =
> 0) v6  198+0+4120 (2253638743 0 3057087703) 0x7f0494001ce0 con 0x17877b0
> 2015-10-31 08:56:34.800947 7f04b14b7700 20 librbd::AioRequest:
> should_complete 0x179bcc0 rb.0.8597.2ae8944a. 4096~4096 r = 0
> 2015-10-31 08:56:34.800956 7f04

Re: [ceph-users] How ceph client abort IO

2015-10-20 Thread min fang
I want to abort and retry an IO if it takes too long to complete. Does this
make sense in Ceph? How does a ceph client handle IOs that take too long? Does
it just wait until they return, or can some other error recovery method be used
to handle IOs that are not responded to in time?

Thanks.
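
There is no abort call (see Jason's reply below), but if the goal is just not to wait forever, librados has client-side timeout options that make a hung op fail with an error instead of blocking; a sketch, assuming these options exist in the running release:

[client]
rados osd op timeout = 30    # seconds; 0 (default) means wait forever
rados mon op timeout = 30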

2015-10-20 21:00 GMT+08:00 Jason Dillaman <dilla...@redhat.com>:

> There is no such interface currently on the librados / OSD side to abort
> IO operations.  Can you provide some background on your use-case for
> aborting in-flight IOs?
>
> --
>
> Jason Dillaman
>
>
> - Original Message -
>
> > From: "min fang" <louisfang2...@gmail.com>
> > To: ceph-users@lists.ceph.com
> > Sent: Monday, October 19, 2015 6:41:40 PM
> > Subject: [ceph-users] How ceph client abort IO
>
> > Can librbd interface provide abort api for aborting IO? If yes, can the
> abort
> > interface detach write buffer immediately? I hope can reuse the write
> buffer
> > quickly after issued the abort request, while not waiting IO aborted in
> osd
> > side.
>
> > thanks.
>
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How ceph client abort IO

2015-10-19 Thread min fang
Can the librbd interface provide an abort API for aborting IO? If yes, can the
abort interface detach the write buffer immediately? I hope to be able to reuse
the write buffer quickly after issuing the abort request, rather than waiting
for the IO to be aborted on the osd side.

thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com