[ceph-users] what's the difference between pg and pgp?

2015-05-21 Thread baijia...@126.com





baijia...@126.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rgw-admin usage show does not seem to work right with start and end dates

2015-04-26 Thread baijia...@126.com
When I execute a PUT of a file at 17:10 local time, that converts to 09:10 UTC.
I then execute radosgw-admin usage show --uid=test1 --show-log-entries=true 
--start-date=2015-04-27 09:00:00, but it does not seem to show anything.

When I check the code, I find the function responsible, and I think gmtime_r must be 
changed to localtime_r there.
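
For reference, a minimal standalone sketch (not the radosgw code) of the two possible
interpretations of the --start-date string: timegm() treats it as UTC, while mktime()
treats it as local time, and the resulting epoch values differ by exactly the local UTC
offset (the 17:10 vs 09:10 gap above):

    #include <cstdio>
    #include <cstring>
    #include <ctime>

    /* Minimal sketch (not the rgw code): parse the --start-date string and
     * interpret it either as UTC (timegm) or as local time (mktime). */
    int main() {
        const char *s = "2015-04-27 09:00:00";
        struct tm tm;
        memset(&tm, 0, sizeof(tm));
        if (!strptime(s, "%Y-%m-%d %H:%M:%S", &tm))
            return 1;
        tm.tm_isdst = -1;                    /* let mktime determine DST   */
        time_t as_utc   = timegm(&tm);       /* string treated as UTC      */
        time_t as_local = mktime(&tm);       /* string treated as local    */
        std::printf("as UTC:   %lld\n", (long long)as_utc);
        std::printf("as local: %lld\n", (long long)as_local);
        return 0;
    }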




baijia...@126.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] question about rgw create bucket

2015-03-01 Thread baijia...@126.com
When I create a bucket, why does rgw create two objects in the domain root pool, 
one storing struct RGWBucketInfo and the other storing struct RGWBucketEntryPoint?

And when I delete the bucket, why does rgw delete only one of those objects?




baijia...@126.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW put file question

2015-02-04 Thread baijia...@126.com
When I put the same file from multiple threads, the write of the file's head oid, 
ref.ioctx.operate(ref.oid, op);, sometimes returns -ECANCELED. I think that is normal.
But the function then jumps to done_cancel and runs complete_update_index_cancel (or 
index_op.cancel()), and the OSD executes rgw_bucket_complete_op with 
CLS_RGW_OP_ADD and a file size of 0.
So at that moment the bucket index records the file size as zero. I think this is not 
right.
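
A toy illustration of the consequence described above (a standalone sketch only, not
the rgw or cls code; the key name is made up): if the losing writer's cancel is applied
to the index as an add with size 0, it clobbers the winner's entry:

    #include <cstdio>
    #include <map>
    #include <string>

    /* Toy model only: two racing PUTs of the same key. The winner completes
     * with the real size; the loser gets -ECANCELED, but its cancel is applied
     * to the index as an "add" with size 0, overwriting the winner's entry. */
    int main() {
        std::map<std::string, long> bucket_index;

        bucket_index["file.dat"] = 1048576;   /* winner's complete op: 1 MiB      */
        bucket_index["file.dat"] = 0;         /* loser's cancel applied as ADD(0) */

        std::printf("index size for file.dat: %ld\n", bucket_index["file.dat"]);
        return 0;
    }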




baijia...@126.com

From: Yehuda Sadeh-Weinraub
Date: 2015-02-05 12:06
To: baijiaruo
CC: ceph-users
Subject: Re: [ceph-users] RGW put file question


- Original Message -
 From: baijia...@126.com
 To: ceph-users ceph-users@lists.ceph.com
 Sent: Wednesday, February 4, 2015 5:47:03 PM
 Subject: [ceph-users] RGW put file question
 
 when I put file failed, and run the function 
 RGWRados::cls_obj_complete_cancel,
 why we use CLS_RGW_OP_ADD not use CLS_RGW_OP_CANCEL?
 why we set poolid is -1 and set epoch is 0?
 

I'm not sure, could very well be a bug. It should definitely be OP_CANCEL, but 
going back through the history it seems like it has been OP_ADD since at least 
argonaut. How did you notice it? It might explain a couple of issues that we've 
been seeing.

Yehuda___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW put file question

2015-02-04 Thread baijia...@126.com
When I put a file and it fails, the function RGWRados::cls_obj_complete_cancel runs.
Why do we use CLS_RGW_OP_ADD rather than CLS_RGW_OP_CANCEL?
And why do we set the poolid to -1 and the epoch to 0?



baijia...@126.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rgw-agent copy file failed

2015-01-18 Thread baijia...@126.com
When I write a file named 1234% in the master region, radosgw-agent sends a 
copy-object request containing x-amz-copy-source: nofilter_bucket_1/1234% to the 
replica region, and it fails with a 404 error.

My analysis is that radosgw-agent does not URL-encode 
x-amz-copy-source: nofilter_bucket_1/1234%, while rgw does URL-decode 
x-amz-copy-source in the function RGWCopyObj::parse_copy_location.
So 1234% is decoded to 1234, and the request fails.

Can you check this issue?
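
For illustration, a minimal percent-encoding helper (a sketch, not the radosgw-agent
or rgw code) showing what the agent would need to do so that a literal '%' in the
object name survives the server-side decode ("1234%" must be sent as "1234%25"):

    #include <cstdio>
    #include <string>

    /* Illustrative only: percent-encode the characters that are unsafe in an
     * x-amz-copy-source value, leaving the bucket/key separator '/' intact. */
    static std::string url_encode(const std::string &in) {
        static const char hex[] = "0123456789ABCDEF";
        std::string out;
        for (unsigned char c : in) {
            bool safe = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                        (c >= '0' && c <= '9') || c == '-' || c == '_' ||
                        c == '.' || c == '~' || c == '/';
            if (safe) {
                out += static_cast<char>(c);
            } else {
                out += '%';
                out += hex[c >> 4];
                out += hex[c & 0x0f];
            }
        }
        return out;
    }

    int main() {
        /* prints "nofilter_bucket_1/1234%25" */
        std::printf("%s\n", url_encode("nofilter_bucket_1/1234%").c_str());
        return 0;
    }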




baijia...@126.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rgw single bucket performance question

2015-01-14 Thread baijia...@126.com
I know a single bucket has a performance problem, from 
http://tracker.ceph.com/issues/8473 

I attempted to modify the CRUSH map to put the bucket.index pool on SSDs, but 
performance is still not good, and the SSDs are never fully utilized.
Here is an op description; can you give me some suggestions to improve performance:

{ description: osd_op(client.193591.0:1185637 .dir.zone.192482.4 [] 
8.8cd2dea4 ack+ondisk+write e2007),
  received_at: 2015-01-14 15:05:15.888245,
  age: 357.891900,
  duration: 25.215902,
  type_data: [
commit sent; apply or cleanup,
{ client: client.193591,
  tid: 1185637},
[
{ time: 2015-01-14 15:05:15.888387,
  event: waiting_for_osdmap},
{ time: 2015-01-14 15:05:18.265420,
  event: reached_pg},
{ time: 2015-01-14 15:05:18.265457,
  event: started},
{ time: 2015-01-14 15:05:18.265462,
  event: started},
{ time: 2015-01-14 15:05:18.267986,
  event: waiting for subops from 43,109},
{ time: 2015-01-14 15:05:18.268131,
  event: commit_queued_for_journal_write},
{ time: 2015-01-14 15:05:18.268314,
  event: write_thread_in_journal_buffer},
{ time: 2015-01-14 15:05:18.268558,
  event: journaled_completion_queued},
{ time: 2015-01-14 15:05:18.281863,
  event: sub_op_commit_rec},
{ time: 2015-01-14 15:05:18.281903,
  event: sub_op_commit_rec},
{ time: 2015-01-14 15:05:18.299665,
  event: op_commit},
{ time: 2015-01-14 15:05:18.299731,
  event: commit_sent},
{ time: 2015-01-14 15:05:41.104126,
  event: op_applied},
{ time: 2015-01-14 15:05:41.104147,
  event: done}]]},




baijia...@126.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] question about S3 multipart upload ignores request headers

2015-01-09 Thread baijia...@126.com
I applied the patch from http://tracker.ceph.com/issues/8452, ran the s3 test suite, 
and it still fails.
err log: ERROR: failed to get obj attrs, 
obj=test-client.0-31zepqoawd8dxfa-212:_multipart_mymultipart.2/0IQGoJ7hG8ZtTyfAnglChBO79HUsjeC.meta
 ret=-2 

I found code that may have a problem:
where the function executes ret = get_obj_attrs(store, s, meta_obj, attrs, NULL, NULL);, 
shouldn't it execute meta_obj.set_in_extra_data(true); before that, 
because meta_obj lives in the extra-data bucket?





baijia...@126.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] can I use librgw APIS ?

2014-11-24 Thread baijia...@126.com
Can I use the librgw APIs like librados? If I can, how do I do it?
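
For comparison, the librados usage pattern the question refers to looks roughly like
the sketch below (the pool name "data" and the object name are placeholders, and
error handling is minimal); whether librgw offers a comparable public API depends on
the Ceph version in use:

    #include <rados/librados.hpp>
    #include <cstdio>
    #include <string>

    /* Minimal librados (not librgw) sketch: connect to the cluster and write
     * one object. Build with -lrados. */
    int main() {
        librados::Rados cluster;
        if (cluster.init("admin") < 0) return 1;                 /* client.admin */
        if (cluster.conf_read_file("/etc/ceph/ceph.conf") < 0) return 1;
        if (cluster.connect() < 0) return 1;

        librados::IoCtx io_ctx;
        if (cluster.ioctx_create("data", io_ctx) < 0) return 1;  /* pool "data"  */

        librados::bufferlist bl;
        bl.append(std::string("hello from librados"));
        int ret = io_ctx.write_full("example-object", bl);
        std::printf("write_full returned %d\n", ret);

        cluster.shutdown();
        return ret < 0 ? 1 : 0;
    }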




baijia...@126.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD start fail

2014-09-24 Thread baijia...@126.com

When I start all the OSDs, I find that many OSDs fail to start. Logs as follows:

osd/SnapMapper.cc: 270: FAILED assert(check(oid))

ceph version ()
 1: ceph-osd() [0x5e61c8]
 2: (remove_dir(CephContext*, ObjectStore*, SnapMapper*, OSDriver*, 
ObjectStore::Sequencer*, coll_t, std::tr1::shared_ptr<DeletingState>, 
ThreadPool::TPHandle&)+0x3f0) [0x53e5f0]
 3: (OSD::RemoveWQ::_process(std::pair<boost::intrusive_ptr<PG>, 
std::tr1::shared_ptr<DeletingState> >, ThreadPool::TPHandle&)+0x455) [0x54a8e5]
 4: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, 
std::tr1::shared_ptr<DeletingState> >, std::pair<boost::intrusive_ptr<PG>, 
std::tr1::shared_ptr<DeletingState> > >::_void_process(void*, 
ThreadPool::TPHandle&)+0xfc) [0x5a448c]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0x551) [0x7f8d9d724701]
 6: (ThreadPool::WorkThread::entry()+0x10) [0x7f8d9d727740]
 7: /lib64/libpthread.so.0() [0x3aa48079d1]
 8: (clone()+0x6d) [0x3aa44e8b6d]

why?



baijia...@126.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] question about RGW

2014-09-10 Thread baijia...@126.com
When I read the RGW code, I can't understand master_ver inside struct 
rgw_bucket_dir_header.
Can anyone explain this struct, especially master_ver and stats? Thanks.




baijia...@126.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] do RGW have billing feature? If have, how do we use it ?

2014-08-26 Thread baijia...@126.com





baijia...@126.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how radosgw recycle bucket index object and bucket meta object

2014-08-19 Thread baijia...@126.com
I created a bucket and put some objects into it. Then I deleted all the objects and 
the bucket, but the bucket.meta object and the bucket index object still exist.
Why? When does ceph recycle them?




baijia...@126.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how radosgw recycle bucket index object and bucket meta object

2014-08-19 Thread baijia...@126.com
Thanks for your help.

For example: I create a bucket and put a file into it, size 1 MB.
In the .rgw pool I see two objects, .bucket.meta.:default.4804.1 and one other;
in the .rgw.buckets.index pool we see one object, .dir.default.4804.1;
in the .rgw.buckets pool we see two objects, default.4804.1_ and 
default.4804.1__shadow.
Then I delete the file and the bucket, and I see that two objects, default.4804.1_ and 
the other, are deleted.
I set rgw_gc_obj_min_wait to 600 seconds, rgw_gc_processor_max_time to 300 
seconds, and rgw_gc_processor_period to 300 seconds.
After ten minutes I see that default.4804.1__shadow is deleted.
But when does ceph delete .bucket.meta.:default.4804.1 and 
.dir.default.4804.1?



baijia...@126.com

From: Craig Lewis
Date: 2014-08-20 10:30
To: baijia...@126.com
CC: ceph-users
Subject: Re: [ceph-users] how radosgw recycle bucket index object and bucket 
meta object
By default, Ceph will wait two hours to garbage collect those RGW objects.


You can adjust that time by changing
rgw gc obj min wait 


See http://ceph.com/docs/master/radosgw/config-ref/ for the full list of 
configs.








On Tue, Aug 19, 2014 at 7:18 PM, baijia...@126.com baijia...@126.com wrote:

 I create a bucket and put some objects in the bucket。but I delete the all the 
objects and the bucket, why the bucket.meta object and bucket index object
are exist? when ceph recycle them?




baijia...@126.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW: Get object ops performance problem

2014-07-15 Thread baijia...@126.com
hi, everyone!

I am testing RGW GET object ops. When I use 100 threads to get one and the same 
object, performance is very good: the mean response time is 0.1 s.
But when I use 150 threads to get one and the same object, performance is very 
bad: the mean response time is 1 s.

I observed the OSD log and the rgw log.
rgw log:
2014-07-15 10:36:42.999719 7f45596fb700  1 -- 10.0.1.61:0/1022376 -- 
10.0.0.21:6835/24201 -- osd_op(client.6167.0:22721 default.5632.8_ws1411.jpg 
[getxattrs,stat,read 0~524288] 4.5210f70b ack+read e657) 
2014-07-15 10:36:44.064720 7f467efdd700  1 -- 10.0.1.61:0/1022376 == osd.7 
10.0.0.21:6835/24201 22210  osd_op_reply(22721 

osd log:
10:36:43.001895 7f6cdb24c700  1 -- 10.0.0.21:6835/24201 == client.6167 
10.0.1.61:0/1022376 22436  osd_op(client.6167.0:22721 
default.5632.8_ws1411.jpg 
2014-07-15 10:36:43.031762 7f6cbf01f700  1 -- 10.0.0.21:6835/24201 -- 
10.0.1.61:0/1022376 -- osd_op_reply(22721 default.5632.8_ws1411.jpg 

So I think the problem does not happen on the OSD. Why does the OSD send the op 
reply at 10:36:43.031762, but rgw only receives it at 10:36:44.064720?





baijia...@126.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osd error log question

2014-07-09 Thread baijia...@126.com
I find that the OSD log contains "fault with nothing to send, going to standby". What 
happened?




baijia...@126.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW performance test , put 30 thousands objects to one bucket, average latency 3 seconds

2014-07-03 Thread baijia...@126.com
Hi, everyone

When I use rest-bench to test RGW with the command: rest-bench --access-key=ak 
--secret=sk --bucket=bucket --seconds=360 -t 200 -b 524288 --no-cleanup 
write 

I found that the call RGW makes to the method bucket_prepare_op is very slow, so I 
looked at it via 'dump_historic_ops' and see:
{ description: osd_op(client.4211.0:265984 .dir.default.4148.1 [call 
rgw.bucket_prepare_op] 3.b168f3d0 e37),
  received_at: 2014-07-03 11:07:02.465700,
  age: 308.315230,
  duration: 3.401743,
  type_data: [
commit sent; apply or cleanup,
{ client: client.4211,
  tid: 265984},
[
{ time: 2014-07-03 11:07:02.465852,
  event: waiting_for_osdmap},
{ time: 2014-07-03 11:07:02.465875,
  event: queue op_wq},
{ time: 2014-07-03 11:07:03.729087,
  event: reached_pg},
{ time: 2014-07-03 11:07:03.729120,
  event: started},
{ time: 2014-07-03 11:07:03.729126,
  event: started},
{ time: 2014-07-03 11:07:03.804366,
  event: waiting for subops from [19,9]},
{ time: 2014-07-03 11:07:03.804431,
  event: commit_queued_for_journal_write},
{ time: 2014-07-03 11:07:03.804509,
  event: write_thread_in_journal_buffer},
{ time: 2014-07-03 11:07:03.934419,
  event: journaled_completion_queued},
{ time: 2014-07-03 11:07:05.297282,
  event: sub_op_commit_rec},
{ time: 2014-07-03 11:07:05.297319,
  event: sub_op_commit_rec},
{ time: 2014-07-03 11:07:05.311217,
  event: op_applied},
{ time: 2014-07-03 11:07:05.867384,
  event: op_commit finish lock},
{ time: 2014-07-03 11:07:05.867385,
  event: op_commit},
{ time: 2014-07-03 11:07:05.867424,
  event: commit_sent},
{ time: 2014-07-03 11:07:05.867428,
  event: op_commit finish},
{ time: 2014-07-03 11:07:05.867443,
  event: done}]]}]}

So I find two points of performance degradation: one is from queue op_wq to 
reached_pg, and the other is from journaled_completion_queued to op_commit.
And I must stress that there are very many ops writing to one bucket object, so how 
can I reduce the latency?





baijia...@126.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] why lock th whole osd handle thread

2014-07-03 Thread baijia...@126.com
When I look at the function OSD::OpWQ::_process, I find that the PG lock is held 
across the whole function. So when I use multiple threads to write the same object, 
must they be serialized all the way from the OSD handler thread to the journal write 
thread?
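
As a standalone illustration of that serialization (a sketch using a plain std::mutex,
not the actual OSD code): with the lock held for the entire handler, ops that contend
on the same lock run strictly one at a time no matter how many worker threads exist:

    #include <chrono>
    #include <cstdio>
    #include <mutex>
    #include <thread>
    #include <vector>

    std::mutex pg_lock;   /* stand-in for a per-PG lock */

    /* The lock is held for the whole handler, so all the work inside
     * (prepare, submit to journal, ...) is serialized across threads. */
    void handle_op(int op_id) {
        std::lock_guard<std::mutex> l(pg_lock);
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        std::printf("op %d handled\n", op_id);
    }

    int main() {
        std::vector<std::thread> workers;
        for (int i = 0; i < 8; ++i)
            workers.emplace_back(handle_op, i);
        for (auto &t : workers)
            t.join();
        return 0;
    }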



baijia...@126.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW performance test , put 30 thousands objects to one bucket, average latency 3 seconds

2014-07-03 Thread baijia...@126.com
I find that the function OSD::OpWQ::_process uses the PG lock to lock the whole 
function, so this means the OSD threads can't concurrently handle ops that write to the 
same object.
By adding logging to ReplicatedPG::op_commit, I find that taking the PG lock sometimes 
costs a long time, but I don't know where the PG is being locked.
Where is the PG lock held for so long?

thanks



baijia...@126.com

From: Gregory Farnum
Date: 2014-07-04 01:02
To: baijia...@126.com
CC: ceph-users
Subject: Re: [ceph-users] RGW performance test , put 30 thousands objects to 
one bucket, average latency 3 seconds
It looks like you're just putting in data faster than your cluster can
handle (in terms of IOPS).
The first big hole (queue_op_wq -> reached_pg) is it sitting in a queue
and waiting for processing. The second parallel blocks are
1) write_thread_in_journal_buffer -> journaled_completion_queued, and
that is again a queue while it's waiting to be written to disk,
2) waiting for subops from [19,9] -> sub_op_commit_received (x2) is
waiting for the replica OSDs to write the transaction to disk.

You might be able to tune it a little, but right now bucket indices
live in one object, so every write has to touch the same set of OSDs
(twice! to mark an object as putting, and put). 2*30000/360 = 166,
which is probably past what those disks can do, and artificially
increasing the latency.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Wed, Jul 2, 2014 at 11:24 PM, baijia...@126.com baijia...@126.com wrote:
 hi, everyone

 when I user rest bench testing RGW with cmd : rest-bench --access-key=ak
 --secret=sk  --bucket=bucket --seconds=360 -t 200  -b 524288  --no-cleanup
 write

 I found when RGW call the method bucket_prepare_op  is very slow. so I
 observed from 'dump_historic_ops',to see:
 { description: osd_op(client.4211.0:265984 .dir.default.4148.1 [call
 rgw.bucket_prepare_op] 3.b168f3d0 e37),
   received_at: 2014-07-03 11:07:02.465700,
   age: 308.315230,
   duration: 3.401743,
   type_data: [
 commit sent; apply or cleanup,
 { client: client.4211,
   tid: 265984},
 [
 { time: 2014-07-03 11:07:02.465852,
   event: waiting_for_osdmap},
 { time: 2014-07-03 11:07:02.465875,
   event: queue op_wq},
 { time: 2014-07-03 11:07:03.729087,
   event: reached_pg},
 { time: 2014-07-03 11:07:03.729120,
   event: started},
 { time: 2014-07-03 11:07:03.729126,
   event: started},
 { time: 2014-07-03 11:07:03.804366,
   event: waiting for subops from [19,9]},
 { time: 2014-07-03 11:07:03.804431,
   event: commit_queued_for_journal_write},
 { time: 2014-07-03 11:07:03.804509,
   event: write_thread_in_journal_buffer},
 { time: 2014-07-03 11:07:03.934419,
   event: journaled_completion_queued},
 { time: 2014-07-03 11:07:05.297282,
   event: sub_op_commit_rec},
 { time: 2014-07-03 11:07:05.297319,
   event: sub_op_commit_rec},
 { time: 2014-07-03 11:07:05.311217,
   event: op_applied},
 { time: 2014-07-03 11:07:05.867384,
   event: op_commit finish lock},
 { time: 2014-07-03 11:07:05.867385,
   event: op_commit},
 { time: 2014-07-03 11:07:05.867424,
   event: commit_sent},
 { time: 2014-07-03 11:07:05.867428,
   event: op_commit finish},
 { time: 2014-07-03 11:07:05.867443,
   event: done}]]}]}

 so I find 2 performance degradation. one is from queue op_wq to
 reached_pg , anothor is from journaled_completion_queued to op_commit.
 and I must stess that there are so many ops write to one bucket object, so
 how to reduce Latency ?


 
 baijia...@126.com

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW performance test , put 30 thousands objects to one bucket, average latency 3 seconds

2014-07-03 Thread baijia...@126.com
I put the .rgw.buckets.index pool on SSD OSDs, so the bucket index object must be 
written to the SSDs, and disk utilization stays below 50%. So I don't think the disks 
are the bottleneck.




baijia...@126.com

From: baijia...@126.com
Date: 2014-07-04 01:29
To: Gregory Farnum
CC: ceph-users
Subject: Re: Re: [ceph-users] RGW performance test , put 30 thousands objects 
to one bucket, average latency 3 seconds
I find that the function of OSD::OpWQ::_process  use pg-lock lock the whole 
function.so this mean that osd threads can't handle op which write for the same 
object.
though add log to the  ReplicatedPG::op_commit , I find pg lock cost long time 
sometimes. but I don't know where lock pg .
where lock pg for a long time?

thanks



baijia...@126.com

From: Gregory Farnum
Date: 2014-07-04 01:02
To: baijia...@126.com
CC: ceph-users
Subject: Re: [ceph-users] RGW performance test , put 30 thousands objects to 
one bucket, average latency 3 seconds
It looks like you're just putting in data faster than your cluster can
handle (in terms of IOPS).
The first big hole (queue_op_wq -> reached_pg) is it sitting in a queue
and waiting for processing. The second parallel blocks are
1) write_thread_in_journal_buffer -> journaled_completion_queued, and
that is again a queue while it's waiting to be written to disk,
2) waiting for subops from [19,9] -> sub_op_commit_received (x2) is
waiting for the replica OSDs to write the transaction to disk.

You might be able to tune it a little, but right now bucket indices
live in one object, so every write has to touch the same set of OSDs
(twice! to mark an object as putting, and put). 2*30000/360 = 166,
which is probably past what those disks can do, and artificially
increasing the latency.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Wed, Jul 2, 2014 at 11:24 PM, baijia...@126.com baijia...@126.com wrote:
 hi, everyone

 when I user rest bench testing RGW with cmd : rest-bench --access-key=ak
 --secret=sk  --bucket=bucket --seconds=360 -t 200  -b 524288  --no-cleanup
 write

 I found when RGW call the method bucket_prepare_op  is very slow. so I
 observed from 'dump_historic_ops',to see:
 { description: osd_op(client.4211.0:265984 .dir.default.4148.1 [call
 rgw.bucket_prepare_op] 3.b168f3d0 e37),
   received_at: 2014-07-03 11:07:02.465700,
   age: 308.315230,
   duration: 3.401743,
   type_data: [
 commit sent; apply or cleanup,
 { client: client.4211,
   tid: 265984},
 [
 { time: 2014-07-03 11:07:02.465852,
   event: waiting_for_osdmap},
 { time: 2014-07-03 11:07:02.465875,
   event: queue op_wq},
 { time: 2014-07-03 11:07:03.729087,
   event: reached_pg},
 { time: 2014-07-03 11:07:03.729120,
   event: started},
 { time: 2014-07-03 11:07:03.729126,
   event: started},
 { time: 2014-07-03 11:07:03.804366,
   event: waiting for subops from [19,9]},
 { time: 2014-07-03 11:07:03.804431,
   event: commit_queued_for_journal_write},
 { time: 2014-07-03 11:07:03.804509,
   event: write_thread_in_journal_buffer},
 { time: 2014-07-03 11:07:03.934419,
   event: journaled_completion_queued},
 { time: 2014-07-03 11:07:05.297282,
   event: sub_op_commit_rec},
 { time: 2014-07-03 11:07:05.297319,
   event: sub_op_commit_rec},
 { time: 2014-07-03 11:07:05.311217,
   event: op_applied},
 { time: 2014-07-03 11:07:05.867384,
   event: op_commit finish lock},
 { time: 2014-07-03 11:07:05.867385,
   event: op_commit},
 { time: 2014-07-03 11:07:05.867424,
   event: commit_sent},
 { time: 2014-07-03 11:07:05.867428,
   event: op_commit finish},
 { time: 2014-07-03 11:07:05.867443,
   event: done}]]}]}

 so I find 2 performance degradation. one is from queue op_wq to
 reached_pg , anothor is from journaled_completion_queued to op_commit.
 and I must stess that there are so many ops write to one bucket object, so
 how to reduce Latency ?


 
 baijia...@126.com

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Re: Re: Ask a performance question for the RGW

2014-07-01 Thread baijia...@126.com
I know FileStore.ondisk_finisher handles C_OSD_OpCommit, and going from 
journaled_completion_queue to op_commit costs 3.6 seconds; maybe the cost is in 
the function ReplicatedPG::op_commit.
Through OpTracker, I find that ReplicatedPG::op_commit first takes the PG lock, and 
that sometimes costs 0.5 to 1 second, so the whole ondisk_finisher must wait up to a 
second. How can the PG lock be avoided in 
ReplicatedPG::op_commit?
thanks



baijia...@126.com

From: Guang Yang
Date: 2014-07-01 11:39
To: baijiaruo
CC: ceph-users
Subject: Re: [ceph-users] Ask a performance question for the RGW
On Jun 30, 2014, at 3:59 PM, baijia...@126.com wrote:

 Hello,
 thanks for you answer the question.
 But when there are less than 50 thousand objects, and latency is very big . I 
 see the write ops for the bucket index object., from 
 journaled_completion_queue to op_commit  cost 3.6 seconds,this mean that 
 from “writing journal finish” to  op_commit cost 3.6 seconds。
 so I can't understand this and what happened?
The operations updating the same bucket index object get serialized; one 
possibility is that those operations were hanging there waiting for other ops to finish 
their work.
  
 thanks
 baijia...@126.com
  
 From: Guang Yang
 Date: 2014-06-30 14:57
 To: baijiaruo
 CC: ceph-users
 Subject: Re: [ceph-users] Ask a performance question for the RGW
 Hello,
 There is a known limitation of bucket scalability, and there is a blueprint 
 tracking it - 
 https://wiki.ceph.com/Planning/Blueprints/Submissions/rgw%3A_bucket_index_scalability.
  
 For the time being, I would recommend doing sharding at the application level (create 
 multiple buckets) to work around this limitation.
  
 Thanks,
 Guang
  
 On Jun 30, 2014, at 2:54 PM, baijia...@126.com wrote:
  
  
  hello, everyone!
  
  when I user rest bench test RGW performance and the cmd is:
  ./rest-bench --access-key=ak --secret=sk --bucket=bucket_name --seconds=600 
  -t 200 -b 524288 -no-cleanup write
  
  test result:
 Total time run: 362.962324
 Total writes made: 48189
  Write size: 524288
  Bandwidth (MB/sec): 66.383
  Stddev Bandwidth: 40.7776
  Max bandwidth (MB/sec): 173
  Min bandwidth (MB/sec): 0
  Average Latency: 1.50435
  Stddev Latency: 0.910731
  Max latency: 9.12276
  Min latency: 0.19867
  
  my environment is 4 host and 40 disk(osd)。 but test result is very bad, 
  average latency is 1.5 seconds 。and I find write obj metadate is very 
  slowly。because it puts so many object to one bucket, we know writing object 
  metadate can call method “bucket_prepare_op”,and test find this op is very 
  slowly。 I find the osd which contain bucket-obj。and see the 
  “bucket_prepare_op”by dump_historic_ops :
  { description: osd_op(client.4742.0:87613 .dir.default.4243.3 [call 
  rgw.bucket_prepare_op] 3.3670fe74 e317),
received_at: 2014-06-30 13:35:55.409597,
age: 51.148026,
duration: 4.130137,
type_data: [
  commit sent; apply or cleanup,
  { client: client.4742,
tid: 87613},
  [
  { time: 2014-06-30 13:35:55.409660,
event: waiting_for_osdmap},
  { time: 2014-06-30 13:35:55.409669,
event: queue op_wq},
  { time: 2014-06-30 13:35:55.896766,
event: reached_pg},
  { time: 2014-06-30 13:35:55.896793,
event: started},
  { time: 2014-06-30 13:35:55.896796,
event: started},
  { time: 2014-06-30 13:35:55.899450,
event: waiting for subops from [40,43]},
  { time: 2014-06-30 13:35:55.899757,
event: commit_queued_for_journal_write},
  { time: 2014-06-30 13:35:55.899799,
event: write_thread_in_journal_buffer},
  { time: 2014-06-30 13:35:55.899910,
event: journaled_completion_queued},
  { time: 2014-06-30 13:35:55.899936,
event: journal first callback},
  { time: 2014-06-30 13:35:55.899944,
event: queuing ondisk},
  { time: 2014-06-30 13:35:56.142104,
event: sub_op_commit_rec},
  { time: 2014-06-30 13:35:56.176950,
event: sub_op_commit_rec},
  { time: 2014-06-30 13:35:59.535301,
event: op_commit},
  { time: 2014-06-30 13:35:59.535331,
event: commit_sent},
  { time: 2014-06-30 13:35:59.539723,
event: op_applied},
  { time: 2014-06-30 13:35:59.539734,
event: done}]]},
  
  so why from journaled_completion_queued to op_commit is very slowly, 
  and what happened?
  thanks

[ceph-users] Ask a performance question for the RGW

2014-06-30 Thread baijia...@126.com
When I use rest-bench to test RGW performance with these arguments:
   ./rest-bench --access-key=ak --secret=sk --bucket=bucket_name --seconds=600 
-t 200 -b 524288 --no-cleanup write
test result:
Total time run: 362.962324
Total writes made:  48189
Write size: 524288
Bandwidth (MB/sec): 66.383 
Stddev Bandwidth:   40.7776
Max bandwidth (MB/sec): 173
Min bandwidth (MB/sec): 0
Average Latency:1.50435
Stddev Latency: 0.910731
Max latency:9.12276
Min latency:0.19867

My environment is 4 hosts and 40 disks (OSDs).

But the test result is very bad: the average latency is 1.5 seconds, and I find that 
writing the object metadata is very slow. Because the test puts so many objects into 
one bucket, and we know writing the object metadata calls the method 
"bucket_prepare_op", the test shows this op is very slow.

I found the OSD which holds the bucket object and looked at "bucket_prepare_op" via 
dump_historic_ops:
{ description: osd_op(client.4742.0:87615 .dir.default.4243.3 [call 
rgw.bucket_prepare_op] 3.3670fe74 e317),
  received_at: 2014-06-30 13:35:55.447192,
  age: 51.110431,
  duration: 4.092646,
  type_data: [
commit sent; apply or cleanup,
{ client: client.4742,
  tid: 87615},
[
{ time: 2014-06-30 13:35:55.447402,
  event: waiting_for_osdmap},
{ time: 2014-06-30 13:35:55.447409,
  event: queue op_wq},
{ time: 2014-06-30 13:35:55.902491,
  event: reached_pg},
{ time: 2014-06-30 13:35:55.902512,
  event: started},
{ time: 2014-06-30 13:35:55.902515,
  event: started},
{ time: 2014-06-30 13:35:55.911850,
  event: waiting for subops from [40,43]},
{ time: 2014-06-30 13:35:55.912052,
  event: commit_queued_for_journal_write},
{ time: 2014-06-30 13:35:55.912116,
  event: write_thread_in_journal_buffer},
{ time: 2014-06-30 13:35:55.924200,
  event: journaled_completion_queued},
{ time: 2014-06-30 13:35:55.924207,
  event: journal first callback},
{ time: 2014-06-30 13:35:55.924215,
  event: queuing ondisk},
{ time: 2014-06-30 13:35:56.142174,
  event: sub_op_commit_rec},
{ time: 2014-06-30 13:35:56.177000,
  event: sub_op_commit_rec},
{ time: 2014-06-30 13:35:59.535374,
  event: op_commit},
{ time: 2014-06-30 13:35:59.535404,
  event: commit_sent},
{ time: 2014-06-30 13:35:59.539765,
  event: op_applied},
{ time: 2014-06-30 13:35:59.539838,
  event: done}]]},
So why is the step from journaled_completion_queued to op_commit so slow, and 
what happened?

thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ask a performance question for the RGW

2014-06-30 Thread baijia...@126.com
Hello, everyone!

When I use rest-bench to test RGW performance, the command is:
 ./rest-bench --access-key=ak --secret=sk --bucket=bucket_name --seconds=600 -t 
200 -b 524288 --no-cleanup write 

test result:
Total time run: 362.962324
Total writes made: 48189
Write size: 524288 
Bandwidth (MB/sec): 66.383 
Stddev Bandwidth: 40.7776 
Max bandwidth (MB/sec): 173 
Min bandwidth (MB/sec): 0 
Average Latency: 1.50435 
Stddev Latency: 0.910731 
Max latency: 9.12276 
Min latency: 0.19867 

My environment is 4 hosts and 40 disks (OSDs). But the test result is very bad: the 
average latency is 1.5 seconds, and I find that writing the object metadata is very 
slow. Because the test puts so many objects into one bucket, and we know writing the 
object metadata calls the method "bucket_prepare_op", the test shows this op is very 
slow. I found the OSD which holds the bucket object and looked at 
"bucket_prepare_op" via dump_historic_ops:
{ description: osd_op(client.4742.0:87613 .dir.default.4243.3 [call 
rgw.bucket_prepare_op] 3.3670fe74 e317),
  received_at: 2014-06-30 13:35:55.409597,
  age: 51.148026,
 duration: 4.130137,
  type_data: [
commit sent; apply or cleanup,
{ client: client.4742,
  tid: 87613},
[
{ time: 2014-06-30 13:35:55.409660,
  event: waiting_for_osdmap},
{ time: 2014-06-30 13:35:55.409669,
  event: queue op_wq},
{ time: 2014-06-30 13:35:55.896766,
  event: reached_pg},
{ time: 2014-06-30 13:35:55.896793,
  event: started},
{ time: 2014-06-30 13:35:55.896796,
  event: started},
{ time: 2014-06-30 13:35:55.899450,
  event: waiting for subops from [40,43]},
{ time: 2014-06-30 13:35:55.899757,
  event: commit_queued_for_journal_write},
{ time: 2014-06-30 13:35:55.899799,
  event: write_thread_in_journal_buffer},
{ time: 2014-06-30 13:35:55.899910,
  event: journaled_completion_queued},
{ time: 2014-06-30 13:35:55.899936,
  event: journal first callback},
{ time: 2014-06-30 13:35:55.899944,
  event: queuing ondisk},
{ time: 2014-06-30 13:35:56.142104,
  event: sub_op_commit_rec},
{ time: 2014-06-30 13:35:56.176950,
  event: sub_op_commit_rec},
{ time: 2014-06-30 13:35:59.535301,
  event: op_commit},
{ time: 2014-06-30 13:35:59.535331,
  event: commit_sent},
{ time: 2014-06-30 13:35:59.539723,
  event: op_applied},
{ time: 2014-06-30 13:35:59.539734,
  event: done}]]},

So why is the step from journaled_completion_queued to op_commit so slow, and 
what happened?
thanks 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[ceph-users] Re: Re: Ask a performance question for the RGW

2014-06-30 Thread baijia...@126.com
Hello,
Thanks for answering the question.
But even when there are fewer than 50 thousand objects, the latency is very big. I 
looked at the write ops for the bucket index object: going from 
journaled_completion_queue to op_commit costs 3.6 seconds, which means that going 
from "journal write finished" to op_commit costs 3.6 seconds.
So I can't understand this. What happened?

thanks



baijia...@126.com

From: Guang Yang
Date: 2014-06-30 14:57
To: baijiaruo
CC: ceph-users
Subject: Re: [ceph-users] Ask a performance question for the RGW
Hello,
There is a known limitation of bucket scalability, and there is a blueprint 
tracking it - 
https://wiki.ceph.com/Planning/Blueprints/Submissions/rgw%3A_bucket_index_scalability.

For the time being, I would recommend doing sharding at the application level (create 
multiple buckets) to work around this limitation.

Thanks,
Guang

On Jun 30, 2014, at 2:54 PM, baijia...@126.com wrote:

  
 hello, everyone!
  
 when I user rest bench test RGW performance and the cmd is:
 ./rest-bench --access-key=ak --secret=sk --bucket=bucket_name --seconds=600 
 -t 200 -b 524288 -no-cleanup write
  
 test result:
 Total time run: 362.962324
 Total writes made: 48189
 Write size: 524288
 Bandwidth (MB/sec): 66.383
 Stddev Bandwidth: 40.7776
 Max bandwidth (MB/sec): 173
 Min bandwidth (MB/sec): 0
 Average Latency: 1.50435
 Stddev Latency: 0.910731
 Max latency: 9.12276
 Min latency: 0.19867
  
 my environment is 4 host and 40 disk(osd)。 but test result is very bad, 
 average latency is 1.5 seconds 。and I find write obj metadate is very 
 slowly。because it puts so many object to one bucket, we know writing object 
 metadate can call method “bucket_prepare_op”,and test find this op is very 
 slowly。 I find the osd which contain bucket-obj。and see the 
 “bucket_prepare_op”by dump_historic_ops :
 { description: osd_op(client.4742.0:87613 .dir.default.4243.3 [call 
 rgw.bucket_prepare_op] 3.3670fe74 e317),
   received_at: 2014-06-30 13:35:55.409597,
   age: 51.148026,
   duration: 4.130137,
   type_data: [
 commit sent; apply or cleanup,
 { client: client.4742,
   tid: 87613},
 [
 { time: 2014-06-30 13:35:55.409660,
   event: waiting_for_osdmap},
 { time: 2014-06-30 13:35:55.409669,
   event: queue op_wq},
 { time: 2014-06-30 13:35:55.896766,
   event: reached_pg},
 { time: 2014-06-30 13:35:55.896793,
   event: started},
 { time: 2014-06-30 13:35:55.896796,
   event: started},
 { time: 2014-06-30 13:35:55.899450,
   event: waiting for subops from [40,43]},
 { time: 2014-06-30 13:35:55.899757,
   event: commit_queued_for_journal_write},
 { time: 2014-06-30 13:35:55.899799,
   event: write_thread_in_journal_buffer},
 { time: 2014-06-30 13:35:55.899910,
   event: journaled_completion_queued},
 { time: 2014-06-30 13:35:55.899936,
   event: journal first callback},
 { time: 2014-06-30 13:35:55.899944,
   event: queuing ondisk},
 { time: 2014-06-30 13:35:56.142104,
   event: sub_op_commit_rec},
 { time: 2014-06-30 13:35:56.176950,
   event: sub_op_commit_rec},
 { time: 2014-06-30 13:35:59.535301,
   event: op_commit},
 { time: 2014-06-30 13:35:59.535331,
   event: commit_sent},
 { time: 2014-06-30 13:35:59.539723,
   event: op_applied},
 { time: 2014-06-30 13:35:59.539734,
   event: done}]]},
  
 so why from journaled_completion_queued to op_commit is very slowly, and 
 what happened?
 thanks
  
 baijia...@126.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com