Hi guys,
I am analyzing the architecture of the Ceph source code.
I know that, in order to keep the journal atomic and consistent, the
journal should be opened with O_DSYNC or fdatasync() should be called
after every write operation. However, this kind of
operation is really killing the performance.
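To make the two options concrete, here is a minimal sketch in plain C of
the two durability patterns under discussion (not the actual FileJournal
code; error handling is simplified):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Pattern 1: open the journal with O_DSYNC, so every write() returns
 * only after the data has reached stable storage. */
int journal_open_dsync(const char *path)
{
    return open(path, O_WRONLY | O_DSYNC);
}

/* Pattern 2: plain write() followed by fdatasync(), which flushes the
 * data (but not unrelated metadata) before returning. */
int journal_append(int fd, const void *buf, size_t len)
{
    if (write(fd, buf, len) != (ssize_t)len)
        return -1;
    return fdatasync(fd);
}

Either way, each journal append pays for a flush to media, which is the
cost being worried about here.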
ceph_sync_read and generic_file_read_iter() have already advanced the
IO iterator.
Signed-off-by: Yan, Zheng z...@redhat.com
---
fs/ceph/file.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 1c1df08..d7e0da8 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
Hi Nicheal,
Not only recovery; IMHO the main purpose of the ceph journal is to support
transaction semantics, since XFS doesn't have that. I guess it can't be
achieved with pg_log/pg_info.
Thanks Regards
Somnath
-Original Message-
From: ceph-devel-ow...@vger.kernel.org
Hi Nicheal,
1. The main purpose of the journal is to provide transaction semantics
(preventing partial updates). Peering is not enough for this, because ceph
writes all replicas at the same time, so after a crash you have no idea which
replica has the right data. For example, say we have 2 replicas and a crash
happens mid-write: the two copies may differ, and neither is known to be
complete.
Both ceph_update_writeable_page and ceph_setattr verify the file size
against the maximum size ceph supports.
There are two callers of ceph_update_writeable_page: ceph_write_begin and
ceph_page_mkwrite. For ceph_write_begin, we have already verified the size in
generic_write_checks of ceph_write_iter; for
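As a rough sketch of the kind of check being discussed (the constant and
function name below are stand-ins, not the real fs/ceph code, which takes
the limit from the client's parameters):

#include <errno.h>

#define MAX_FILE_SIZE_STAND_IN (1ULL << 40) /* stand-in limit */

/* Reject a write that would extend the file past the supported maximum,
 * mirroring what generic_write_checks does for the write path. */
static int check_write_bounds(unsigned long long pos, unsigned long long len)
{
    if (pos + len > MAX_FILE_SIZE_STAND_IN)
        return -EFBIG;
    return 0;
}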
2. Have you got any data to prove that O_DSYNC or fdatasync kills the
performance of the journal? In our previous test, the journal SSD (using a
partition of an SSD as the journal for a particular OSD, with 4 OSDs
sharing the same SSD) could reach its peak performance (300-400MB/s)
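One way to gather such data is a small micro-benchmark that mimics the
journal's write+flush pattern (a sketch; the path and IO size below are
placeholders):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    const int n = 1000;
    char buf[4096];
    memset(buf, 0xab, sizeof(buf));

    /* placeholder path; point this at the journal device/partition */
    int fd = open("/tmp/journal-test", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++) {
        /* each iteration is one journal-style synchronous append */
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) { perror("write"); return 1; }
        if (fdatasync(fd) != 0) { perror("fdatasync"); return 1; }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d sync 4KB writes in %.3fs -> %.0f IOPS, %.2f MB/s\n",
           n, s, n / s, n * 4096.0 / s / 1e6);
    close(fd);
    return 0;
}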
Hi,
I have done some
Hi,
I'm trying to understand the internals of RadosGW: how
buckets/containers and objects are mapped back to rados objects. I couldn't
find any docs; however, a previous mailing list discussion[1] explained
how S3/Swift objects are cut into rados objects and about manifests. I was
able to
Adding ceph-devel
On 9/17/14, 1:27 AM, Loic Dachary l...@dachary.org wrote:
Could you resend with ceph-devel in cc? It's better for archive purposes
;-)
On 17/09/2014 09:37, Johnu George (johnugeo) wrote:
Hi Sage,
I was looking at the crash that was reported in this mail
chain.
I
On 09/17/2014 09:20 AM, Alexandre DERUMIER wrote:
2. Have you got any data to prove that O_DSYNC or fdatasync kills the performance of
the journal? In our previous test, the journal SSD (using a partition of an SSD as the
journal for a particular OSD, with 4 OSDs sharing the same SSD) could reach its
peak performance (300-400MB/s)
On Wed, Sep 17, 2014 at 7:39 AM, Abhishek L
abhishek.lekshma...@gmail.com wrote:
Hi,
I'm trying to understand the internals of RadosGW: how
buckets/containers and objects are mapped back to rados objects. I couldn't
find any docs; however, a previous mailing list discussion[1] explained
how S3/Swift objects are cut into rados objects and about manifests.
Hi,
If the desired number of replicas is 1, then
https://github.com/ceph/ceph/blob/firefly/src/crush/CrushWrapper.h#L915
will be called with maxout = 1 and scratch will be maxout * 3. But if the rule
always selects 4 items, then it overflows. Is that what you also read?
Cheers
On 17/09/2014
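To make the suspected overflow concrete, a stripped-down sketch in C
(simplified names, not the actual CrushWrapper/crush_do_rule code):

#include <stdio.h>

/* Stand-in for crush_do_rule(): writes as many items as the rule emits
 * into the caller-provided scratch buffer. */
static int rule_emit(int rule_emits, int *scratch)
{
    for (int i = 0; i < rule_emits; i++)
        scratch[i] = i;  /* overruns scratch if rule_emits > its size */
    return rule_emits;
}

void do_rule_sketch(void)
{
    int maxout = 1;            /* pool wants 1 replica */
    int scratch[maxout * 3];   /* sized from maxout, as in CrushWrapper.h */
    /* If the rule always selects 4 items, 4 ints are written into a
     * 3-int buffer: a stack overflow. */
    int n = rule_emit(4, scratch);
    printf("rule emitted %d items into a %d-int buffer\n", n, maxout * 3);
}

Sizing scratch from the rule's own maximum result size (its max_size, as
suggested later in this thread) rather than from maxout would avoid the
overrun.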
Hey everyone! We just posted the agenda for next week’s Ceph Day in San Jose:
http://ceph.com/cephdays/san-jose/
This Ceph Day will be held in a beautiful facility provided by our friends at
Brocade. We have a lot of great speakers from Brocade, Red Hat, Dell, Fujitsu,
HGST, and Supermicro,
Yehuda Sadeh writes:
On Wed, Sep 17, 2014 at 7:39 AM, Abhishek L
abhishek.lekshma...@gmail.com wrote:
Hi,
I'm trying to understand the internals of RadosGW: how
buckets/containers and objects are mapped back to rados objects. I couldn't
find any docs; however, a previous mailing list
Loic,
You are right. Are we planning to support configurations where
the replica number is different from the number of osds selected by a rule?
If not, one solution is to add a validation check when a rule is activated
for a pool with a specific replica count.
Johnu
On 9/17/14, 9:10 AM, Loic
On 17/09/2014 22:03, Johnu George (johnugeo) wrote:
Loic,
You are right. Are we planning to support configurations where
the replica number is different from the number of osds selected by a rule?
I think crush should support it, yes. If a rule can provide 10 OSDs, there is no
reason
On 09/17/2014 03:55 PM, Somnath Roy wrote:
Hi Sage,
We are experiencing severe librbd performance degradation in Giant over the
firefly release. Here is the experiment we did to isolate it as a librbd problem.
1. Single OSD is running latest Giant and client is running fio rbd on top of
firefly
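For reference, a fio job along these lines would exercise that 4K random
read path (a sketch only: the pool/image names and exact option set are
assumptions, not the job file actually used):

[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=testimage
invalidate=0
rw=randread
bs=4k
direct=1

[rbd-4k-randread]
iodepth=32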
Mark,
All are running with concurrency 32.
Thanks Regards
Somnath
-Original Message-
From: Mark Nelson [mailto:mark.nel...@inktank.com]
Sent: Wednesday, September 17, 2014 1:59 PM
To: Somnath Roy; ceph-devel@vger.kernel.org
Subject: Re: severe librbd performance degradation in Giant
But, this time it is ~10X degradation :-(
--
From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag]
FWIW, the journal will coalesce writes quickly when there are many
concurrent 4k client writes. Once you hit around 8 4k IOs per OSD, the
journal will start coalescing. For, say, 100-150 IOPS (what a spinning
disk can handle), expect around 9ish 100KB journal writes (with padding
and
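To spell out the arithmetic behind that estimate (the per-commit batch
size below is an assumption, not a number stated above):

  150 IOPS / ~16 IOs coalesced per journal commit ~= 9 journal writes/s
  16 x 4 KB = 64 KB of data, plus per-entry headers and padding ~= 100 KB
  per write

which lines up with the "9ish 100KB journal writes" figure.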
On 09/17/2014 01:55 PM, Somnath Roy wrote:
Hi Sage,
We are experiencing severe librbd performance degradation in Giant over the
firefly release. Here is the experiment we did to isolate it as a librbd problem.
1. Single OSD is running latest Giant and client is running fio rbd on top of
firefly
I set the following in the client side /etc/ceph/ceph.conf where I am running
fio rbd.
rbd_cache_writethrough_until_flush = false
But, no difference. BTW, I am doing random read, not write. Does this setting
still apply?
Next, I tried to tweak the rbd_cache setting to false and I *got back* the
performance.
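For anyone reproducing this, the settings being toggled live in the
[client] section of ceph.conf on the node running fio; the values shown
are the combination under test here:

[client]
rbd cache = false
rbd cache writethrough until flush = false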
Any chance read ahead could be causing issues?
On 09/17/2014 04:29 PM, Somnath Roy wrote:
I set the following in the client side /etc/ceph/ceph.conf where I am running
fio rbd.
rbd_cache_writethrough_until_flush = false
But, no difference. BTW, I am doing random read, not write. Still this
What was the io pattern? Sequential or random? For random a slowdown
makes sense (tho maybe not 10x!) but not for sequential
s
On Wed, 17 Sep 2014, Somnath Roy wrote:
I set the following in the client side /etc/ceph/ceph.conf where I am running
fio rbd.
It's the default read ahead setting. I am doing random read, so I don't think
read ahead is the issue.
Also, on the cluster side, ceph -s is reporting the same iops, so ios are hitting
the cluster.
-Original Message-
From: Mark Nelson [mailto:mark.nel...@inktank.com]
Sent: Wednesday,
No, it's not merged yet. The ObjectCacher (which implements rbd and
ceph-fuse caching) has a global lock, which could be a bottleneck in
this case.
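To illustrate why one lock can dominate a 4K random-read benchmark, a
deliberately simplified sketch in C (the real ObjectCacher is C++ and far
more involved):

#include <pthread.h>
#include <string.h>

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the real cache lookup; returns 1 on hit. */
static int lookup_and_copy(int object_id, char *buf, int len)
{
    (void)object_id; memset(buf, 0, len); return 1;
}

/* One global lock guards the whole cache: concurrent 4K reads from many
 * client threads serialize on it even when they touch different objects,
 * which is the suspected bottleneck being described. */
int cache_read(int object_id, char *buf, int len)
{
    int hit;
    pthread_mutex_lock(&cache_lock);
    hit = lookup_and_copy(object_id, buf, len);
    pthread_mutex_unlock(&cache_lock);
    return hit;
}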
On 09/17/2014 02:34 PM, Mark Nelson wrote:
Any chance read ahead could be causing issues?
On 09/17/2014 04:29 PM, Somnath Roy wrote:
I set the
Sage,
It's a 4K random read.
Thanks Regards
Somnath
-Original Message-
From: Sage Weil [mailto:sw...@redhat.com]
Sent: Wednesday, September 17, 2014 2:36 PM
To: Somnath Roy
Cc: Josh Durgin; ceph-devel@vger.kernel.org
Subject: RE: severe librbd performance degradation in Giant
What was the io pattern? Sequential or random?
Created a tracker for this.
http://tracker.ceph.com/issues/9513
Thanks Regards
Somnath
-Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy
Sent: Wednesday, September 17, 2014 2:39 PM
To: Sage Weil
Cc: Josh
In such a case, we can initialize the scratch array in
crush/CrushWrapper.h#L919 with the maximum number of osds that can be
selected. Since we know the rule number, it should be possible to calculate
that maximum.
Johnu
On 9/17/14, 1:11 PM, Loic Dachary l...@dachary.org wrote:
Josh/Sage,
I should mention that even after turning off rbd cache I am getting ~20%
degradation over Firefly.
Thanks Regards
Somnath
-Original Message-
From: Somnath Roy
Sent: Wednesday, September 17, 2014 2:44 PM
To: Sage Weil
Cc: Josh Durgin; ceph-devel@vger.kernel.org
Subject: RE:
When benching the Crucial M550, I only see, from time to time (maybe every 30s, I
don't remember exactly), ios slowing down to 200 for 1 or 2 seconds, then going back
up to normal speed around 4000 iops
Wow, that indicates the M550 is busy with garbage collection; maybe just try to
overprovision a bit (say if
On Wed, Sep 17, 2014 at 5:26 PM, Chao Yu chao2...@samsung.com wrote:
Both ceph_update_writeable_page and ceph_setattr verify the file size
against the maximum size ceph supports.
There are two callers of ceph_update_writeable_page: ceph_write_begin and
ceph_page_mkwrite. For ceph_write_begin, we have already verified the size in
generic_write_checks of ceph_write_iter
On 09/17/2014 08:05 PM, Chen, Xiaoxi wrote:
When benching the Crucial M550, I only see, from time to time (maybe every 30s, I don't
remember exactly), ios slowing down to 200 for 1 or 2 seconds, then going back up to
normal speed around 4000 iops
Wow, that indicates the M550 is busy with garbage collection,
The rule has max_size; can we just use that value?
-Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Johnu George (johnugeo)
Sent: Thursday, September 18, 2014 6:41 AM
To: Loic Dachary; ceph-devel
Subject: Re: [ceph-users]
According to http://tracker.ceph.com/issues/9513, do you mean that rbd
cache causes a 10x performance degradation for random read?
On Thu, Sep 18, 2014 at 7:44 AM, Somnath Roy somnath@sandisk.com wrote:
Josh/Sage,
I should mention that even after turning off rbd cache I am getting ~20%
Yes Haomai...
-Original Message-
From: Haomai Wang [mailto:haomaiw...@gmail.com]
Sent: Wednesday, September 17, 2014 7:28 PM
To: Somnath Roy
Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org
Subject: Re: severe librbd performance degradation in Giant
According to http://tracker.ceph.com/issues/9513, do you mean that rbd cache
causes a 10x performance degradation for random read?
On Thu, 18 Sep 2014, Somnath Roy wrote:
Yes Haomai...
I would love to see what a profiler says about the matter. There is going
to be some overhead on the client associated with the cache for a
random io workload, but 10x is a problem!
sage
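One straightforward way to get such a profile on the client side is
standard Linux perf against the fio process (the pid is a placeholder):

# sample the fio process with call graphs for 30 seconds
perf record -g -p <fio-pid> -- sleep 30
perf report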
-Original Message-
From: Haomai Wang