My $0.02: there are two kinds of balance, one for space utilization and another
for performance.
Right now it seems you will be fine on space utilization, but you might suffer a
bit on performance as disk density increases. The new rack will hold
1/3 of the data on 1/5 of the disks, if we assume the
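A quick sketch of the per-disk load that "1/3 of the data on 1/5 of the disks" implies, assuming data is spread uniformly within each rack (the truncated assumption above):

```python
# Rough sketch: if a new rack holds 1/3 of the cluster's data on 1/5 of the
# cluster's disks, each of its disks carries more data than an average disk.
data_share = 1 / 3   # fraction of cluster data on the new rack
disk_share = 1 / 5   # fraction of cluster disks in the new rack

relative_load = data_share / disk_share  # per-disk load vs. cluster average
print(f"each new-rack disk holds {relative_load:.2f}x the average disk's data")
```

So each disk in the new rack ends up roughly 1.67x hotter than average, which is where the performance concern comes from.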
Pre-allocate the volume by running "dd" across the entire RBD before you do any
performance test :).
In this case, you may want to re-create the RBD, pre-allocate, and try again.
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf
Hi Mark,
The async messenger result at 128K drops quickly after some point; is that
because of the testing methodology?
The other conclusion, as it looks to me, is that simple messenger + jemalloc is
the best practice so far, since it has the same performance as async but uses
much less memory?
Hi Francois,
Actually you are discussing two separate questions here :)
1. With 5 mons (2 in dc1, 2 in dc2, 1 in the WAN), can the monitors form a
quorum? How do we offload the mon in the WAN?
Yes and no. In one case, you lose either of your DCs completely; that's
fine, the remaining 3 monitors could
Hi,
1. In short, an OSD needs to heartbeat with up to #PG x (#Replica - 1) peers,
but in practice it will be far fewer, since most of the peers are redundant.
For example, an OSD (say OSD 1) is holding 100 PGs; for some of those PGs,
say PG 1, OSD 1 is the primary OSD of PG 1, so OSD 1 needs to
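A toy sketch of that deduplication effect, using a made-up random PG-to-OSD mapping (the numbers are illustrative, not Ceph's actual CRUSH placement):

```python
import random

# Naive upper bound on heartbeat peers is #PG x (#replica - 1), but the same
# OSDs show up as peers in many PGs, so the deduplicated set is much smaller.
random.seed(1)
NUM_OSDS, NUM_PGS, REPLICA = 50, 100, 3

# Fake PG map: each PG maps to REPLICA distinct OSDs; OSD 1 is in all of them.
pg_map = [[1] + random.sample([o for o in range(NUM_OSDS) if o != 1], REPLICA - 1)
          for _ in range(NUM_PGS)]

naive = NUM_PGS * (REPLICA - 1)                      # upper bound: 200
peers = {osd for pg in pg_map for osd in pg} - {1}   # deduplicated peer set
print(naive, len(peers))  # deduplicated count can never exceed NUM_OSDS - 1
```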
to be
trimmed. I'm not a big fan of a --skip-trimming option as there is
the potential to leave some orphan objects that may not be cleaned up
correctly.
On Tue, Jan 6, 2015 at 8:09 AM, Jake Young jak3...@gmail.com wrote:
On Monday, January 5, 2015, Chen, Xiaoxi xiaoxi.c...@intel.com wrote
do you think?
From: Jake Young [mailto:jak3...@gmail.com]
Sent: Monday, January 5, 2015 9:45 PM
To: Chen, Xiaoxi
Cc: Edwin Peer; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] rbd resize (shrink) taking forever and a day
On Sunday, January 4, 2015, Chen, Xiaoxi
xiaoxi.c
Some low-level caching might help: flashcache, dm-cache, etc.
But that may hurt reliability to some extent, and it makes things harder for the
operator ☺
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Lindsay Mathieson
Sent: Monday, January 5, 2015 12:14 PM
To: Christian
Did you shut down the node with 2 mons?
I think it is impossible to have redundancy with only 2 nodes; the Paxos
quorum is the reason:
Say you have N (N = 2K + 1) monitors; you always have one node (let's name it
node A) with the majority of the MONs (>= K + 1) and another node (node B) with a minority
Hi,
First of all, the data is safe since it's persistent in the journal; if an
error occurs on the OSD data partition, replaying the journal will get the data
back.
Also, there is a wbthrottle; you can configure how much data (I/Os, bytes,
inodes) you want to remain in memory. A background thread
Hi Yang bin,
Not sure if you followed the right docs. I suspect you didn't, because you
should have used ceph-disk and specified an FS type in the command.
I think you might have been misled by the quick
start (http://ceph.com/docs/master/start/quick-ceph-deploy/#create-a-cluster),
which uses a directory
We have tested it for a while; basically it seems fairly stable, but it shows
terribly bad performance.
This is not the fault of Ceph but of LevelDB, or more generally of any K-V
store with an LSM design (RocksDB, etc.): the LSM tree structure naturally
introduces very large write amplification (10X). We
had better optimize the key-value backend code
to support a specific kind of load.
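A rough back-of-envelope model of where that amplification comes from in a leveled LSM tree (the fanout and level count here are illustrative assumptions, not LevelDB's actual figures):

```python
# Hypothetical model: each byte is written once to the WAL, once to L0, and
# then rewritten roughly `fanout` times per deeper level during compaction.
def lsm_write_amplification(levels, fanout=10):
    return 1 + 1 + (levels - 1) * fanout  # WAL + L0 flush + compactions

print(lsm_write_amplification(4))  # 4 levels at fanout 10 -> ~32x
```

Even this crude model shows why a K-V backend under a heavy random-write load multiplies the bytes actually hitting the disk.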
From: Haomai Wang [mailto:haomaiw...@gmail.com]
Sent: Monday, December 1, 2014 10:14 PM
To: Chen, Xiaoxi
Cc: Satoru Funai; ceph-us...@ceph.com
Subject: Re: [ceph-users] LevelDB support status is still
...@gmail.com]
Sent: Tuesday, December 2, 2014 1:27 PM
To: Chen, Xiaoxi
Cc: ceph-us...@ceph.com; Haomai Wang
Subject: Re: [ceph-users] LevelDB support status is still experimental on Giant?
Hi Xiaoxi,
Thanks for very useful information.
Can you share more details about the "terrible bad performance"? Is it compared
Hi Simon
Does your workload have lots of RAW (read-after-write)? Ceph has a RW lock on
each object, so if you issue a write to the RBD and a following read happens to
hit the same object, the latency will be higher.
Another possibility is the OSD op_wq: it's a priority queue, but read and
write have the same
Hi Chris,
I am not an expert on LIO, but from your results it seems that RBD/Ceph works
well (RBD on the local system, no iSCSI), LIO works well (ramdisk (no RBD) - LIO
target), and if you change LIO to use another interface (file, loopback) to
work with RBD, it also works well.
So
Hi Mark
It's client IOPS, and we use replica = 2; the journal and OSDs are hosted on
the same SSDs, so the real IOPS is 23K x 2 x 2 ≈ 92K, still far from the HW
limit (30K+ for a single DCS3700).
CPU % is ~62% at peak (2 VMs), with interrupts distributed.
One additional piece of information: the cluster seems to be in a
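The multiplication behind that backend-IOPS estimate can be sketched as:

```python
# Back-of-envelope: every client write is multiplied by the replica count,
# and again by 2 when the journal and data share the same SSDs.
client_iops = 23_000
replica = 2
journal_factor = 2   # journal write + data write on the same device

backend_iops = client_iops * replica * journal_factor
print(f"{backend_iops:,} backend IOPS")  # 92,000
```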
Could you show your cache tiering configuration? Especially these three
parameters:
ceph osd pool set hot-storage cache_target_dirty_ratio 0.4
ceph osd pool set hot-storage cache_target_full_ratio 0.8
ceph osd pool set {cachepool} target_max_bytes {#bytes}
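A sketch of how these three knobs interact, using a hypothetical target_max_bytes of 100 GiB (flushing of dirty objects starts at the dirty ratio, eviction at the full ratio):

```python
# Hypothetical cache-pool sizing: the ratios are fractions of target_max_bytes.
target_max_bytes = 100 * 2**30    # assumed 100 GiB cap on the cache pool
dirty_ratio, full_ratio = 0.4, 0.8

flush_threshold = int(target_max_bytes * dirty_ratio)  # start flushing here
evict_threshold = int(target_max_bytes * full_ratio)   # start evicting here
print(flush_threshold // 2**30, evict_threshold // 2**30)  # 40 80 (GiB)
```

If target_max_bytes is unset, the ratios have nothing to multiply against, which is a common reason tiering misbehaves.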
From: ceph-users
Yes, but usually a system has several layers of error-detecting/recovering
machinery at different granularities.
Disk CRC works at the sector level, Ceph CRC mostly works at the object level,
and we also have replication/erasure coding at the system level.
The CRC in Ceph mainly handles this case: imagine you have
The randomness may come from Ceph's chunking. For RBD, Ceph chunks the image
into 4M (default) objects; for rados bench, the objects are already 4M if you
didn't set the parameters. So from XFS's view, there are lots of 4M files, and
by default, with ag != 1 (allocation groups, specified during mkfs; the default seems to
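A small sketch of that chunking arithmetic, assuming the default 4 MiB object size:

```python
# RBD stripes an image into fixed-size (default 4 MiB) RADOS objects, so a
# "sequential" offset into the image maps to one of many small backend files.
OBJECT_SIZE = 4 * 2**20   # 4 MiB default

def rbd_object_for(offset):
    # Returns (object index, offset within that object) for an image offset.
    return offset // OBJECT_SIZE, offset % OBJECT_SIZE

print(rbd_object_for(0))          # (0, 0): start of the first object
print(rbd_object_for(9 * 2**20))  # (2, 1048576): 1 MiB into the third object
```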
Sent from my iPhone
On 2013-7-23, 0:21, Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote:
2013/7/22 Chen, Xiaoxi xiaoxi.c...@intel.com:
Imagine you have several writes that have been flushed to the journal and
acked, but not yet written to disk. Now the system crashes from a kernel panic
or power failure; you
Basically I think endurance is the most important thing for a Ceph journal,
since the journal workload is pure writes; you can easily calculate how long
your SSD will take to burn out. Even if we assume your SSD only runs at 100 MB/s
on average, you will burn through ~8 TB/day and ~240 TB/month.
The DCS 3500 is definitely not
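The burn-rate arithmetic can be sketched as follows (the 450 TBW rating is a hypothetical example, not a DCS 3500 spec):

```python
# Back-of-envelope endurance check for a journal SSD written at 100 MB/s flat.
write_rate_mb_s = 100
per_day_tb = write_rate_mb_s * 86_400 / 1_000_000  # seconds/day -> ~8.6 TB/day
per_month_tb = per_day_tb * 30                     # ~260 TB/month

rated_tbw = 450   # hypothetical total-bytes-written rating
days_to_burn = rated_tbw / per_day_tb
print(f"{per_day_tb:.1f} TB/day, {per_month_tb:.0f} TB/month, "
      f"~{days_to_burn:.0f} days to exhaust {rated_tbw} TBW")
```

That is why a low-endurance drive makes a poor journal device even when its throughput looks adequate.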
PM
To: Chen, Xiaoxi
Cc: ceph-de...@vger.kernel.org; ceph-us...@ceph.com
Subject: Re: Any concern about Ceph on CentOS
Hi Xiaoxi,
we are really running Ceph on CentOS-6.4
(6 server nodes, 3 client nodes, 160 OSDs).
We put a 3.8.13 Kernel on top and installed the ceph-0.61.4 cluster with
mkcephfs
of the issue is that for the actual cluster itself, it
should be OK.
I could be wrong here, but I thought the kernel module was only specifically
for mounting CephFS (and even then, there's a FUSE module that you *can* use
anyway).
On 07/17/2013 11:18 AM, Chen, Xiaoxi wrote:
Hi list,
I would
threads. This is still too high for an 8-core or 16-core CPU and will waste a
lot of cycles on context switching.
Sent from my iPhone
On 2013-6-7, 0:21, Gregory Farnum g...@inktank.com wrote:
On Thu, Jun 6, 2013 at 12:25 AM, Chen, Xiaoxi xiaoxi.c...@intel.com wrote:
Hi,
From the code, each pipe
-Original Message-
From: Gregory Farnum [mailto:g...@inktank.com]
Sent: June 4, 2013 0:37
To: Chen, Xiaoxi
Cc: ceph-de...@vger.kernel.org; Mark Nelson (mark.nel...@inktank.com);
ceph-us...@ceph.com
Subject: Re: [ceph-users] Ceph killed by OS because of OOM under high load
On Mon, Jun 3, 2013 at 8:47 AM
My $0.02: you really don't need to wait for HEALTH_OK between your recovery
steps; just go ahead. Every time a new map is generated and broadcast, the old
map and the in-progress recovery will be canceled.
Sent from my iPhone
On 2013-6-2, 11:30, Nigel Williams nigel.d.willi...@gmail.com wrote:
Could I have a
Couldn't agree more. When I try to promote Ceph internally, people always
complain about the stability of Ceph; especially when they evaluate Ceph under
high enough pressure, it cannot stay healthy during the test.
Sent from my iPhone
On 2013-5-29, 19:13, Wolfgang Hennerbichler
Hi,
Can I assume I am safe without this patch if I don't use any RBD cache?
Sent from my iPhone
On 2013-5-29, 16:00, Alex Bligh a...@alex.org.uk wrote:
On 28 May 2013, at 06:50, Wolfgang Hennerbichler wrote:
for anybody who's interested, I've packaged the latest qemu-1.4.2 (not 1.5,
it didn't
4103'5330 (3853'4329,4103'5330] local-les=4092 n=154 ec=100 les/c 4092/4093
4091/4091/4034) [319,46] r=0 lpr=4091 mlcod 4103'5329
active+clean] do_op mode now rmw(wr=0)
-Original Message-
From: Sage Weil [mailto:s...@inktank.com]
Sent: May 15, 2013 11:40
To: Chen, Xiaoxi
Cc: Mark Nelson
Thanks, but I don't quite understand how to determine whether the monitor is
overloaded. And if it is, will starting several monitors help?
Sent from my iPhone
On 2013-5-15, 23:07, Jim Schutt jasc...@sandia.gov wrote:
On 05/14/2013 09:23 PM, Chen, Xiaoxi wrote:
How responsive generally is the machine under load
Hi
We are suffering from our OSDs flipping between up and down (OSD X is voted
down due to 3 missed pings, and after a while it tells the monitor "map xxx
wrongly marked me down"). This is because we are running a sequential write
performance test on top of the RBDs, and the cluster network NICs are really in
% I/O wait). Enabling jumbo frames **seems** to
make things worse (just a feeling, no data supports it).
Sent from my iPhone
On 2013-5-14, 23:36, Mark Nelson mark.nel...@inktank.com wrote:
On 05/14/2013 10:30 AM, Sage Weil wrote:
On Tue, 14 May 2013, Chen, Xiaoxi wrote:
Hi
We are suffering our OSD flipping
related to the CPU scheduler? The
heartbeat thread (in a busy OSD) fails to get enough CPU cycles.
-Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil
Sent: May 15, 2013 7:23
To: Chen, Xiaoxi
Cc: Mark Nelson; ceph-de
Are you using a partition as the journal?
From: ceph-users-boun...@lists.ceph.com
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Aleksey Samarin
Sent: March 26, 2013 20:45
To: ceph-us...@ceph.com
Subject: [ceph-users] Journal size
Hello everyone!
I have question about journal. Ceph cluster is
Hi Mark,
I think you are the right man for these questions :) I really don't
understand how osd_client_message_size_cap, objecter_inflight_op_bytes/ops, and
ms_dispatch_throttle_bytes work, or how they affect performance.
In particular, objecter_inflight_op_bytes seems to be used
Rephrasing it to make it clearer:
From: ceph-users-boun...@lists.ceph.com
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Chen, Xiaoxi
Sent: March 25, 2013 17:02
To: 'ceph-users@lists.ceph.com' (ceph-users@lists.ceph.com)
Cc: ceph-de...@vger.kernel.org
Subject: [ceph-users] Ceph Crach
Hi List,
I cannot start my monitor when I update my cluster to v0.59. Please note
that I am not trying to upgrade, but rather reinstalling the Ceph software
stack and rerunning mkcephfs. I have seen that the monitor changed a lot after
0.58; does mkcephfs still have bugs?
Below is the log:
I think Josh may be the right man for this question ☺
To be more precise, I would like to add a few words about the status:
1. We have configured "show_image_direct_url = True" in Glance, and from the
Cinder-volume log, we can confirm that we have got a direct_url, for example:
image_id
Thanks Josh, the problem is solved by updating Ceph on the Glance node.
Sent from my iPhone
On 2013-3-20, 14:59, Josh Durgin josh.dur...@inktank.com wrote:
On 03/19/2013 11:03 PM, Chen, Xiaoxi wrote:
I think Josh may be the right man for this question ☺
To be more precise, I would like to add more words
For me: we have seen a Supermicro machine which is 2U with 2 CPUs and 24
2.5-inch SATA/SAS drives, together with 2 onboard 10Gb NICs. I think it's good
enough for both density and computing power.
At the other end, we are also planning to evaluate small nodes for Ceph, say an
Atom with 2/4 disks per