Hi,
I'm just wondering if it's possible (or planned to be implemented) to
configure more than 2 levels of tiering?
I'm thinking SSD Pool -> Normal Replica Pool -> EC Pool
We are looking at building a cluster to hold backups of VM's. As the
retention copies are effectively static data
Excellent thank you for your response
Sage Weil sweil@... writes:
Eventually, yes, but right now only 2 levels are supported.
There is a blueprint, see
http://wiki.ceph.com/Planning/Blueprints/Emperor/osd%3A_tiering%3A_object_redirects
sage
runs, please let me know.
Nick
Nick Fisk
Technical Support Engineer
System Professional Ltd
tel: 01825 83
mob: 07711377522
fax: 01825 830001
mail: nick.f...@sys-pro.co.uk
web: www.sys-pro.co.uk
IT SUPPORT SERVICES | VIRTUALISATION | STORAGE | BACKUP AND DR
I've been looking at various categories of disks and how the
performance/reliability/cost varies.
There seem to be 5 main categories (WD disks given as examples):-
Budget (WD Green - 5400RPM no TLER)
Desktop Drives (WD Blue - 7200RPM no TLER)
NAS Drives (WD Red - 5400RPM TLER)
Enterprise Capacity
What's everyone's opinion on having redundant power supplies in your OSD
nodes?
One part of me says let Ceph do the redundancy and plan for the hardware to
fail; the other side says that they are probably worth having, as they lessen
the chance of losing a whole node.
Considering they can
Hi Simon,
Have you tried using the Deadline scheduler on the Linux nodes? The deadline
scheduler prioritises reads over writes. I believe it tries to service all
reads within 500ms whilst writes can be delayed up to 5s.
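For reference, a rough sketch of how those deadline tunables can be checked and set on a node (sda is just an example device; the sysfs paths are the standard locations for the deadline elevator):

```shell
# switch sda to the deadline elevator (takes effect immediately)
echo deadline > /sys/block/sda/queue/scheduler

# the defaults match the figures above: reads serviced within 500ms,
# writes allowed to wait up to 5000ms
cat /sys/block/sda/queue/iosched/read_expire    # 500
cat /sys/block/sda/queue/iosched/write_expire   # 5000
```

Lowering write_expire further would prioritise reads even more aggressively, at the cost of write latency.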
I don’t know the exact effect Ceph will have over the top of this, but
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Xu
(Simon) Chen
Sent: 31 October 2014 19:51
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] prioritizing reads over writes
I am already using deadline scheduler, with the default parameters:
read_expire=500
write_expire
October 2014 20:15
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] prioritizing reads over writes
We have SSD journals, backend disks are actually on SSD-fronted bcache devices
in writeback mode. The client VMs have rbd cache enabled too...
-Simon
On Fri, Oct 31
I have been thinking about the implications of losing the snapshot chain on
an RBD when doing export-diff / import-diff between two separate physical
locations. As I understand it, in this scenario when you take the first
snapshot again on the source, you would in effect end up copying the whole
RBD
Hi,
Does anyone know if there are any statistics available specific to the cache
tier functionality? I'm thinking along the lines of cache hit ratios. Or
should I be pulling out the Read statistics for backing+cache pools and
assuming that if a read happens from the backing pool it was a miss and
-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Jean-Charles Lopez
Sent: 09 November 2014 01:43
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Cache Tier Statistics
Hi Nick
If my brain doesn't fail me you can try
ceph daemon osd.{id} perf dump
ceph report
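As a sketch of how that might be used to pull out cache-tier activity (the tier_* counter names are from memory and may vary by release; osd.0 is just an example):

```shell
# dump all perf counters for one OSD, pretty-print, and pick out the
# cache-tier related ones (promotes, flushes, evictions, ...)
ceph daemon osd.0 perf dump | python -m json.tool | grep tier_
```

A rough hit ratio could then be derived by comparing promote/read counts between the cache and backing pools, as suggested above.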
Hi,
I'm just looking through the different methods of deploying Ceph and I
particularly liked the idea, which the stackforge puppet module advertises, of
using discovery to automatically add new disks. I understand the principle of
how it should work: using ceph-disk list to find unknown disks, but I
:05
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Stackforge Puppet Module
Hi Nick,
The great thing about puppet-ceph's implementation on Stackforge is that it
is both unit and integration tested.
You can see the integration tests here:
https://github.com/ceph/puppet-ceph
Sent: 12 November 2014 14:25
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Stackforge Puppet Module
What comes to mind is that you need to make sure that you've cloned the git
repository to /etc/puppet/modules/ceph and not
/etc/puppet/modules/puppet-ceph.
Feel free to hop
Hi Robert,
I've just been testing your ceph check and I have made a small modification to
allow it to adjust itself to suit the autoscaling of the units Ceph outputs.
Here is the relevant section I have modified:-
if line[1] == 'TB':
    used = saveint(line[0]) * 1099511627776
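The same idea generalises so a check copes with whichever unit Ceph decides to print. A standalone sketch (saveint is a check_mk helper, so plain shell arithmetic stands in for it here):

```shell
# convert "<value> <unit>" as printed by ceph into bytes
to_bytes() {
    local value=$1 unit=$2
    case $unit in
        KB) echo $(( value * 1024 ));;
        MB) echo $(( value * 1048576 ));;
        GB) echo $(( value * 1073741824 ));;
        TB) echo $(( value * 1099511627776 ));;
        *)  echo "$value";;
    esac
}

to_bytes 2 TB    # 2199023255552
```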
Has anyone tried applying this fix to see if it makes any difference?
https://github.com/ceph/ceph/pull/2374
I might be in a position in a few days to build a test cluster to test myself,
but was wondering if anyone else has had any luck with it?
Nick
-Original Message-
From:
Hi David,
Have you tried on a normal replicated pool with no cache? I've seen a number
of threads recently where caching is causing various things to block/hang.
It would be interesting to see if this still happens without the caching
layer, at least it would rule it out.
Also is there any sign
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
David Moreau Simard
Sent: 19 November 2014 10:48
To: Ramakrishna Nishtala (rnishtal)
Cc: ceph-users@lists.ceph.com; Nick Fisk
Subject: Re: [ceph-users] Poor RBD performance as LIO iSCSI target
Rama,
Thanks
emit
}
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
David Moreau Simard
Sent: 20 November 2014 20:03
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Poor RBD performance as LIO iSCSI target
Nick,
Can you share more
The two numbers (ints) are meant to be the ids of the pools you have created
for data and metadata.
Assuming you have already created the pools, run
ceph osd lspools
and use the numbers from there to create the FS
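Putting that together, the sequence looks roughly like this (pool ids 1 and 0 are just an example; newer releases take pool names via `ceph fs new` instead):

```shell
# list pools with their ids, e.g. "0 data,1 metadata,2 rbd"
ceph osd lspools

# create the filesystem from the metadata and data pool ids
ceph mds newfs 1 0 --yes-i-really-mean-it
```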
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
Hi Jay,
The way I would do it until Ceph supports HA iSCSI (see blueprint) would be to
configure a Ceph cluster as normal and then create RBD’s for your block storage.
I would then map these RBD’s on some “proxy” servers; these would be running in
an HA cluster with resource agents for
Hi Eneko,
There has been various discussions on the list previously as to the best SSD
for Journal use. All of them have pretty much come to the conclusion that the
Intel S3700 models are the best suited and in fact work out the cheapest in
terms of write durability.
Nick
-Original
Hi All,
Does anybody have any input on the best ratio and total number of data +
coding chunks you would choose?
For example I could create a pool with 7 data chunks and 3 coding chunks and
get an efficiency of 70%, or I could create a pool with 17 data chunks and 3
coding chunks and
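The efficiency side of the trade-off is just k/(k+m); a quick sketch:

```shell
# usable percentage of raw capacity for a k+m erasure-coded pool;
# either way, up to m chunks can be lost before data is unreadable
ec_efficiency() {
    local k=$1 m=$2
    echo $(( k * 100 / (k + m) ))
}

ec_efficiency 7 3    # 70
ec_efficiency 17 3   # 85
```

The catch with the wider profile is that every object then touches k+m OSD's, so recovery and small reads involve many more disks per IO.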
This is probably due to the Kernel RBD client not being recent enough. Have
you tried upgrading your kernel to a newer version? 3.16 should contain all
the relevant features required by Giant.
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
should
be released early next year.
Nick
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Antonio Messina
Sent: 05 December 2014 15:38
To: Nick Fisk
Cc: ceph-users@lists.ceph.com; Antonio Messina
Subject: Re: [ceph-users] Giant or Firefly
Sent: 05 December 2014 17:28
To: Nick Fisk; 'Ceph Users'
Subject: Re: [ceph-users] Erasure Encoding Chunks
On 05/12/2014 17:41, Nick Fisk wrote:
Hi Loic,
Thanks for your response.
The idea for this cluster will be for our VM Replica storage in our
secondary site. Initially we
-boun...@lists.ceph.com] On Behalf Of
David Moreau Simard
Sent: 05 December 2014 16:03
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Poor RBD performance as LIO iSCSI target
I've flushed everything - data, pools, configs and reconfigured the whole
thing.
I was particularly
? Or is it more about a capacity thing ?
Perhaps if someone else can chime in, I'm really curious.
--
David Moreau Simard
On Dec 6, 2014, at 11:18 AM, Nick Fisk n...@fisk.me.uk wrote:
Hi David,
Very strange, but I'm glad you managed to finally get the cluster working
normally. Thank you for posting
Hi Florent,
Journals don’t need to be very big; 5-10GB per OSD would normally be ample. The
key is that you get an SSD with high write endurance, which makes the Intel S3700
drives perfect for journal use.
In terms of how many OSD’s you can run per SSD, depends purely on how important
Hi Lindsay,
Ceph is really designed to scale across large numbers of OSD's and whilst it
will still function with only 2 OSD's, I wouldn't expect it to perform as
well as a RAID 1 mirror with Battery Backed Cache.
I wouldn't recommend running the OSD's on USB, although it should work
: 29 December 2014 22:24
To: Nick Fisk
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] Improving Performance with more OSD's?
On Sun, 28 Dec 2014 04:08:03 PM Nick Fisk wrote:
If you can't add another full host, your best bet would be to add
another
2-3 disks to each server. This should give
I'm working on something very similar at the moment to present RBD's to ESXi
Hosts.
I'm going to run 2 or 3 VM's on the local ESXi storage to act as iSCSI
proxy nodes.
They will run a pacemaker HA setup with the RBD and LIO iSCSI resource
agents to provide a failover iSCSI target which maps back
until the Redhat patches make their way into
the kernel?
From: Jake Young [mailto:jak3...@gmail.com]
Sent: 23 January 2015 16:46
To: Zoltan Arnold Nagy
Cc: Nick Fisk; ceph-users
Subject: Re: [ceph-users] Ceph, LIO, VMWARE anyone?
Thanks for the feedback Nick and Zoltan,
I have been
Hi,
Just a couple of points, you might want to see if you can get a Xeon v3
board+CPU as they have more performance and use less power.
You can also get a SM 2U chassis which has 2x 2.5” disk slots at the rear, this
would allow you to have an extra 2x 3.5” disks in the front of the
Hi All,
Time for a little Saturday evening Ceph related quiz.
From this documentation page
http://ceph.com/docs/master/rados/operations/cache-tiering/
It seems to indicate that you can either flush/evict using relative sizing
(cache_target_dirty_ratio) or absolute sizing (target_max_bytes).
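For context, the two styles of setting look like this (hot-pool is a hypothetical cache pool name, and the figures are only illustrative):

```shell
# relative sizing: start flushing dirty objects at 40%, evict at 80%
ceph osd pool set hot-pool cache_target_dirty_ratio 0.4
ceph osd pool set hot-pool cache_target_full_ratio 0.8

# absolute sizing: cap the tier at 100GB or 1M objects
ceph osd pool set hot-pool target_max_bytes 107374182400
ceph osd pool set hot-pool target_max_objects 1000000
```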
: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Sage Weil
Sent: 07 February 2015 20:57
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Cache Settings
On Sat, 7 Feb 2015, Nick Fisk wrote:
Hi All,
Time for a little Saturday evening Ceph related quiz
/eliminate a lot of
the troubles I have had with resources failing over.
Nick
From: Jake Young [mailto:jak3...@gmail.com]
Sent: 14 January 2015 12:50
To: Nick Fisk
Cc: Giuseppe Civitella; ceph-users
Subject: Re: [ceph-users] Ceph, LIO, VMWARE anyone?
Nick,
Where did you read that having
that helps
Nick
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Mike Christie
Sent: 28 January 2015 03:06
To: Zoltan Arnold Nagy; Jake Young
Cc: Nick Fisk; ceph-users
Subject: Re: [ceph-users] Ceph, LIO, VMWARE anyone?
Oh yeah, I am not completely sure
Hi Udo,
Lindsay did this for performance reasons so that the data is spread evenly
over the disks; I believe it has been accepted that the remaining 2TB on the
3TB disks will not be used.
Nick
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
-users-boun...@lists.ceph.com] On Behalf Of
Lindsay Mathieson
Sent: 05 January 2015 12:35
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Improving Performance with more OSD's?
On Mon, 5 Jan 2015 09:21:16 AM Nick Fisk wrote:
Lindsay did this for performance reasons so that the data is spread
Hi All,
Would anybody have an idea a) if it's possible and b) if it's a good idea
to have more EC chunks than the total number of hosts?
For instance if I wanted to have a k=6 m=2, but only across 4 hosts and I
wanted to be able to withstand 1 host failure and 1 disk failure(any host),
: 05 January 2015 17:38
To: Nick Fisk; ceph-us...@ceph.com
Subject: Re: [ceph-users] Erasure Encoding Chunks Number of Hosts
Hi Nick,
What about subdividing your hosts using containers? For instance, four
containers per host on your four hosts, which gives you 16 hosts. When you add
more hosts you
Hi Italo,
If you check for a post from me from a couple of days back, I have done exactly
this.
I created a k=5 m=3 over 4 hosts. This ensured that I could lose a whole host
and then an OSD on another host and the cluster was still fully operational.
I’m not sure if my method I used
Hi Giuseppe,
I am working on something very similar at the moment. I currently have it
working on some test hardware and it seems to be working reasonably well.
I say reasonably as I have had a few instabilities, but these are on the HA
side, the LIO and RBD side of things have been rock
Hi Mike,
I can also reproduce this behaviour. If I shut down a Ceph node, the
delay while Ceph works out that the OSD's are down seems to trigger similar
error messages. It seems fairly reliable that if an OSD is down for more than
10 seconds, LIO will have this problem.
Below is an
Hi David,
I have had a few weird issues when shutting down a node, although I can
replicate it by doing a “stop ceph-all” as well. It seems that OSD failure
detection takes a lot longer when a monitor goes down at the same time,
sometimes I have seen the whole cluster grind to a halt for
Of
Nick Fisk
Sent: 06 January 2015 07:43
To: 'Loic Dachary'; ceph-us...@ceph.com
Subject: Re: [ceph-users] Erasure Encoding Chunks Number of Hosts
Hi Loic,
That's an interesting idea, I suppose the same could probably be achieved by
just creating more Crush Host Buckets for each actual host
Hi Greg,
Thanks for your input; I completely agree that we cannot expect developers
to fully document what impact each setting has on a cluster, particularly in
a performance-related way.
That said, if you or others could spare some time for a few pointers it
would be much appreciated and I will
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Gregory Farnum
Sent: 16 March 2015 17:33
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Cache Tier Flush = immediate base tier journal
sync?
On Wed, Mar 11
Hi Robin,
Just a few things to try:-
1. Increase the number of worker threads for tgt (it's a parameter of tgtd,
so modify however it's being started)
2. Disable librbd caching in ceph.conf
3. Do you see the same performance problems exporting a krbd as a block
device via tgt?
Nick
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Mike Christie
Sent: 17 March 2015 21:27
To: Nick Fisk; 'Jake Young'
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] tgt and krbd
On 03/15/2015 08:42 PM, Mike Christie wrote
I think this could be part of what I am seeing. I found this post from back in
2013
http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/12083
Which seems to describe a workaround for the behaviour I am seeing.
The constant small block IO I was seeing looks like it was either
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Bogdan SOLGA
Sent: 19 March 2015 20:51
To: ceph-users@lists.ceph.com
Subject: [ceph-users] PGs issue
Hello, everyone!
I have created a Ceph cluster (v0.87.1-1) using the info from the
I'm looking at trialling OSD's with a small flashcache device over them to
hopefully reduce the impact of metadata updates when doing small block io.
Inspiration from here:-
http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/12083
One thing I suspect will happen, is that when the OSD
I see the problem: as your OSD's are only 8GB, they have a zero weight. I think
the minimum size you can get away with is 10GB in Ceph, as the weight is measured
in TB and only has 2 decimal places.
For a workaround try running:-
ceph osd crush reweight osd.X 1
for each osd, this will
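As a sketch of that workaround for, say, nine OSDs (adjust the id range to match your cluster):

```shell
# give each tiny test OSD a non-zero crush weight so PGs can map to it
for i in $(seq 0 8); do
    ceph osd crush reweight osd.$i 1
done
```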
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Burkhard Linke
Sent: 20 March 2015 09:09
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] OSD + Flashcache + udev + Partition uuid
Hi,
On 03/19/2015 10:41 PM, Nick Fisk wrote
Hi All,
I just created this little bash script to retrieve the /dev/disk/by-id
string for each OSD on a host. Our disks are internally mounted so have no
concept of drive bays, this should make it easier to work out what disk has
failed.
#!/bin/bash
DISKS=$(ceph-disk list | grep "ceph data")
Hi Mike,
I was using bs_aio with the krbd and still saw a small caching effect. I'm
not sure if it was on the ESXi or tgt/krbd page cache side, but I was
definitely seeing the IO's being coalesced into larger ones on the krbd
device in iostat. Either way, it would make me potentially nervous to
will be a lot
faster using a cache tier if the data resides in it.
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Steffen Winther
Sent: 09 March 2015 20:47
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] EC Pool and Cache Tier Tuning
Nick
: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of mad
Engineer
Sent: 09 March 2015 17:23
To: Nick Fisk
Cc: ceph-users
Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9
OSD with 3.16-3 kernel
Thank you Nick for explaining the problem with 4k writes. Queue
I'm not sure if it's something I'm doing wrong or just experiencing an
oddity, but when my cache tier flushes dirty blocks out to the base tier,
the writes seem to hit the OSD's straight away instead of coalescing in the
journals, is this correct?
For example if I create a RBD on a standard 3 way
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jake
Young
Sent: 06 March 2015 12:52
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] tgt and krbd
On Thursday, March 5, 2015, Nick Fisk n...@fisk.me.uk wrote:
Hi All,
Just a heads up after
.
Nick
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jake
Young
Sent: 06 March 2015 15:07
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] tgt and krbd
My initator is also VMware software iscsi. I had my tgt iscsi targets'
write-cache
Just tried cfq, deadline and noop which more or less all show identical results
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Alexandre DERUMIER
Sent: 06 March 2015 11:59
To: Nick Fisk
Cc: ceph-users
Subject: Re: [ceph-users] Strange krbd
Dryomov
Sent: 06 March 2015 15:09
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Strange krbd behaviour with queue depths
On Thu, Mar 5, 2015 at 8:17 PM, Nick Fisk n...@fisk.me.uk wrote:
I’m seeing a strange queue depth behaviour with a kernel mapped RBD,
librbd does
Hi Jake,
Good to see it’s not just me.
I’m guessing that the fact you are doing 1MB writes means that the latency
difference is having a less noticeable impact on the overall write bandwidth.
What I have been discovering with Ceph + iSCSi is that due to all the extra
hops (client-iscsi
: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Somnath Roy
Sent: 06 March 2015 16:02
To: Alexandre DERUMIER; Nick Fisk
Cc: ceph-users
Subject: Re: [ceph-users] Strange krbd behaviour with queue depths
Nick,
I think this is because of the krbd you are using is using
Hi All,
I have been experimenting with EC pools and cache tiers to make them more
useful for more active data sets on RBD volumes, and I thought I would share
my findings so far, as they have made quite a significant difference.
My Ceph cluster comprises 4 nodes, each with the following:-
You are hitting serial latency limits. For a 4kb sync write to happen it has
to:-
1. Travel across network from client to Primary OSD
2. Be processed by Ceph
3. Get Written to Pri OSD
4. Ack travels across network to client
At 4kb these 4 steps take up a very high percentage of the actual
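That serial round trip puts a hard ceiling on single-threaded sync-write IOPS, which is easy to sketch:

```shell
# ceiling on single-threaded sync IOPS for a given round-trip time in ms:
# each write must finish all 4 steps before the next one can start
max_sync_iops() {
    echo $(( 1000 / $1 ))
}

max_sync_iops 1    # 1000
max_sync_iops 2    # 500
```

So shaving even fractions of a millisecond off the per-write path matters far more at 4kb than raw disk bandwidth does.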
Hi Stefan,
If the majority of your hot data fits on the cache tier you will see quite a
marked improvement in read performance and similar write performance
(assuming you would have had your hdds backed by SSD journals).
However for data that is not in the cache tier you will get 10-20% less
On 11.03.2015 11:17, Nick Fisk wrote:
Hi Nick,
On 11.03.2015 10:52, Nick Fisk wrote:
Hi Stefan,
If the majority of your hot data fits on the cache tier you will see
quite a marked improvement in read performance
I don't have reads ;-) just around 5%. 95% are writes
Hi Alexander,
Assuming the images would fit in the page cache of all your OSD nodes, you
would see a massive performance increase as reads would be coming straight
from ram.
But otherwise no, reads are not balanced across replica's, only the primary
one responds to reads. But don't forget a RBD
There's probably a middle ground where you get the best of both worlds.
Maybe 2-4 OSD's per compute node alongside dedicated Ceph nodes. That way
you get a bit of extra storage and can still use lower end CPU's, but don't
have to worry so much about resource contention.
-Original
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
John Spray
Sent: 04 March 2015 11:34
To: Nick Fisk; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Persistent Write Back Cache
On 04/03/2015 08:26, Nick Fisk wrote:
To illustrate the difference
Hi All,
Just a heads up after a day's experimentation.
I believe tgt with its default settings has a small write cache when
exporting a kernel mapped RBD. Doing some write tests I saw 4 times the
write throughput when using tgt aio + krbd compared to tgt with the builtin
librbd.
After
I can understand that this feature would prove more of a challenge
if you are using Qemu and RBD.
Nick
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Christian Balzer
Sent: 04 March 2015 08:40
To: ceph-users@lists.ceph.com
Cc: Nick Fisk
Subject
Hi All,
Thought I would just share this in case someone finds it useful.
I've just finished building our new Ceph cluster where the journals are
installed on the same SSD's as the OS. The SSD's have MD raid partitions
for the OS and swap, and the rest of the SSD's are used for individual
Hi All,
I've just finished building a new POC cluster comprised of the following:-
4 Hosts in 1 chassis
(http://www.supermicro.com/products/system/4U/F617/SYS-F617H6-FTPT_.cfm)
each with the following:-
2x Xeon 2620 v2 (2.1Ghz)
32GB Ram
2x Onboard 10GB-T into 10GB switches
10x 3TB
cursor, which suggests to me ceph-deploy overwrote the
1st sector of the disk, where grub normally resides.
Nick
-Original Message-
From: Christian Balzer [mailto:ch...@gol.com]
Sent: 01 March 2015 03:44
To: ceph-users@lists.ceph.com
Cc: Nick Fisk
Subject: Re: [ceph-users] Booting from
Hi Stephan,
I've just had a very similar problem today. It turned out that the problem was
the mgmt network which I use to manage the nodes is different to the public
network.
I had created host entries resolving to the mgmt network ip's, so when the
initial create was running it was trying to
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Brendan Moloney
Sent: 23 March 2015 21:02
To: Noah Mehl
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] OSD + Flashcache + udev + Partition uuid
This would be in addition to
Just to add, the main reason it seems to make a difference is the metadata
updates which lie on the actual OSD. When you are doing small block writes,
these metadata updates seem to take almost as long as the actual data, so
although the writes are getting coalesced, the actual performance isn't
I'm probably going to get shot down for saying this...but here goes.
As a very rough guide, think of it more as you need around 10MHz for every IO;
whether that IO is 4k or 4MB it uses roughly the same amount of CPU, as most of
the CPU usage is around Ceph data placement rather than the actual
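That rule of thumb turns straight into a sizing estimate (the 10MHz figure is only the rough guide above, nothing more):

```shell
# rough CPU budget in MHz for a target IOPS figure, at ~10MHz per IO
cpu_mhz_for_iops() {
    echo $(( $1 * 10 ))
}

# e.g. a dual 6-core 2.1GHz node has ~25200MHz on tap, so by this
# guide it tops out somewhere around 2500 IOPS
cpu_mhz_for_iops 2500    # 25000
```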
Hi Jeff,
I believe these are normal; they are just idle connections to the OSD's timing
out because no traffic has flowed recently. They are probably a symptom rather
than a cause.
Nick
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Hi Frederic,
If you are using EC pools, the primary OSD requests the remaining shards of
the object from the other OSD's, reassembles it and then sends the data to
the client. The entire object needs to be reconstructed even for a small IO
operation, so 4kb reads could lead to quite a large IO
Hi Rafael,
Do you require a shared FS for these applications or would a block device
with a traditional filesystem be suitable?
If it is, then you could create separate pools with a RBD block device in
each.
Just out of interest what is the reason for separation, security or
performance?
Nick
-Original Message-
From: jdavidli...@gmail.com [mailto:jdavidli...@gmail.com] On Behalf Of J
David
Sent: 23 April 2015 21:22
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Having trouble getting good performance
On Thu, Apr 23, 2015 at 3:05 PM, Nick
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
J David
Sent: 23 April 2015 20:19
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Having trouble getting good performance
On Thu, Apr 23, 2015 at 3:05 PM, Nick
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Rafael Coninck Teigão
Sent: 23 April 2015 22:35
To: Nick Fisk; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Serving multiple applications with a single
cluster
Hi Nick,
Thanks
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
J David
Sent: 23 April 2015 17:51
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Having trouble getting good performance
On Wed, Apr 22, 2015 at 4:30 PM, Nick
: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Christian Eichelmann
Sent: 20 April 2015 14:41
To: Nick Fisk; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC
I'm using xfs on the rbd disks.
They are between 1 and 10TB in size
Ah ok, good point
What FS are you using on the RBD?
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Christian Eichelmann
Sent: 20 April 2015 13:16
To: Nick Fisk; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] 100% IO Wait with CEPH
Hi Christian,
A very non-technical answer but as the problem seems related to the RBD
client it might be worth trying the latest Kernel if possible. The RBD
client is Kernel based and so there may be a fix which might stop this from
happening.
Nick
-Original Message-
From: ceph-users
Hi David,
Thanks for posting those results.
From the Fio runs, I see you are getting around 200 iops at 128kb write io
size. I would imagine you should be getting somewhere around 200-300 iops
for the cluster you posted in the initial post, so it looks like its
performing about right.
200 iops
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
J David
Sent: 24 April 2015 15:40
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Having trouble getting good performance
On Fri, Apr 24, 2015 at 6:39 AM, Nick
Hi David,
I suspect you are hitting problems with sync writes, which Ceph isn't known
for being the fastest thing for.
I'm not a big expert on ZFS but I do know that a SSD ZIL is normally
recommended to allow fast sync writes. If you don't have this you are
waiting on Ceph to Ack the write
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
J David
Sent: 24 April 2015 18:41
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Having trouble getting good performance
On Fri, Apr 24, 2015 at 10:58 AM, Nick Fisk