Hi Nick, Christian,
This is something we've discussed a bit but hasn't made it to the top of
the list.
I think having a single persistent copy on the client has *some* value,
although it's limited because it's a single point of failure. The simplest
scenario would be to use it as a
Yes, good idea.
I was looking at the «WBThrottle» feature, but will go for logging instead.
On Wednesday, 4 March 2015 at 17:10 +0100, Alexandre DERUMIER wrote:
Only writes ;)
OK, so maybe some background operations (snap trimming, scrubbing...).
Maybe debug_osd=20 could give you more logs?
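A minimal sketch of how that could be bumped at runtime, cluster-wide (the level is just an example, and debug 20 logs grow fast, so turn it back down afterwards):

ceph tell osd.* injectargs '--debug-osd 20'

or persistently, under [osd] in ceph.conf:

debug osd = 20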
If I remember right, someone has done this on a live cluster without
any issues. I seem to remember that it had a fallback mechanism to contact
the OSDs on the public network if they couldn't be reached on the cluster
network. You could test it pretty easily without much impact.
Take one OSD that
Only writes ;)
On Wednesday, 4 March 2015 at 16:19 +0100, Alexandre DERUMIER wrote:
The change is only on the OSDs (and not on the OSD journal).
Do you see twice the IOPS for both read and write?
If only read, maybe a readahead bug could explain this.
- Original Message -
From: Olivier Bonvalet
Only writes ;)
OK, so maybe some background operations (snap trimming, scrubbing...).
Maybe debug_osd=20 could give you more logs?
- Original Message -
From: Olivier Bonvalet ceph.l...@daevel.fr
To: aderumier aderum...@odiso.com
Cc: ceph-users ceph-users@lists.ceph.com
Sent: Wednesday 4
That was my thought, yes - I found this blog that confirms what you are
saying, I guess:
http://www.sebastien-han.fr/blog/2012/07/29/tip-ceph-public-slash-private-network-configuration/
I will do that... Thx
I guess it doesn't matter, since my CRUSH map will still reference old OSDs
that are stopped
Hi! I have seen the documentation
http://ceph.com/docs/master/start/hardware-recommendations/ but those
minimum requirements without some recommendations don't tell me much ...
So, from what I've seen, for mon and mds any cheap 6-core, 16+ GB RAM AMD
would do ... what puzzles me is that per daemon
On Tue, Mar 3, 2015 at 9:26 AM, Garg, Pankaj
pankaj.g...@caviumnetworks.com wrote:
Hi,
I have a Ceph cluster that is contained within a rack (1 monitor and 5 OSD
nodes). I kept the same public and private address for the configuration.
I do have 2 NICs and 2 valid IP addresses (one internal only
Just to get more specific: the reason you can apparently write stuff
to a file when you can't write to the pool it's stored in is because
the file data is initially stored in cache. The flush out to RADOS,
when it happens, will fail.
It would definitely be preferable if there was some way to
On 03/04/2015 05:44 PM, Robert LeBlanc wrote:
If I remember right, someone has done this on a live cluster without
any issues. I seem to remember that it had a fallback mechanism to contact
the OSDs on the public network if they couldn't be reached on the cluster
network. You could test it
You will most likely have a very high relocation percentage. Backfills
are always more impactful on smaller clusters, but osd max backfills
should be what you need to help reduce the impact. The default is 10;
you will want to use 1.
I didn't catch which version of Ceph you are running, but I
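For example, a hedged sketch of applying that throttle on a running cluster (the values are only illustrative):

ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

or persistently, under [osd] in ceph.conf:

osd max backfills = 1
osd recovery max active = 1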
If the data has been replicated to new OSDs, it will be able to
function properly even with them down or with only the public network.
On Wed, Mar 4, 2015 at 9:49 AM, Andrija Panic andrija.pa...@gmail.com wrote:
I guess it doesn't matter, since my CRUSH map will still reference old OSDs,
that are
On Wed, 4 Mar 2015, Adrian Sevcenco wrote:
Hi! I have seen the documentation
http://ceph.com/docs/master/start/hardware-recommendations/ but those
minimum requirements without some recommendations don't tell me much ...
So, from what I've seen, for mon and mds any cheap 6-core, 16+ GB RAM AMD
would
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
John Spray
Sent: 04 March 2015 11:34
To: Nick Fisk; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Persistent Write Back Cache
On 04/03/2015 08:26, Nick Fisk wrote:
To illustrate the difference a
Thx Wido, I needed this confirmation - thanks!
On 4 March 2015 at 17:49, Wido den Hollander w...@42on.com wrote:
On 03/04/2015 05:44 PM, Robert LeBlanc wrote:
If I remember right, someone has done this on a live cluster without
any issues. I seem to remember that it had a fallback
Thx again - I really appreciate the help guys!
On 4 March 2015 at 17:51, Robert LeBlanc rob...@leblancnet.us wrote:
If the data has been replicated to new OSDs, it will be able to
function properly even with them down or with only the public network.
On Wed, Mar 4, 2015 at 9:49 AM, Andrija
On Wed, 4 Mar 2015, Thomas Lemarchand wrote:
Thanks to all Ceph developers for the good work!
I see some love given to CephFS. When will you consider CephFS to be
production ready?
The key missing piece is fsck (check and repair). That's where our
efforts are focused now. I think
Hi, for hardware, Inktank has good guides here:
http://www.inktank.com/resource/inktank-hardware-selection-guide/
http://www.inktank.com/resource/inktank-hardware-configuration-guide/
Ceph works well with multiple OSD daemons (1 OSD per disk),
so you should not use RAID.
(xfs is the recommended
On 03/04/2015 05:34 AM, John Spray wrote:
On 04/03/2015 08:26, Nick Fisk wrote:
To illustrate the difference a proper write back cache can make, I put
a 1GB (512mb dirty threshold) flashcache in front of my RBD and
tweaked the flush parameters to flush dirty blocks at a large queue
depth. The
I guess it doesn't matter, since my CRUSH map will still reference old OSDs,
that are stopped (and the cluster resynced after that)?
I wanted to say: it doesn't matter (I guess?) that my CRUSH map is still
referencing old OSD nodes that are already stopped. Tired, sorry...
On 4 March 2015 at 17:48,
Hi Robert,
I already have this stuff set. Ceph is 0.87.0 now...
Thanks, will schedule this for the weekend. 10G network and 36 OSDs - it should
move data in less than 8h; per my last experience it was around 8h, but
some 1G OSDs were included...
Thx!
On 4 March 2015 at 17:49, Robert LeBlanc
To expand upon this, the very nature and existence of Ceph is to replace RAID.
The FS itself replicates data and handles the HA functionality that you're
looking for. If you're going to build a single server with all those disks,
backed by a ZFS RAID setup, you're going to be much better suited
On 03/03/2015 03:28 PM, Ken Dreyer wrote:
On 03/03/2015 04:19 PM, Sage Weil wrote:
Hi,
This is just a heads up that we've identified a performance regression in
v0.80.8 from previous firefly releases. A v0.80.9 is working its way
through QA and should be out in a few days. If you haven't
I'd like to see a Solaris client.
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Dennis
Chen
Sent: Wednesday, March 04, 2015 2:00 AM
To: ceph-devel; ceph-users; Sage Weil; Loic Dachary
Subject: [ceph-users] The project of ceph client file
Last night I blew away my previous ceph configuration (this environment is
pre-production) and have 0.87.1 installed. I've manually edited the
crushmap so it now looks like https://dpaste.de/OLEa
I currently have 144 OSDs on 8 nodes.
After increasing pg_num and pgp_num to a more suitable 1024
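For reference, a rough sketch of the pg_num/pgp_num bump itself (the pool name is just a placeholder):

ceph osd pool set <poolname> pg_num 1024
ceph osd pool set <poolname> pgp_num 1024

pgp_num has to be raised to match pg_num, otherwise the newly split PGs keep their old placement.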
Oh duh… OK, then given a 4+4 erasure coding scheme, 14400/8 is 1800, so try
2048.
-don-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Don
Doerner
Sent: 04 March, 2015 12:14
To: Kyle Hutson; Ceph Users
Subject: Re: [ceph-users] New EC pool undersized
In this case,
I have been following ceph for a long time. I have yet to put it into
service, and I keep coming back as btrfs improves and ceph reaches
higher version numbers.
I am now trying ceph 0.93 and kernel 4.0-rc1.
Q1) Is it still considered that btrfs is not robust enough, and that
xfs should be used
On 03/03/2015 05:53 PM, Jason Dillaman wrote:
Your procedure appears correct to me. Would you mind re-running your cloned
image VM with the following ceph.conf properties:
[client]
rbd cache off
debug rbd = 20
log file = /path/writeable/by/qemu.$pid.log
If you recreate the issue, would you
Sorry, I missed your other questions, down at the bottom. See here
http://ceph.com/docs/master/rados/operations/placement-groups/ (look for
“number of replicas for replicated pools or the K+M sum for erasure coded
pools”) for the formula; 38400/8 probably implies 8192.
The thing is, you’ve got
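To make the arithmetic explicit, the rule of thumb from that page is roughly (a guideline only, not a hard rule):

total PGs ≈ (number of OSDs × 100) / (replica count, or K+M for erasure coding)

e.g. 144 OSDs × 100 = 14400; 14400 / 8 (a 4+4 profile) = 1800, rounded up to the next power of two → 2048.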
So it sounds like I should figure out at how many nodes I need to
increase pg_num to 4096, and again for 8192, and increase those
incrementally as I add more hosts, correct?
On Wed, Mar 4, 2015 at 3:04 PM, Don Doerner don.doer...@quantum.com wrote:
Sorry, I missed your other
That did it.
'step set_choose_tries 200' fixed the problem right away.
Thanks Yann!
On Wed, Mar 4, 2015 at 2:59 PM, Yann Dupont y...@objoo.org wrote:
On 04/03/2015 21:48, Don Doerner wrote:
Hmmm, I just struggled through this myself. How many racks do you have?
If not more than 8,
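For anyone hitting the same thing, a hedged sketch of where that line sits in a decompiled crushmap (the rule name and numbers are only examples for a 4+4 profile):

rule ec44pool {
    ruleset 1
    type erasure
    min_size 8
    max_size 8
    step set_chooseleaf_tries 5
    step set_choose_tries 200
    step take default
    step chooseleaf indep 0 type host
    step emit
}

Recompile with crushtool -c map.txt -o map.bin and load it with ceph osd setcrushmap -i map.bin.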
On 04/03/2015 20:27, Datatone Lists wrote:
I have been following ceph for a long time. I have yet to put it into
service, and I keep coming back as btrfs improves and ceph reaches
higher version numbers.
I am now trying ceph 0.93 and kernel 4.0-rc1.
Q1) Is it still considered that btrfs is not
The change is only on the OSDs (and not on the OSD journal).
Do you see twice the IOPS for both read and write?
If only read, maybe a readahead bug could explain this.
- Original Message -
From: Olivier Bonvalet ceph.l...@daevel.fr
To: aderumier aderum...@odiso.com
Cc: ceph-users ceph-users@lists.ceph.com
On 03/02/2015 04:16 AM, koukou73gr wrote:
Hello,
Today I thought I'd experiment with snapshots and cloning. So I did:
rbd import --image-format=2 vm-proto.raw rbd/vm-proto
rbd snap create rbd/vm-proto@s1
rbd snap protect rbd/vm-proto@s1
rbd clone rbd/vm-proto@s1 rbd/server
And then proceeded
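For context, a minimal sketch of how such a clone is typically attached to a qemu-kvm guest (memory size, cache mode and conf path are just examples, not the reporter's exact command):

qemu-system-x86_64 -m 1024 \
  -drive format=raw,file=rbd:rbd/server:conf=/etc/ceph/ceph.conf,cache=writeback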
It wouldn't let me simply change the pg_num, giving
Error EEXIST: specified pg_num 2048 <= current 8192
But that's not a big deal, I just deleted the pool and recreated with 'ceph
osd pool create ec44pool 2048 2048 erasure ec44profile'
...and the result is quite similar: 'ceph status' is now
ceph
Hmmm, I just struggled through this myself. How many racks do you have? If
not more than 8, you might want to make your failure domain smaller? I.e.,
maybe host? That, at least, would allow you to debug the situation…
-don-
From: Kyle Hutson [mailto:kylehut...@ksu.edu]
Sent: 04 March, 2015
I can't help much on the MDS front, but here are some answers and my
view on some of it.
On Wed, Mar 4, 2015 at 1:27 PM, Datatone Lists li...@datatone.co.uk wrote:
I have been following ceph for a long time. I have yet to put it into
service, and I keep coming back as btrfs improves and ceph
On 04/03/2015 21:48, Don Doerner wrote:
Hmmm, I just struggled through this myself. How many racks do you
have? If not more than 8, you might want to make your failure domain
smaller? I.e., maybe host? That, at least, would allow you to debug the
situation…
-don-
Hello, I think I
My lowest level (other than OSD) is 'disktype' (based on the crushmaps at
http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/
) since I have SSDs and HDDs on the same host.
I just made that change (deleted the pool, deleted the profile, deleted the
crush
- Original Message -
From: Ben Hines bhi...@gmail.com
To: ceph-users ceph-users@lists.ceph.com
Sent: Wednesday, March 4, 2015 1:03:16 PM
Subject: [ceph-users] Hammer sharded radosgw bucket indexes question
Hi,
These questions were asked previously but perhaps lost:
We have
On Wed, Mar 4, 2015 at 4:43 PM, Lionel Bouton
lionel-subscript...@bouton.name wrote:
On 03/04/15 22:18, John Spray wrote:
On 04/03/2015 20:27, Datatone Lists wrote:
[...] [Please don't mention ceph-deploy]
This kind of comment isn't very helpful unless there is a specific
issue with
Hi Josh,
Thanks for taking a look at this. I'm answering your questions inline.
On 03/04/2015 10:01 PM, Josh Durgin wrote:
[...]
And then proceeded to create a qemu-kvm guest with rbd/server as its
backing store. The guest booted but as soon as it got to mount the root
fs, things got weird:
On 03/04/15 22:50, Travis Rhoden wrote:
[...]
Thanks for this feedback. I share a lot of your sentiments,
especially that it is good to understand as much of the system as you
can. Everyone's skill level and use-case is different, and
ceph-deploy is targeted more towards PoC use-cases. It
On 03/04/2015 01:36 PM, koukou73gr wrote:
On 03/03/2015 05:53 PM, Jason Dillaman wrote:
Your procedure appears correct to me. Would you mind re-running your
cloned image VM with the following ceph.conf properties:
[client]
rbd cache off
debug rbd = 20
log file =
I don’t know – I am playing with crush; someday I may fully comprehend it. Not
today.
I think you have to look at it like this: if your possible failure domain
options are OSDs, hosts, racks, …, and you choose racks as your failure domain,
and you have exactly as many racks as your pool size
Hello Nick,
On Wed, 4 Mar 2015 08:49:22 - Nick Fisk wrote:
Hi Christian,
Yes, that's correct, it's on the client side. I don't see this much
different to a battery backed RAID controller: if you lose power, the
data is in the cache until power resumes, when it is flushed.
If you are
Ah, never mind - I had to pass the --bucket=bucketname argument.
You'd think the command would print an error if the critical argument is missing.
-Ben
On Wed, Mar 4, 2015 at 6:06 PM, Ben Hines bhi...@gmail.com wrote:
One of the release notes says:
rgw: fix bucket removal with data purge (Yehuda
One of the release notes says:
rgw: fix bucket removal with data purge (Yehuda Sadeh)
Just tried this and it didn't seem to work:
bash-4.1$ time radosgw-admin bucket rm mike-cache2 --purge-objects
real    0m7.711s
user    0m0.109s
sys     0m0.072s
Yet the bucket was not deleted, nor purged:
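As the follow-up above notes, the bucket has to be passed explicitly with --bucket for the purge to take effect; a hedged example:

radosgw-admin bucket rm --bucket=mike-cache2 --purge-objects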
Hi All,
Recently some folks showed interest in gathering pool distribution
statistics and I remembered I wrote a script to do that a while back.
It was broken due to a change in the ceph pg dump output format that was
committed a while back, so I cleaned the script up, added detection of
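Not the script itself, but a rough one-liner sketch of the same idea - counting PGs per pool id from the plain 'ceph pg dump' output (it assumes PG ids of the form <pool>.<hexseed>):

ceph pg dump 2>/dev/null | awk '/^[0-9]+\.[0-9a-f]+/ {split($1,a,"."); n[a[1]]++} END {for (p in n) print "pool " p ": " n[p] " PGs"}'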
New issue created - http://tracker.ceph.com/issues/11027
Regards.
Italo Santos
http://italosantos.com.br/
On Tuesday, March 3, 2015 at 9:23 PM, Loic Dachary wrote:
Hi Yann,
That seems related to http://tracker.ceph.com/issues/10536 which seems to be
resolved. Could you create a new
Hi Mark,
Cool, that looks handy. Though it'd be even better if it could go a
step further and recommend re-weighting values to balance things out
(or increased PG counts where needed).
Cheers,
On 5 March 2015 at 15:11, Mark Nelson mnel...@redhat.com wrote:
Hi All,
Recently some folks showed
Hi everyone,
I'm proud to announce that DEB and RPM packages for Inkscope V1.1 are available
on github (https://github.com/inkscope/inkscope-packaging).
Inkscope also has its blog: http://inkscope.blogspot.fr.
There you will find how to install Inkscope on Debian servers
Thank you Robert - I'm wondering, when I remove a total of 7 OSDs from the crush
map, whether that will cause more than 37% of the data to be moved (80% or whatever).
I'm also wondering if the throttling that I applied is fine or not - I will
introduce osd_recovery_delay_start 10 sec as Irek said.
I'm just
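If it helps, a minimal sketch of injecting that setting on a running cluster (the 10 s value is just the one suggested above):

ceph tell osd.* injectargs '--osd-recovery-delay-start 10'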
Hi Christian,
Yes, that's correct, it's on the client side. I don't see this much different
to a battery backed RAID controller: if you lose power, the data is in the
cache until power resumes, when it is flushed.
If you are going to have the same RBD accessed by multiple servers/clients
then you
Hi,
Many thanks for the explanations.
I haven't used the nodcache option when mounting cephfs; it actually got
there by default.
My mount command is/was:
# mount -t ceph 1.2.3.4:6789:/ /mnt -o name=puppet,secretfile=./puppet.secret
I don't know what causes this option to be the default, maybe
Hi Luke,
May be you can set these flags:
ceph osd set nodown
ceph osd set noout
Regards
Sahana
On Wed, Mar 4, 2015 at 2:32 PM, Luke Kao luke@mycom-osi.com wrote:
Hello ceph community,
We need some immediate help: our cluster is in a very strange and bad
state after unexpected
Hi,
I have a live cluster with only a public network (so no explicit network
configuration in the ceph.conf file).
I'm wondering what the procedure is to implement dedicated
replication/private and public networks.
I've read the manual, know how to do it in ceph.conf, but I'm wondering
since this
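For reference, a hedged sketch of what the ceph.conf side usually looks like (the subnets are made-up examples):

[global]
public network = 192.168.1.0/24
cluster network = 10.0.0.0/24

The OSDs then need to be restarted (e.g. one at a time) before they start using the cluster network for replication traffic.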
On 04/03/2015 08:26, Nick Fisk wrote:
To illustrate the difference a proper write back cache can make, I put
a 1GB (512mb dirty threshold) flashcache in front of my RBD and
tweaked the flush parameters to flush dirty blocks at a large queue
depth. The same fio test (128k iodepth=1) now runs
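For reference, a rough sketch of the kind of fio invocation being described (device path and runtime are just illustrative, not the exact test):

fio --name=seqwrite --filename=/dev/rbd0 --rw=write --bs=128k --iodepth=1 \
    --ioengine=libaio --direct=1 --runtime=60 --time_based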
Hi,
Last Saturday I upgraded my production cluster from dumpling to emperor
(since we had been using it successfully on a test cluster).
A couple of hours later, we had failing OSDs: some of them were marked
as down by Ceph, probably because of IO starvation. I marked the cluster
«noout», start
An RBD image is split up into (by default 4MB) objects within the OSDs. When
you delete an RBD image, all the objects associated with the image are removed
from the OSDs. The objects are not securely erased from the OSDs if that is
what you are asking.
--
Jason Dillaman
Red Hat
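A quick way to see this for a given image (the image name is just an example); the object size is 2^order bytes:

rbd info rbd/myimage
# reports e.g. "order 22 (4096 kB objects)"; for format 2 images the data
# lives in RADOS objects named rbd_data.<image id>.<object number>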
Hi,
maybe this is related?:
http://tracker.ceph.com/issues/9503
Dumpling: removing many snapshots in a short time makes OSDs go berserk
http://tracker.ceph.com/issues/9487
dumpling: snaptrimmer causes slow requests while backfilling.
osd_snap_trim_sleep not helping
Hi,
I'm trying cephfs and I have some problems. Here is the context:
All the nodes (in the cluster and the clients) are Ubuntu 14.04 with a 3.16
kernel (after apt-get install linux-generic-lts-utopic and a reboot).
The cluster:
- one server with just one monitor daemon (RAM 2GB)
- 2 servers (RAM 24GB)
Thanks Alexandre.
The load problem is permanent: I have twice the IO/s on HDD since firefly.
And yes, the problem hangs production at night during snap trimming.
I suppose there is a new OSD parameter which changes the behavior of the
journal, or something like that. But I didn't find anything about
The load problem is permanent: I have twice the IO/s on HDD since firefly.
Oh, permanent, that's strange. (If you don't see more traffic coming from
clients, I don't understand...)
Do you also see twice the IOs/ops in the ceph -w stats?
Is the ceph health OK?
- Original Message -
From: Olivier
Ceph health is OK, yes.
The «firefly-upgrade-cluster-IO.png» graph is about IO stats seen by
Ceph: there is no change between dumpling and firefly. The change is
only on the OSDs (and not on the OSD journal).
On Wednesday, 4 March 2015 at 15:05 +0100, Alexandre DERUMIER wrote:
The load problem is