I recently did some testing of a few SSDs and found some surprising, and some
not so surprising things:
1) performance varies wildly with firmware, especially with cheaper drives
2) performance varies with time - even with S3700 - slows down after ~40-80GB
and then creeps back up
3) cheaper
On 08 Jun 2015, at 10:07, Christian Balzer ch...@gol.com wrote:
On Mon, 8 Jun 2015 09:44:54 +0200 Jan Schermer wrote:
I recently did some testing of a few SSDs and found some surprising, and
some not so surprising things:
1) performance varies wildly with firmware, especially
On 08 Jun 2015, at 10:40, Christian Balzer ch...@gol.com wrote:
On Mon, 8 Jun 2015 10:12:02 +0200 Jan Schermer wrote:
On 08 Jun 2015, at 10:07, Christian Balzer ch...@gol.com wrote:
On Mon, 8 Jun 2015 09:44:54 +0200 Jan Schermer wrote:
I recently did some testing of a few SSDs
Isn’t the right parameter “network=writeback” for network devices like RBD?
Jan
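As far as I know the QEMU option is cache=writeback (libvirt exposes it as cache='writeback' on the disk's driver element), the same for RBD as for any other drive. A minimal sketch, with pool/image name and client id as placeholders:

   qemu-system-x86_64 ... \
     -drive file=rbd:rbd/myimage:id=admin,format=raw,if=virtio,cache=writeback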
On 08 Jun 2015, at 12:31, Andrey Korolyov and...@xdel.ru wrote:
On Mon, Jun 8, 2015 at 1:24 PM, Arnaud Virlet avir...@easter-eggs.com wrote:
Hi
Actually we use libvirt VM with ceph rbd pool for storage.
By
Durgin jdur...@redhat.com wrote:
On 06/01/2015 03:41 AM, Jan Schermer wrote:
Thanks, that’s it exactly.
But I think that’s really too much work for now, that’s why I really would
like to see a quick-win by using the local RBD cache for now - that would
suffice for most workloads (not too many
You say “even when the cluster is doing nothing” - are you seeing those numbers
on a completely idle cluster?
Even SSDs can go to sleep, as can CPUs (throttle/sleep states), memory gets
swapped/paged out, tcp connections die, cache is empty... measuring a
completely idle cluster is not always
).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866327 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135
On 02 Jun 2015, at 20:14, Jan Schermer j...@schermer.cz wrote:
Our mons just went into a logging frenzy.
We have 3 mons in the cluster
Dumpling
ceph-0.67.9-16.g69a99e6
I guess it shouldn’t be logging it at all?
Thanks
Jan
On 02 Jun 2015, at 20:42, Somnath Roy somnath@sandisk.com wrote:
Which code base are you using ?
-Original Message-
From: Jan Schermer [mailto:j...@schermer.cz]
Sent: Tuesday, June 02
peon recovering
peon active (and this is the madness)
It logs much less now, but the issue is still here…
Jan
On 02 Jun 2015, at 20:17, Jan Schermer j...@schermer.cz wrote:
Actually looks like it stopped, but here’s a more representative sample
(notice how often it logged this!)
v0 lc
Our mons just went into a logging frenzy.
We have 3 mons in the cluster, and they mostly log stuff like this
2015-06-02 18:00:48.749386 7f1c08c0d700 1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:48.749389 lease_expire=2015-06-02
nothing to worry about other than log spam here which is fixed in the
latest build or you can fix it with debug mon = 0/0
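For example (a sketch - either in ceph.conf under [mon]:

   [mon]
   debug mon = 0/0

or injected at runtime without a restart, using the mon name from the log above:

   ceph tell mon.node-10 injectargs '--debug-mon 0/0'

)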
Thanks & Regards
Somnath
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan
Schermer
Sent: Tuesday, June 02
, 2015 at 8:47 AM, Jan Schermer j...@schermer.cz wrote:
Hidden danger in the default CRUSH rules is that if you lose 3 drives in 3
different hosts at the same time, you _will_ lose data, and not just some
data but possibly a piece of every rbd volume you have...
And the probability
, _every_ RBD
device will have random holes in its data, like Swiss cheese.
BTW PGs can have stuck IOs without losing all three replicas -- see min_size.
Cheers, Dan
On Wed, Jun 10, 2015 at 10:20 AM, Jan Schermer j...@schermer.cz wrote:
When you increase the number of OSDs, you generally would
Hidden danger in the default CRUSH rules is that if you lose 3 drives in 3
different hosts at the same time, you _will_ lose data, and not just some data
but possibly a piece of every rbd volume you have...
And the probability of that happening is sadly nowhere near zero. We had drives
drop out
awful
performance for some time (not those 8K you see though).
I think there was some kind of firmware process involved, I had to replace
the drive with a serious DC one.
El 23/06/15 a las 14:07, Jan Schermer escribió:
Yes, but that’s a separate issue :-)
Some drives are just slow (100 IOPS
instead of
cfq/deadline? I'm kind of surprised that you want/need to do any IO
scheduling when your journal and FileStore are on a good SSD.
Cheers, Dan
On Tue, Jun 23, 2015 at 4:57 PM, Jan Schermer j...@schermer.cz wrote:
For future generations
I persuaded CFQ to play nice in the end
of firmware process involved, I had to replace
the drive with a serious DC one.
El 23/06/15 a las 14:07, Jan Schermer escribió:
Yes, but that’s a separate issue :-)
Some drives are just slow (100 IOPS) for synchronous writes with no other
load.
The drives I’m testing have ~8K IOPS when
...@vanderster.com wrote:
On Tue, Jun 23, 2015 at 1:37 PM, Jan Schermer j...@schermer.cz wrote:
Yes, I use the same drive
one partition for journal
other for xfs with filestore
I am seeing slow requests when backfills are occurring - backfills hit the
filestore but slow requests are (most
On Tue, Jun 23, 2015 at 1:54 PM, Jan Schermer j...@schermer.cz wrote:
I only use SSDs, which is why I’m so surprised at the CFQ behaviour - the
drive can sustain tens of thousands of reads per second, thousands of writes
- yet saturating it with reads drops the writes to 10 IOPS - that’s mind
Thanks.
Nobody else knows anything about “cluster_snap”?
It is mentioned in the docs, but that’s all…
Jan
On 19 Jun 2015, at 12:49, Carsten Schmitt carsten.schm...@uni-hamburg.de
wrote:
Hi Jan,
On 06/18/2015 12:48 AM, Jan Schermer wrote:
1) Flags available in ceph osd set
I don’t run Ceph on btrfs, but isn’t this related to the btrfs snapshotting
feature ceph uses to ensure a consistent journal?
Jan
On 19 Jun 2015, at 14:26, Lionel Bouton lionel+c...@bouton.name
mailto:lionel+c...@bouton.name wrote:
On 06/19/15 13:42, Burkhard Linke wrote:
Forget the
is working well, but far more consistent
and likely faster than glibc.
Mark
On 06/24/2015 12:59 PM, Jan Schermer wrote:
We already had the migratepages in place before we disabled tcmalloc. It
didn’t do much.
Disabling tcmalloc made immediate difference but there were still spikes
Can you guess when we did that?
Still on dumpling, btw...
http://www.zviratko.net/link/notcmalloc.png
Jan
Thank you for your reply, answers below.
On 23 Jun 2015, at 13:15, Christian Balzer ch...@gol.com wrote:
Hello,
On Tue, 23 Jun 2015 12:53:45 +0200 Jan Schermer wrote:
I use CFQ but I have just discovered it completely _kills_ writes when
also reading (doing backfill for example
osd recovery op priority = 1
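To apply it without restarting the OSDs, something like this should work (a sketch):

   ceph tell osd.* injectargs '--osd-recovery-op-priority 1'

and put the line above under [osd] in ceph.conf to make it persistent.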
Hope that helps,
Dan
On Tue, Jun 23, 2015 at 12:53 PM, Jan Schermer j...@schermer.cz wrote:
I use CFQ but I have just discovered it completely _kills_ writes when also
reading (doing backfill for example)
If I run a fio job for synchronous writes
I use CFQ but I have just discovered it completely _kills_ writes when also
reading (doing backfill for example)
If I run a fio job for synchronous writes and at the same time run a fio job
for random reads, writes drop to 10 IOPS (oops!). Setting io priority with
ionice works nicely
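If anyone wants to reproduce it, a minimal sketch of the two jobs (device path is a placeholder - note the write job is destructive):

   # writes.fio
   [syncwrite]
   filename=/dev/sdX
   ioengine=psync
   rw=write
   bs=4k
   sync=1
   direct=1
   time_based
   runtime=60

   # reads.fio
   [randread]
   filename=/dev/sdX
   ioengine=libaio
   iodepth=32
   rw=randread
   bs=4k
   direct=1
   time_based
   runtime=60

   # run both at once, reads deprioritized via ionice (only matters with CFQ)
   ionice -c 2 -n 7 fio reads.fio &
   ionice -c 2 -n 0 fio writes.fio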
...@lists.ceph.com] On Behalf Of Jan
Schermer
Sent: Wednesday, June 24, 2015 10:54 AM
To: Ben Hines
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Switching from tcmalloc
We did, but I don’t have the numbers. I have lots of graphs, though. We were
mainly trying to solve the CPU usage, since our
Just a quick guess - is it possible you ran out of file descriptors/connections
on the nodes or on a firewall on the way? I’ve seen this behaviour the other
way around - when too many RBD devices were connected to one client. It would
explain why it seems to work but hangs when the device is
if my
systems are using 80% cpu, if Ceph performance is better than when
it's using 20% cpu.
Can you share any scripts you have to automate these things? (NUMA
pinning, migratepages)
thanks,
-Ben
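Not the actual scripts (those became the pincpus repo mentioned later in the thread), but the basic idea is a cpuset cgroup per NUMA node plus migratepages for memory that was already allocated - a rough sketch:

   # pin all OSDs to the cores and memory of NUMA node 0 (cgroup v1 cpuset)
   mkdir -p /sys/fs/cgroup/cpuset/osds
   echo 0-11 > /sys/fs/cgroup/cpuset/osds/cpuset.cpus
   echo 0    > /sys/fs/cgroup/cpuset/osds/cpuset.mems
   for pid in $(pidof ceph-osd); do
       echo $pid > /sys/fs/cgroup/cpuset/osds/tasks
       migratepages $pid all 0   # move pages already allocated on other nodes
   done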
On Wed, Jun 24, 2015 at 10:25 AM, Jan Schermer j...@schermer.cz wrote:
There were
reducing locking for
memory allocations.
I would do some more testing along with what Ben Hines mentioned about
overall client performance.
-
Robert LeBlanc
GPG Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
On Wed, Jun 24, 2015 at 11:25 AM, Jan Schermer
using
tcmalloc? I've noticed that there is usually a good drop for us just by
restarting them. I don't think it is usually this drastic.
-
Robert LeBlanc
GPG Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
On Wed, Jun 24, 2015 at 2:08 AM, Jan Schermer wrote
Hi,
hoping someone can point me in the right direction.
Some of my OSDs have a larger CPU usage (and ops latencies) than others. If I
restart the OSD everything runs nicely for some time, then it creeps up.
1) most of my OSDs have ~40% CPU (core) usage (user+sys), some are closer to
80%.
On 11 Jun 2015, at 11:53, Henrik Korkuc li...@kirneh.eu wrote:
On 6/11/15 12:21, Jan Schermer wrote:
Hi,
hoping someone can point me in the right direction.
Some of my OSDs have a larger CPU usage (and ops latencies) than others. If
I restart the OSD everything runs nicely for some
but I don't see that it will be that
significant.
Thanks & Regards
Somnath
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan
Schermer
Sent: Thursday, June 11, 2015 5:57 AM
To: Dan van der Ster
Cc: ceph-users@lists.ceph.com
Subject: Re
.
Thanks & Regards
Somnath
-Original Message-
From: Jan Schermer [mailto:j...@schermer.cz]
Sent: Thursday, June 11, 2015 12:10 PM
To: Somnath Roy
Cc: Dan van der Ster; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Restarting OSD leads to lower CPU usage
Hi,
I looked
, Jun 11, 2015 at 11:21 AM, Jan Schermer j...@schermer.cz wrote:
Hi,
hoping someone can point me in the right direction.
Some of my OSDs have a larger CPU usage (and ops latencies) than others. If
I restart the OSD everything runs nicely for some time, then it creeps up.
1) most of my OSDs
1) Flags available in ceph osd set are
pause|noup|nodown|noout|noin|nobackfill|norecover|noscrub|nodeep-scrub|notieragent
I know or can guess most of them (the docs are a “bit” lacking)
But with “ceph osd set nodown” I have no idea what it should be used for - to
keep hammering a faulty OSD?
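(For what it's worth, the flags are toggled like this:

   ceph osd set nodown
   ceph osd unset nodown

and my understanding is that nodown is mainly for stopping OSDs from being marked down en masse - e.g. during network flaps or planned mass restarts - not for keeping a genuinely faulty OSD in.)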
, with SATA drives
ST1000LM014-1EJ1 for data and SSD INTEL SSDSC2BW12 for journal.
Regards,
Mateusz
From: Jan Schermer [mailto:j...@schermer.cz]
Sent: Wednesday, June 17, 2015 9:41 AM
To: Mateusz Skała
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Hardware cache settings
Cache on top of the data drives (not journal) will not help in most cases,
those writes are already buffered in the OS - so unless your OS is very light
on memory and flushing constantly it will have no effect, it just adds overhead
in case a flush comes. I haven’t tested this extensively with
On 16 Jun 2015, at 12:59, Gregory Farnum g...@gregs42.com wrote:
On Tue, Jun 16, 2015 at 11:53 AM, Jan Schermer j...@schermer.cz wrote:
Well, I see mons dropping out when deleting large amount of snapshots, and
it eats a _lot_ of CPU to delete them
Well, you're getting past my
Ping :-)
Looks like nobody is bothered by this, can I assume it is normal, doesn’t hurt
anything and will grow to millions in time?
Jan
On 15 Jun 2015, at 10:32, Jan Schermer j...@schermer.cz wrote:
Hi,
I have ~1800 removed_snaps listed in the output of “ceph osd dump
On Tue, Jun 16, 2015 at 1:41 AM, Jan Schermer j...@schermer.cz wrote:
Ping :-)
Looks like nobody is bothered by this, can I assume it is normal, doesn’t
hurt anything and will grow to millions in time?
Jan
On 15 Jun 2015, at 10:32, Jan Schermer j...@schermer.cz wrote:
Hi,
I have ~1800
Have you tried just running “sync;sync” on the originating node? Does that
achieve the same thing or not? (I guess it could/should).
Jan
On 16 Jun 2015, at 13:37, negillen negillen negil...@gmail.com wrote:
Thanks again,
even 'du' performance is terrible on node B (testing on a
Hi,
I have ~1800 removed_snaps listed in the output of “ceph osd dump”.
Is that all right? Any way to get rid of those? What’s the significance?
Thanks
Jan
go up you can’t go down.
Jan
On 01 Jun 2015, at 10:57, huang jun hjwsm1...@gmail.com wrote:
hi,jan
2015-06-01 15:43 GMT+08:00 Jan Schermer j...@schermer.cz:
We had to disable deep scrub or the cluster would be unusable - we need to
turn it back on sooner or later, though.
With minimal
Thanks, that’s it exactly.
But I think that’s really too much work for now, that’s why I really would like
to see a quick-win by using the local RBD cache for now - that would suffice
for most workloads (not too many people run big databases on CEPH now, those
who do must be aware of this).
We had to disable deep scrub or the cluster would be unusable - we need to turn
it back on sooner or later, though.
With minimal scrubbing and recovery settings, everything is mostly good. Turned
out many issues we had were due to too few PGs - once we increased them from 4K
to 16K everything
Hi Nick,
responses inline, again ;-)
Thanks
Jan
On 27 May 2015, at 12:29, Nick Fisk n...@fisk.me.uk wrote:
Hi Jan,
Responses inline below
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Jan Schermer
Sent: 25 May 2015 21:14
Can you check the capacitor reading on the S3700 with smartctl? This drive has
non-volatile cache which *should* get flushed when power is lost, depending on
what hardware does on reboot it might get flushed even when rebooting.
I just got this drive for testing yesterday and it’s a beast, but
On 28 May 2015, at 10:56, Christian Balzer ch...@gol.com wrote:
On Thu, 28 May 2015 10:32:18 +0200 Jan Schermer wrote:
Can you check the capacitor reading on the S3700 with smartctl?
I suppose you mean this?
---
175 Power_Loss_Cap_Test 0x0033 100 100 010 Pre-fail
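To pull just that attribute out (a sketch; device path is a placeholder):

   smartctl -A /dev/sdX | grep -i power_loss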
Post the output from your “ceph osd tree”.
We were in a similar situation, some of the OSDs were quite full while others
had 50% free. This is exactly why we increased the number of PGs, and it
helped to some degree.
Are all your hosts the same size? Does your CRUSH map select a host in the end?
AM, Jan Schermer j...@schermer.cz
mailto:j...@schermer.cz wrote:
I promised you all our scripts for automatic cgroup assignment - they are in
our production already and I just need to put them on github, stay tuned
tomorrow :-)
Jan
On 29 Jun 2015, at 19:41, Somnath Roy somnath
I think I posted my experience here ~1 month ago.
My advice for EnhanceIO: don’t use it.
But you didn’t exactly say what you want to cache - do you want to cache the
OSD filestore disks? RBD devices on hosts? RBD devices inside guests?
Jan
On 02 Jul 2015, at 11:29, Emmanuel Florac
Does anyone have a known-good set of parameters for ext4? I want to try it as
well but I’m a bit worried what happens if I get it wrong.
Thanks
Jan
On 02 Jul 2015, at 09:40, Nick Fisk n...@fisk.me.uk wrote:
-Original Message-
From: ceph-users
And those disks are spindles?
Looks like there are simply too few of them…
Jan
On 02 Jul 2015, at 13:49, German Anders gand...@despegar.com wrote:
output from iostat:
CEPHOSD01:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await
I promised you all our scripts for automatic cgroup assignment - they are in
our production already and I just need to put them on github, stay tuned
tomorrow :-)
Jan
On 29 Jun 2015, at 19:41, Somnath Roy somnath@sandisk.com wrote:
Presently, you have to do it by using tool like
Interesting. Any idea why degraded could be negative? :)
2015-07-02 17:27:11.551959 mon.0 [INF] pgmap v23198138: 36032 pgs: 35468 active+clean, 551 active+recovery_wait, 13 active+recovering; 13005 GB data, 48944 GB used, 21716 GB / 70660 GB avail; 11159KB/s rd, 129MB/s wr, 5059op/s;
What’s the value of /proc/sys/vm/min_free_kbytes on your system? Increase it to
256M (better do it if there’s lots of free memory) and see if it helps.
It can also be set too high, hard to find any formula how to set it correctly...
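Something like this for 256M (a sketch):

   sysctl vm.min_free_kbytes                 # current value
   sysctl -w vm.min_free_kbytes=262144       # 256M, takes effect immediately

and add vm.min_free_kbytes = 262144 to /etc/sysctl.conf if it helps.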
Jan
On 03 Jul 2015, at 10:16, Alex Gorbachev
.
Best Regards
-- Ray
On Tue, Jun 30, 2015 at 11:50 PM, Jan Schermer j...@schermer.cz
mailto:j...@schermer.cz wrote:
Hi all,
our script is available on GitHub
https://github.com/prozeta/pincpus
I haven’t had much time to do a proper README, but I
Hi,
I have a full-ssd cluster on my hands, currently running Dumpling, with plans
to upgrade soon, and Openstack with RBD on top of that. While I am overall
quite happy with the performance (scales well accross clients), there is one
area where it really fails bad - big database workloads.
be at the top of anyone's
priorities.
Nick
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Jan Schermer
Sent: 25 May 2015 09:59
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Synchronous writes - tuning and some thoughts about
them
Turbo Boost will not hurt performance. Unless you have 100% load on all cores
it will actually improve performance (vastly, in terms of bursty workloads).
The issue you have could be related to CPU cores going to sleep mode.
Put “intel_idle.max_cstate=3” on the kernel command line (I ran with =2
45DD A904 C70E E654 3BB2 FA62 B9F1
On Tue, May 26, 2015 at 5:53 AM, Lionel Bouton lionel+c...@bouton.name
wrote:
On 05/26/15 10:06, Jan Schermer wrote:
Turbo Boost will not hurt performance. Unless you have 100% load on all
cores it will actually improve performance (vastly, in terms
This is handled by the filesystem usually (or not, depending on what filesystem
you use).
When you hit a bad block you should just replace the drive - in case of a
spinning disk the damage is likely going to spread, in case of flash device
this error should have been prevented by firmware in
I tested and can recommend the Samsung 845 DC PRO (make sure it is DC PRO and
not just PRO or DC EVO!).
Those were very cheap but are out of stock at the moment (here).
Faster than Intels, cheaper, and slightly different technology (3D V-NAND)
which IMO makes them superior without needing many
I'm not sure if I missed that but are you testing in a VM backed by RBD device,
or using the device directly?
I don't see how blk-mq would help if it's not a VM, it just passes the request
to the underlying block device, and in case of RBD there is no real block
device from the host
I already evaluated EnhanceIO in combination with CentOS 6 (and backported 3.10
and 4.0 kernel-lt if I remember correctly).
It worked fine during benchmarks and stress tests, but once we run DB2 on it it
panicked within minutes and took all the data with it (almost literally - files
that weren't
to a
configuration set. The name doesn't actually appear *in* the configuration
files. It stands to reason you should be able to rename the configuration
files on the client side and leave the cluster alone. It'd be worth trying in
a test environment anyway.
-Erik
On Aug 18, 2015 7:59 AM, Jan Schermer
Reply in text
On 18 Aug 2015, at 12:59, Nick Fisk n...@fisk.me.uk wrote:
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Jan Schermer
Sent: 18 August 2015 11:50
To: Benedikt Fraunhofer given.to.lists.ceph-users.ceph.com.toasta
On 18 Aug 2015, at 13:58, Nick Fisk n...@fisk.me.uk wrote:
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Jan Schermer
Sent: 18 August 2015 12:41
To: Nick Fisk n...@fisk.me.uk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph
Yes, writeback mode. I didn't try anything else.
Jan
On 18 Aug 2015, at 18:30, Alex Gorbachev a...@iss-integration.com wrote:
HI Jan,
On Tue, Aug 18, 2015 at 5:00 AM, Jan Schermer j...@schermer.cz wrote:
I already evaluated EnhanceIO in combination with CentOS 6 (and backported
3.10
This simply depends on what your workload is. I know this is a non-anwer for
you but that's how it is.
Databases are the worst, because they tend to hit the disks with every
transaction, and the transaction throughput is in direct proportion to the
number of IOPS you can get. And the number of
Thanks for the config,
a few comments inline, not really related to the issue
On 21 Aug 2015, at 15:12, J-P Methot jpmet...@gtcomm.net wrote:
Hi,
First of all, we are sure that the return to the default configuration
fixed it. As soon as we restarted only one of the ceph nodes with the
I never actually set up iSCSI with VMware, I just had to research various
VMware storage options when we had a SAN problem at a former job... But I can
take a look at it again if you want me to.
Is it realy deadlocked when this issue occurs?
What I think is partly responsible for this
Just to clarify - you unmounted the filesystem with umount -l? That's almost
never a good idea, and it puts the OSD in a very unusual situation where IO
will actually work on the open files, but it can't open any new ones. I think
this would be enough to confuse just about any piece of software.
16:02, Jan Schermer wrote:
Shouldn't this:
cluster_network = fe80::%cephnet/64
be this:
cluster_network = fe80::/64
?
That won't work since the kernel doesn't know the scope. So %devname is
right, but Ceph can't parse it.
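A sketch of a workaround, assuming you can put a routable ULA prefix on the cluster interfaces instead of relying on link-local:

   [global]
   cluster_network = fd00:cafe::/64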
Although it sounds cool to run Ceph over link-local I don't
On 18 Aug 2015, at 16:44, Nick Fisk n...@fisk.me.uk wrote:
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Mark Nelson
Sent: 18 August 2015 14:51
To: Nick Fisk n...@fisk.me.uk; 'Jan Schermer' j...@schermer.cz
Cc: ceph-users
On 18 Aug 2015, at 17:57, Björn Lässig b.laes...@pengutronix.de wrote:
On 08/18/2015 04:32 PM, Jan Schermer wrote:
Should ceph care about what scope the address is in? We don't specify it for
ipv4 anyway, or is link-scope special in some way?
fe80::/64 is on every ipv6 enabled interface
This can be tuned in the iSCSI initiation on VMware - look in advanced settings
on your ESX hosts (at least if you use the software initiator).
Jan
On 23 Aug 2015, at 21:28, Nick Fisk n...@fisk.me.uk wrote:
Hi Alex,
Currently RBD+LIO+ESX is broken.
The problem is caused by the RBD
/2015 07:31 PM, Jan Schermer wrote:
Just to clarify - you unmounted the filesystem with umount -l? That's almost
never a good idea, and it puts the OSD in a very unusual situation where IO
will actually work on the open files, but it can't open any new ones. I
think this would be enough to confuse
Are you sure it was because of configuration changes?
Maybe it was restarting the OSDs that fixed it?
We often hit an issue with backfill_toofull where the recovery/backfill
processes get stuck until we restart the daemons (sometimes setting
recovery_max_active helps as well). It still shows
to a NUMA
node.
Let me know how it works for you!
Jan
On 30 Jun 2015, at 10:50, Huang Zhiteng winsto...@gmail.com wrote:
On Tue, Jun 30, 2015 at 4:25 PM, Jan Schermer j...@schermer.cz
mailto:j...@schermer.cz wrote:
Not having OSDs and KVMs compete against each other is one thing
I don’t run Ceph on btrfs, but isn’t this related to the btrfs snapshotting
feature ceph uses to ensure a consistent journal?
Jan
On 19 Jun 2015, at 14:26, Lionel Bouton lionel+c...@bouton.name wrote:
On 06/19/15 13:42, Burkhard Linke wrote:
Forget the reply to the list...
Not at all.
We have this: http://ceph.com/docs/master/releases/
I would expect that whatever distribution I install Ceph LTS release on will
be supported for the time specified.
That means if I install Hammer on CentOS 6 now it will stay supported
until 3Q/2016.
Of course if in the meantime the
I understand your reasons, but dropping support for LTS release like this
is not right.
You should lege artis support every distribution the LTS release could have
ever been installed on - that’s what the LTS label is for and what we rely on
once we build a project on top of it
CentOS 6 in
It is possible I misunderstood Sage’s message - I apologize if that’s the case.
This is what made me uncertain:
- We would probably continue building hammer and firefly packages for
future bugfix point releases.
Decision for new releases (Infernalis, Jewel, K*) regarding distro support
should
On 31 Jul 2015, at 17:28, Haomai Wang haomaiw...@gmail.com wrote:
On Fri, Jul 31, 2015 at 5:47 PM, Jan Schermer j...@schermer.cz wrote:
I know a few other people here were battling with the occasional issue of
OSD being extremely slow when starting.
I personally run OSDs mixed with KVM
I remember reading that ScaleIO (I think?) does something like this by
regularly sending reports to a multicast group, thus any node with issues (or
just overload) is reweighted or avoided automatically on the client. OSD map is
the Ceph equivalent I guess. It makes sense to gather metrics and
May your bytes stay with you :)
Happy bofhday!
Jan
On 01 Aug 2015, at 00:10, Michael Kuriger mk7...@yp.com wrote:
Thanks Mark you too
Michael Kuriger
Sr. Unix Systems Engineer
mk7...@yp.com | 818-649-7235
On 7/31/15, 3:02 PM, ceph-users on behalf of Mark Nelson
Hi,
comments inline.
On 05 Aug 2015, at 05:45, Jevon Qiao qiaojianf...@unitedstack.com wrote:
Hi Jan,
Thank you for the detailed suggestion. Please see my reply in-line.
On 5/8/15 01:23, Jan Schermer wrote:
I think I wrote about my experience with this about 3 months ago, including
Hi,
is adjusting crush weight really a good solution for this? Crush weight out of
the box corresponds to OSD capacity in TB and this looks like a good “weight”
to me. The issue is not in a bucket having wrong weight, but somewhere else
depending on CRUSH.
We actually use “osd reweight” for
I remember when ceph.com was down a while ago - it hurts. Thank you for this.
Cloudflare works and should be free for the website itself.
Not sure how they handle caching of “larger” (not website) objects for
repositories etc, might be plug and play or might require integration with
their CDN.
I know a few other people here were battling with the occasional issue of OSD
being extremely slow when starting.
I personally run OSDs mixed with KVM guests on the same nodes, and was baffled
by this issue occurring mostly on the most idle (empty) machines.
Thought it was some kind of race
Hi,
if you look in the archive you'll see I posted something similar about 2
months ago.
You can try something experimenting with
1) stock binaries - tcmalloc
2) LD_PRELOADed jemalloc
3) ceph recompiled with neither (glibc malloc)
4) ceph recompiled with jemalloc (?)
We simply recompiled ceph
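For option 2, it's enough to start the daemon with the library preloaded (a sketch - the jemalloc path varies by distro):

   LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 /usr/bin/ceph-osd -i 0

e.g. exported from the init script or its environment file.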
Could someone clarify what the impact of this bug is?
We did increase pg_num/pgp_num and we are on dumpling (0.67.12 unofficial
snapshot).
Most of our clients are likely restarted already, but not all. Should we be
worried?
Thanks
Jan
On 11 Aug 2015, at 17:31, Dan van der Ster
suggestions on how to get Ceph to not start the ceph-osd processes on
host boot? It does not seem to be as simple as just disabling the service
Regards
Nathan
On 15/07/2015 7:15 PM, Jan Schermer wrote:
We have the same problems, we need to start the OSDs slowly.
The problem seems
You're not really testing only a RBD device there - you're testing
1) the O_DIRECT implementation in the kernel version you have (they differ)
- try different kernels in guest
2) cache implementation in qemu (and possibly virtio block driver) - if it's
enabled
- disable it for this test
Well, you could explicitly export HOME=/root then, that should make it go away.
I think it's normally only present in a login shell.
Jan
On 06 Aug 2015, at 17:51, Josh Durgin jdur...@redhat.com wrote:
On 08/06/2015 03:10 AM, Daleep Bais wrote:
Hi,
Whenever I restart or check the logs for
I think I wrote about my experience with this about 3 months ago, including
what techniques I used to minimize impact on production.
Basicaly we had to
1) increase pg_num in small increments only, because creating the placement groups
themselves caused slow requests on OSDs
2) increase pgp_num in
An interesting benchmark would be to compare Ceph SSD journal + ext4 on
spinner versus Ceph without journal + ext4 on spinner with external SSD
journal.
I won't be surprised if the second outperformed the first - you are actually
making the whole setup much simpler and Ceph is mostly CPU bound.
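Setting up the second variant is simple enough (a sketch, device names are placeholders - both commands are destructive):

   mke2fs -O journal_dev /dev/ssd1              # SSD partition becomes an ext4 journal device
   mkfs.ext4 -J device=/dev/ssd1 /dev/spinner1  # spinner filesystem uses the external journal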