Re: [ceph-users] Multiple journals and an OSD on one SSD doable?

2015-06-08 Thread Jan Schermer
I recently did some testing of a few SSDs and found some surprising, and some not so surprising things: 1) performance varies wildly with firmware, especially with cheaper drives 2) performance varies with time - even with S3700 - slows down after ~40-80GB and then creeps back up 3) cheaper
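The usual way to reproduce this kind of result is a single-threaded synchronous-write test against the raw device, roughly like the sketch below (the fio parameters are illustrative, not the exact ones used in the thread; /dev/sdX must be a scratch SSD whose contents can be destroyed):

    # journal-style workload: 4k synchronous direct writes, queue depth 1
    fio --name=journal-test --filename=/dev/sdX \
        --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=1 --iodepth=1 --runtime=300 --time_based \
        --group_reporting

Running it long enough to write tens of GB is what exposes the slowdown described in point 2.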

Re: [ceph-users] Multiple journals and an OSD on one SSD doable?

2015-06-08 Thread Jan Schermer
On 08 Jun 2015, at 10:07, Christian Balzer ch...@gol.com wrote: On Mon, 8 Jun 2015 09:44:54 +0200 Jan Schermer wrote: I recently did some testing of a few SSDs and found some surprising, and some not so surprising things: 1) performance varies wildly with firmware, especially

Re: [ceph-users] Multiple journals and an OSD on one SSD doable?

2015-06-08 Thread Jan Schermer
On 08 Jun 2015, at 10:40, Christian Balzer ch...@gol.com wrote: On Mon, 8 Jun 2015 10:12:02 +0200 Jan Schermer wrote: On 08 Jun 2015, at 10:07, Christian Balzer ch...@gol.com wrote: On Mon, 8 Jun 2015 09:44:54 +0200 Jan Schermer wrote: I recently did some testing of a few SSDs

Re: [ceph-users] rbd cache + libvirt

2015-06-08 Thread Jan Schermer
Isn’t the right parameter “network=writeback” for network devices like RBD? Jan On 08 Jun 2015, at 12:31, Andrey Korolyov and...@xdel.ru wrote: On Mon, Jun 8, 2015 at 1:24 PM, Arnaud Virlet avir...@easter-eggs.com wrote: Hi Actually we use libvirt VM with ceph rbd pool for storage. By
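For reference, the cache mode in libvirt is set with the cache attribute on the disk's driver element rather than a "network=" setting; a minimal sketch of an RBD disk definition (pool, image, monitor address and secret UUID are placeholders):

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source protocol='rbd' name='rbd-pool/vm-disk-1'>
        <host name='10.0.0.1' port='6789'/>
      </source>
      <auth username='libvirt'>
        <secret type='ceph' uuid='00000000-0000-0000-0000-000000000000'/>
      </auth>
      <target dev='vda' bus='virtio'/>
    </disk>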

Re: [ceph-users] Synchronous writes - tuning and some thoughts about them?

2015-06-03 Thread Jan Schermer
Durgin jdur...@redhat.com wrote: On 06/01/2015 03:41 AM, Jan Schermer wrote: Thanks, that’s it exactly. But I think that’s really too much work for now, that’s why I really would like to see a quick-win by using the local RBD cache for now - that would suffice for most workloads (not too many

Re: [ceph-users] apply/commit latency

2015-06-03 Thread Jan Schermer
You say “even when the cluster is doing nothing” - Are you seeing those numbers on a completely idle cluster? Even SSDs can go to sleep, as can CPUs (throttle/sleep states), memory gets swapped/paged out, tcp connections die, cache is empty... measuring a completely idle cluster is not always

Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Jan Schermer
).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866327 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 On 02 Jun 2015, at 20:14, Jan Schermer j...@schermer.cz wrote: Our mons just went into a logging frenzy. We have 3 mons in the cluster

Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Jan Schermer
Dumpling ceph-0.67.9-16.g69a99e6 I guess it shouldn’t be logging it at all? Thanks Jan On 02 Jun 2015, at 20:42, Somnath Roy somnath@sandisk.com wrote: Which code base are you using ? -Original Message- From: Jan Schermer [mailto:j...@schermer.cz] Sent: Tuesday, June 02

Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Jan Schermer
peon recovering peon active (and this is the madness) It logs much less now, but the issue is still here… Jan On 02 Jun 2015, at 20:17, Jan Schermer j...@schermer.cz wrote: Actually looks like it stopped, but here’s a more representative sample (notice how often it logged this!) v0 lc

[ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Jan Schermer
Our mons just went into a logging frenzy. We have 3 mons in the cluster, and they mostly log stuff like this 2015-06-02 18:00:48.749386 7f1c08c0d700 1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:48.749389 lease_expire=2015-06-02

Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Jan Schermer
nothing to worry about other than log spam here which is fixed in the latest build or you can fix it with debug mon = 0/0 Thanks Regards Somnath -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan Schermer Sent: Tuesday, June 02
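The workaround quoted above can be applied to a running monitor without a restart; a sketch, using the monitor name from the log excerpt in this thread as a placeholder:

    # silence monitor debug logging at runtime
    ceph tell mon.node-10 injectargs '--debug-mon 0/0'

    # and persist it in ceph.conf under [mon]:
    #   debug mon = 0/0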

Re: [ceph-users] calculating maximum number of disk and node failure that can be handled by cluster with out data loss

2015-06-10 Thread Jan Schermer
, 2015 at 8:47 AM, Jan Schermer j...@schermer.cz wrote: Hidden danger in the default CRUSH rules is that if you lose 3 drives in 3 different hosts at the same time, you _will_ lose data, and not just some data but possibly a piece of every rbd volume you have... And the probability

Re: [ceph-users] calculating maximum number of disk and node failure that can be handled by cluster with out data loss

2015-06-10 Thread Jan Schermer
, _every_ RBD device will have random holes in their data, like Swiss cheese. BTW PGs can have stuck IOs without losing all three replicas -- see min_size. Cheers, Dan On Wed, Jun 10, 2015 at 10:20 AM, Jan Schermer j...@schermer.cz wrote: When you increase the number of OSDs, you generally would

Re: [ceph-users] calculating maximum number of disk and node failure that can be handled by cluster with out data loss

2015-06-10 Thread Jan Schermer
Hidden danger in the default CRUSH rules is that if you lose 3 drives in 3 different hosts at the same time, you _will_ lose data, and not just some data but possibly a piece of every rbd volume you have... And the probability of that happening is sadly nowhere near zero. We had drives drop out

Re: [ceph-users] IO scheduler osd_disk_thread_ioprio_class

2015-06-23 Thread Jan Schermer
awful performance for some time (not those 8K you see though). I think there was some kind of firmware process involved, I had to replace the drive with a serious DC one. El 23/06/15 a las 14:07, Jan Schermer escribió: Yes, but that’s a separate issue :-) Some drives are just slow (100 IOPS

Re: [ceph-users] IO scheduler osd_disk_thread_ioprio_class

2015-06-23 Thread Jan Schermer
instead of cfq/deadline? I'm kind of surprised that you want/need to do any IO scheduling when your journal and FileStore are on a good SSD. Cheers, Dan On Tue, Jun 23, 2015 at 4:57 PM, Jan Schermer j...@schermer.cz wrote: For future generations I persuaded CFQ to play nice in the end

Re: [ceph-users] IO scheduler osd_disk_thread_ioprio_class

2015-06-23 Thread Jan Schermer
of firmware process involved, I had to replace the drive with a serious DC one. El 23/06/15 a las 14:07, Jan Schermer escribió: Yes, but that’s a separate issue :-) Some drives are just slow (100 IOPS) for synchronous writes with no other load. The drives I’m testing have ~8K IOPS when

Re: [ceph-users] IO scheduler osd_disk_thread_ioprio_class

2015-06-23 Thread Jan Schermer
...@vanderster.com wrote: On Tue, Jun 23, 2015 at 1:37 PM, Jan Schermer j...@schermer.cz wrote: Yes, I use the same drive - one partition for the journal, the other for XFS with the filestore. I am seeing slow requests when backfills are occurring - backfills hit the filestore but slow requests are (most

Re: [ceph-users] IO scheduler osd_disk_thread_ioprio_class

2015-06-23 Thread Jan Schermer
On Tue, Jun 23, 2015 at 1:54 PM, Jan Schermer j...@schermer.cz wrote: I only use SSDs, which is why I’m so surprised at the CFQ behaviour - the drive can sustain tens of thousands of reads per second, thousands of writes - yet saturating it with reads drops the writes to 10 IOPS - that’s mind

Re: [ceph-users] Explanation for ceph osd set nodown and ceph osd cluster_snap

2015-06-22 Thread Jan Schermer
Thanks. Nobody else knows anything about “cluster_snap”? It is mentioned in the docs, but that’s all… Jan On 19 Jun 2015, at 12:49, Carsten Schmitt carsten.schm...@uni-hamburg.de wrote: Hi Jan, On 06/18/2015 12:48 AM, Jan Schermer wrote: 1) Flags available in ceph osd set

Re: [ceph-users] Unexpected disk write activity with btrfs OSDs

2015-06-22 Thread Jan Schermer
I don’t run Ceph on btrfs, but isn’t this related to the btrfs snapshotting feature ceph uses to ensure a consistent journal? Jan On 19 Jun 2015, at 14:26, Lionel Bouton lionel+c...@bouton.name wrote: On 06/19/15 13:42, Burkhard Linke wrote: Forget the

Re: [ceph-users] Switching from tcmalloc

2015-06-25 Thread Jan Schermer
is working well, but far more consistent and likely faster than glibc. Mark On 06/24/2015 12:59 PM, Jan Schermer wrote: We already had the migratepages in place before we disabled tcmalloc. It didn’t do much. Disabling tcmalloc made immediate difference but there were still spikes

[ceph-users] Switching from tcmalloc

2015-06-24 Thread Jan Schermer
Can you guess when we did that? Still on dumpling, btw... http://www.zviratko.net/link/notcmalloc.png Jan

Re: [ceph-users] IO scheduler osd_disk_thread_ioprio_class

2015-06-23 Thread Jan Schermer
Thank you for your reply, answers below. On 23 Jun 2015, at 13:15, Christian Balzer ch...@gol.com wrote: Hello, On Tue, 23 Jun 2015 12:53:45 +0200 Jan Schermer wrote: I use CFQ but I have just discovered it completely _kills_ writes when also reading (doing backfill for example

Re: [ceph-users] IO scheduler osd_disk_thread_ioprio_class

2015-06-23 Thread Jan Schermer
osd recovery op priority = 1 Hope that helps, Dan On Tue, Jun 23, 2015 at 12:53 PM, Jan Schermer j...@schermer.cz wrote: I use CFQ but I have just discovered it completely _kills_ writes when also reading (doing backfill for example) If I run a fio job for synchronous writes

[ceph-users] IO scheduler osd_disk_thread_ioprio_class

2015-06-23 Thread Jan Schermer
I use CFQ but I have just discovered it completely _kills_ writes when also reading (doing backfill for example) If I run a fio job for synchronous writes and at the same time run a fio job for random reads, writes drop to 10 IOPS (oops!). Setting io priority with ionice works nicely
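The two knobs being compared in this thread look roughly like this (a sketch; the ionice example assumes CFQ is the active scheduler and uses a placeholder PID):

    # per-process: push a competing reader/writer into the CFQ idle class
    ionice -c 3 -p <pid>

    # ceph.conf [osd]: run the OSD disk thread (scrub, snap trim) at idle priority
    osd disk thread ioprio class = idle
    osd disk thread ioprio priority = 7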

Re: [ceph-users] Switching from tcmalloc

2015-06-24 Thread Jan Schermer
...@lists.ceph.com] On Behalf Of Jan Schermer Sent: Wednesday, June 24, 2015 10:54 AM To: Ben Hines Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Switching from tcmalloc We did, but I don’t have the numbers. I have lots of graphs, though. We were mainly trying to solve the CPU usage, since our

Re: [ceph-users] Unexpected issues with simulated 'rack' outage

2015-06-24 Thread Jan Schermer
Just a quick guess - is it possible you ran out of file descriptors/connections on the nodes or on a firewall on the way? I’ve seen this behaviour the other way around - when too many RBD devices were connected to one client. It would explain why it seems to work but hangs when the device is
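A quick sanity check for the descriptor theory is to compare open descriptors against the per-process limit on the suspect node (a sketch with a placeholder process name):

    PID=$(pidof -s ceph-osd)            # or the librbd-using client process
    ls /proc/$PID/fd | wc -l            # descriptors currently open
    grep 'Max open files' /proc/$PID/limits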

Re: [ceph-users] Switching from tcmalloc

2015-06-24 Thread Jan Schermer
if my systems are using 80% cpu, if Ceph performance is better than when it's using 20% cpu. Can you share any scripts you have to automate these things? (NUMA pinning, migratepages) thanks, -Ben On Wed, Jun 24, 2015 at 10:25 AM, Jan Schermer j...@schermer.cz wrote: There were

Re: [ceph-users] Switching from tcmalloc

2015-06-24 Thread Jan Schermer
reducing locking for memory allocations. I would do some more testing along with what Ben Hines mentioned about overall client performance. - Robert LeBlanc GPG Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Wed, Jun 24, 2015 at 11:25 AM, Jan Schermer

Re: [ceph-users] Switching from tcmalloc

2015-06-24 Thread Jan Schermer
using tcmalloc? I've noticed that there is usually a good drop for us just by restarting them. I don't think it is usually this drastic. - Robert LeBlanc GPG Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Wed, Jun 24, 2015 at 2:08 AM, Jan Schermer wrote

[ceph-users] Restarting OSD leads to lower CPU usage

2015-06-11 Thread Jan Schermer
Hi, hoping someone can point me in the right direction. Some of my OSDs have a larger CPU usage (and ops latencies) than others. If I restart the OSD everything runs nicely for some time, then it creeps up. 1) most of my OSDs have ~40% CPU (core) usage (user+sys), some are closer to 80%.

Re: [ceph-users] Restarting OSD leads to lower CPU usage

2015-06-11 Thread Jan Schermer
On 11 Jun 2015, at 11:53, Henrik Korkuc li...@kirneh.eu wrote: On 6/11/15 12:21, Jan Schermer wrote: Hi, hoping someone can point me in the right direction. Some of my OSDs have a larger CPU usage (and ops latencies) than others. If I restart the OSD everything runs nicely for some

Re: [ceph-users] Restarting OSD leads to lower CPU usage

2015-06-11 Thread Jan Schermer
but I don't see it will be that significant. Thanks Regards Somnath -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan Schermer Sent: Thursday, June 11, 2015 5:57 AM To: Dan van der Ster Cc: ceph-users@lists.ceph.com Subject: Re

Re: [ceph-users] Restarting OSD leads to lower CPU usage

2015-06-11 Thread Jan Schermer
. Thanks Regards Somnath -Original Message- From: Jan Schermer [mailto:j...@schermer.cz] Sent: Thursday, June 11, 2015 12:10 PM To: Somnath Roy Cc: Dan van der Ster; ceph-users@lists.ceph.com Subject: Re: [ceph-users] Restarting OSD leads to lower CPU usage Hi, I looked

Re: [ceph-users] Restarting OSD leads to lower CPU usage

2015-06-11 Thread Jan Schermer
, Jun 11, 2015 at 11:21 AM, Jan Schermer j...@schermer.cz wrote: Hi, hoping someone can point me in the right direction. Some of my OSDs have a larger CPU usage (and ops latencies) than others. If I restart the OSD everything runs nicely for some time, then it creeps up. 1) most of my OSDs

[ceph-users] Explanation for ceph osd set nodown and ceph osd cluster_snap

2015-06-17 Thread Jan Schermer
1) Flags available in ceph osd set are pause|noup|nodown|noout|noin|nobackfill|norecover|noscrub|nodeep-scrub|notieragent I know or can guess most of them (the docs are a “bit” lacking) But with “ceph osd set nodown” I have no idea what it should be used for - to keep hammering a faulty OSD?
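For context, all of these flags are toggled cluster-wide with set/unset; the usual maintenance pattern looks like this (a sketch):

    # keep CRUSH from marking OSDs out during planned maintenance
    ceph osd set noout
    # ... reboot / service the node, then:
    ceph osd unset noout

    # the flag in question: prevents OSDs from being marked down
    ceph osd set nodown
    ceph osd unset nodown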

Re: [ceph-users] Hardware cache settings recomendation

2015-06-18 Thread Jan Schermer
, with SATA drives ST1000LM014-1EJ1 for data and for journal SSD INTEL SSDSC2BW12. Regards, Mateusz From: Jan Schermer [mailto:j...@schermer.cz] Sent: Wednesday, June 17, 2015 9:41 AM To: Mateusz Skała Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Hardware cache settings

Re: [ceph-users] Hardware cache settings recomendation

2015-06-17 Thread Jan Schermer
Cache on top of the data drives (not journal) will not help in most cases, those writes are already buffered in the OS - so unless your OS is very light on memory and flushing constantly it will have no effect, it just adds overhead in case a flush comes. I haven’t tested this extensively with

Re: [ceph-users] removed_snaps in ceph osd dump?

2015-06-16 Thread Jan Schermer
On 16 Jun 2015, at 12:59, Gregory Farnum g...@gregs42.com wrote: On Tue, Jun 16, 2015 at 11:53 AM, Jan Schermer j...@schermer.cz wrote: Well, I see mons dropping out when deleting large amount of snapshots, and it leats a _lot_ of CPU to delete them Well, you're getting past my

Re: [ceph-users] removed_snaps in ceph osd dump?

2015-06-16 Thread Jan Schermer
Ping :-) Looks like nobody is bothered by this, can I assume it is normal, doesn’t hurt anything and will grow to millions in time? Jan On 15 Jun 2015, at 10:32, Jan Schermer j...@schermer.cz wrote: Hi, I have ~1800 removed_snaps listed in the output of “ceph osd dump

Re: [ceph-users] removed_snaps in ceph osd dump?

2015-06-16 Thread Jan Schermer
On Tue, Jun 16, 2015 at 1:41 AM, Jan Schermer j...@schermer.cz wrote: Ping :-) Looks like nobody is bothered by this, can I assume it is normal, doesn’t hurt anything and will grow to millions in time? Jan On 15 Jun 2015, at 10:32, Jan Schermer j...@schermer.cz wrote: Hi, I have ~1800

Re: [ceph-users] CephFS: 'ls -alR' performance terrible unless Linux cache flushed

2015-06-16 Thread Jan Schermer
Have you tried just running “sync;sync” on the originating node? Does that achieve the same thing or not? (I guess it could/should). Jan On 16 Jun 2015, at 13:37, negillen negillen negil...@gmail.com wrote: Thanks again, even 'du' performance is terrible on node B (testing on a

[ceph-users] removed_snaps in ceph osd dump?

2015-06-15 Thread Jan Schermer
Hi, I have ~1800 removed_snaps listed in the output of “ceph osd dump”. Is that all right? Any way to get rid of those? What’s the significance? Thanks Jan
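They show up per pool in the OSD map; a quick way to look at them (a sketch):

    # removed_snaps is printed as an interval set on each pool line
    ceph osd dump | grep removed_snaps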

Re: [ceph-users] Discuss: New default recovery config settings

2015-06-01 Thread Jan Schermer
go up you can’t go down. Jan On 01 Jun 2015, at 10:57, huang jun hjwsm1...@gmail.com wrote: hi,jan 2015-06-01 15:43 GMT+08:00 Jan Schermer j...@schermer.cz: We had to disable deep scrub or the cluster would me unusable - we need to turn it back on sooner or later, though. With minimal

Re: [ceph-users] Synchronous writes - tuning and some thoughts about them?

2015-06-01 Thread Jan Schermer
Thanks, that’s it exactly. But I think that’s really too much work for now, that’s why I really would like to see a quick-win by using the local RBD cache for now - that would suffice for most workloads (not too many people run big databases on CEPH now, those who do must be aware of this).

Re: [ceph-users] Discuss: New default recovery config settings

2015-06-01 Thread Jan Schermer
We had to disable deep scrub or the cluster would be unusable - we need to turn it back on sooner or later, though. With minimal scrubbing and recovery settings, everything is mostly good. Turned out many issues we had were due to too few PGs - once we increased them from 4K to 16K everything
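"Minimal scrubbing and recovery settings" typically means something like the sketch below (illustrative values, not necessarily the ones used on this cluster):

    # stop deep scrubs cluster-wide until there is headroom for them
    ceph osd set nodeep-scrub

    # ceph.conf [osd]: throttle recovery/backfill impact on client IO
    osd max backfills = 1
    osd recovery max active = 1
    osd recovery op priority = 1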

Re: [ceph-users] Synchronous writes - tuning and some thoughts about them?

2015-05-27 Thread Jan Schermer
Hi Nick, responses inline, again ;-) Thanks Jan On 27 May 2015, at 12:29, Nick Fisk n...@fisk.me.uk wrote: Hi Jan, Responses inline below -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan Schermer Sent: 25 May 2015 21:14

Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-05-28 Thread Jan Schermer
Can you check the capacitor reading on the S3700 with smartctl ? This drive has non-volatile cache which *should* get flushed when power is lost, depending on what hardware does on reboot it might get flushed even when rebooting. I just got this drive for testing yesterday and it’s a beast, but

Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-05-28 Thread Jan Schermer
On 28 May 2015, at 10:56, Christian Balzer ch...@gol.com wrote: On Thu, 28 May 2015 10:32:18 +0200 Jan Schermer wrote: Can you check the capacitor reading on the S3700 with smartctl ? I suppose you mean this? --- 175 Power_Loss_Cap_Test 0x0033 100 100 010Pre-fail

Re: [ceph-users] PG size distribution

2015-06-02 Thread Jan Schermer
Post the output from your “ceph osd tree”. We were in a similar situation, some of the OSDs were quite full while others had 50% free. This is exactly why we increased the number of PGs, and it helped to some degree. Are all your hosts the same size? Does your CRUSH map select a host in the end?

Re: [ceph-users] How to use cgroup to bind ceph-osd to a specific cpu core?

2015-06-30 Thread Jan Schermer
AM, Jan Schermer j...@schermer.cz wrote: I promised you all our scripts for automatic cgroup assignment - they are in our production already and I just need to put them on github, stay tuned tomorrow :-) Jan On 29 Jun 2015, at 19:41, Somnath Roy somnath

Re: [ceph-users] any recommendation of using EnhanceIO?

2015-07-02 Thread Jan Schermer
I think I posted my experience here ~1 month ago. My advice for EnhanceIO: don’t use it. But you didn’t exactly say what you want to cache - do you want to cache the OSD filestore disks? RBD devices on hosts? RBD devices inside guests? Jan On 02 Jul 2015, at 11:29, Emmanuel Florac

Re: [ceph-users] xattrs vs omap

2015-07-02 Thread Jan Schermer
Does anyone have a known-good set of parameters for ext4? I want to try it as well but I’m a bit worried what happens if I get it wrong. Thanks Jan On 02 Jul 2015, at 09:40, Nick Fisk n...@fisk.me.uk wrote: -Original Message- From: ceph-users

Re: [ceph-users] any recommendation of using EnhanceIO?

2015-07-02 Thread Jan Schermer
And those disks are spindles? Looks like there are simply too few of them… Jan On 02 Jul 2015, at 13:49, German Anders gand...@despegar.com wrote: output from iostat: CEPHOSD01: Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await

Re: [ceph-users] How to use cgroup to bind ceph-osd to a specific cpu core?

2015-06-29 Thread Jan Schermer
I promised you all our scripts for automatic cgroup assignment - they are in our production already and I just need to put them on github, stay tuned tomorrow :-) Jan On 29 Jun 2015, at 19:41, Somnath Roy somnath@sandisk.com wrote: Presently, you have to do it by using tool like

[ceph-users] Degraded in the negative?

2015-07-02 Thread Jan Schermer
Interesting. Any idea why degraded could be negative? :) 2015-07-02 17:27:11.551959 mon.0 [INF] pgmap v23198138: 36032 pgs: 35468 active+clean, 551 active+recovery_wait, 13 active+recovering; 13005 GB data, 48944 GB used, 21716 GB / 70660 GB avail; 11159KB/s rd, 129MB/s wr, 5059op/s;

Re: [ceph-users] OSD crashes

2015-07-03 Thread Jan Schermer
What’s the value of /proc/sys/vm/min_free_kbytes on your system? Increase it to 256M (better do it if there’s lots of free memory) and see if it helps. It can also be set too high, hard to find any formula how to set it correctly... Jan On 03 Jul 2015, at 10:16, Alex Gorbachev
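A sketch of the suggested change (256 MB expressed in kB; apply it at runtime first, then persist it):

    cat /proc/sys/vm/min_free_kbytes                          # current value
    sysctl -w vm.min_free_kbytes=262144                       # ~256 MB
    echo 'vm.min_free_kbytes = 262144' >> /etc/sysctl.conf    # persist across reboots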

Re: [ceph-users] How to use cgroup to bind ceph-osd to a specific cpu core?

2015-07-01 Thread Jan Schermer
. Best Regards -- Ray On Tue, Jun 30, 2015 at 11:50 PM, Jan Schermer j...@schermer.cz wrote: Hi all, our script is available on GitHub https://github.com/prozeta/pincpus I haven’t had much time to do a proper README, but I
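The pincpus repository automates this; a minimal hand-rolled sketch of the same idea using the libcgroup tools (the CPU list, NUMA node and single-OSD assumption are placeholders):

    # create a cpuset cgroup pinned to NUMA node 0 and its cores
    cgcreate -g cpuset:/ceph-osd
    cgset -r cpuset.mems=0 ceph-osd
    cgset -r cpuset.cpus=0-5 ceph-osd

    # move a running OSD into it (a complete script would also move
    # already-spawned threads and handle multiple OSDs per host)
    cgclassify -g cpuset:ceph-osd $(pidof -s ceph-osd)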

[ceph-users] Synchronous writes - tuning and some thoughts about them?

2015-05-25 Thread Jan Schermer
Hi, I have a full-ssd cluster on my hands, currently running Dumpling, with plans to upgrade soon, and Openstack with RBD on top of that. While I am overall quite happy with the performance (scales well accross clients), there is one area where it really fails bad - big database workloads.

Re: [ceph-users] Synchronous writes - tuning and some thoughts about them?

2015-05-25 Thread Jan Schermer
be at the top of anyone's priority's. Nick -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan Schermer Sent: 25 May 2015 09:59 To: ceph-users@lists.ceph.com Subject: [ceph-users] Synchronous writes - tuning and some thoughts about them

Re: [ceph-users] Performance and CPU load on HP servers running ceph (DL380 G6, should apply to others too)

2015-05-26 Thread Jan Schermer
Turbo Boost will not hurt performance. Unless you have 100% load on all cores it will actually improve performance (vastly, in terms of bursty workloads). The issue you have could be related to CPU cores going to sleep mode. Put “intel_idle.max_cstate=3” on the kernel command line (I ran with =2
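On a GRUB2 system that means appending the parameter to the kernel command line and regenerating the config, roughly (a sketch; the thread suggests values of 2-3, and limiting C-states increases idle power draw):

    # /etc/default/grub -- append to the existing parameters
    GRUB_CMDLINE_LINUX="... intel_idle.max_cstate=2"

    # regenerate and reboot
    grub2-mkconfig -o /boot/grub2/grub.cfg    # Debian/Ubuntu: update-grub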

Re: [ceph-users] Performance and CPU load on HP servers running ceph (DL380 G6, should apply to others too)

2015-05-26 Thread Jan Schermer
45DD A904 C70E E654 3BB2 FA62 B9F1 On Tue, May 26, 2015 at 5:53 AM, Lionel Bouton lionel+c...@bouton.name wrote: On 05/26/15 10:06, Jan Schermer wrote: Turbo Boost will not hurt performance. Unless you have 100% load on all cores it will actually improve performance (vastly, in terms

Re: [ceph-users] How does Ceph isolate bad blocks?

2015-08-03 Thread Jan Schermer
This is handled by the filesystem usually (or not, depending on what filesystem you use). When you hit a bad block you should just replace the drive - in case of a spinning disk the damage is likely going to spread, in case of flash device this error should have been prevented by firmware in

Re: [ceph-users] CEPH cache layer. Very slow

2015-08-13 Thread Jan Schermer
I tested and can recommend the Samsung 845 DC PRO (make sure it is DC PRO and not just PRO or DC EVO!). Those were very cheap but are out of stock at the moment (here). Faster than Intels, cheaper, and slightly different technology (3D V-NAND) which IMO makes them superior without needing many

Re: [ceph-users] How to improve single thread sequential reads?

2015-08-18 Thread Jan Schermer
I'm not sure if I missed that but are you testing in a VM backed by RBD device, or using the device directly? I don't see how blk-mq would help if it's not a VM, it just passes the request to the underlying block device, and in case of RBD there is no real block device from the host

Re: [ceph-users] any recommendation of using EnhanceIO?

2015-08-18 Thread Jan Schermer
I already evaluated EnhanceIO in combination with CentOS 6 (and backported 3.10 and 4.0 kernel-lt if I remember correctly). It worked fine during benchmarks and stress tests, but once we ran DB2 on it, it panicked within minutes and took all the data with it (almost literally - files that weren’t

Re: [ceph-users] Rename Ceph cluster

2015-08-18 Thread Jan Schermer
to a configuration set. The name doesn't actually appear *in* the configuration files. It stands to reason you should be able to rename the configuration files on the client side and leave the cluster alone. It'd be worth trying in a test environment anyway. -Erik On Aug 18, 2015 7:59 AM, Jan Schermer

Re: [ceph-users] How to improve single thread sequential reads?

2015-08-18 Thread Jan Schermer
Reply in text On 18 Aug 2015, at 12:59, Nick Fisk n...@fisk.me.uk wrote: -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan Schermer Sent: 18 August 2015 11:50 To: Benedikt Fraunhofer given.to.lists.ceph- users.ceph.com.toasta

Re: [ceph-users] How to improve single thread sequential reads?

2015-08-18 Thread Jan Schermer
On 18 Aug 2015, at 13:58, Nick Fisk n...@fisk.me.uk wrote: -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan Schermer Sent: 18 August 2015 12:41 To: Nick Fisk n...@fisk.me.uk Cc: ceph-users@lists.ceph.com Subject: Re: [ceph

Re: [ceph-users] any recommendation of using EnhanceIO?

2015-08-18 Thread Jan Schermer
Yes, writeback mode. I didn't try anything else. Jan On 18 Aug 2015, at 18:30, Alex Gorbachev a...@iss-integration.com wrote: HI Jan, On Tue, Aug 18, 2015 at 5:00 AM, Jan Schermer j...@schermer.cz wrote: I already evaluated EnhanceIO in combination with CentOS 6 (and backported 3.10

Re: [ceph-users] Latency impact on RBD performance

2015-08-19 Thread Jan Schermer
This simply depends on what your workload is. I know this is a non-answer for you but that's how it is. Databases are the worst, because they tend to hit the disks with every transaction, and the transaction throughput is in direct proportion to the number of IOPS you can get. And the number of

Re: [ceph-users] Bad performances in recovery

2015-08-21 Thread Jan Schermer
Thanks for the config, few comments inline:, not really related to the issue On 21 Aug 2015, at 15:12, J-P Methot jpmet...@gtcomm.net wrote: Hi, First of all, we are sure that the return to the default configuration fixed it. As soon as we restarted only one of the ceph nodes with the

Re: [ceph-users] Slow responding OSDs are not OUTed and cause RBD client IO hangs

2015-08-24 Thread Jan Schermer
I never actually set up iSCSI with VMware, I just had to research various VMware storage options when we had a SAN problem at a former job... But I can take a look at it again if you want me to. Is it really deadlocked when this issue occurs? What I think is partly responsible for this

Re: [ceph-users] ceph osd debug question / proposal

2015-08-20 Thread Jan Schermer
Just to clarify - you unmounted the filesystem with umount -l? That's almost never a good idea, and it puts the OSD in a very unusual situation where IO will actually work on the open files, but it can't open any new ones. I think this would be enough to confuse just about any piece of software.

Re: [ceph-users] ceph cluster_network with linklocal ipv6

2015-08-18 Thread Jan Schermer
16:02, Jan Schermer wrote: Shouldn't this: cluster_network = fe80::%cephnet/64 be this: cluster_network = fe80::/64 ? That won't work since the kernel doesn't know the scope. So %devname is right, but Ceph can't parse it. Although it sounds cool to run Ceph over link-local I don't

Re: [ceph-users] any recommendation of using EnhanceIO?

2015-08-18 Thread Jan Schermer
On 18 Aug 2015, at 16:44, Nick Fisk n...@fisk.me.uk wrote: -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark Nelson Sent: 18 August 2015 14:51 To: Nick Fisk n...@fisk.me.uk; 'Jan Schermer' j...@schermer.cz Cc: ceph-users

Re: [ceph-users] ceph cluster_network with linklocal ipv6

2015-08-18 Thread Jan Schermer
On 18 Aug 2015, at 17:57, Björn Lässig b.laes...@pengutronix.de wrote: On 08/18/2015 04:32 PM, Jan Schermer wrote: Should ceph care about what scope the address is in? We don't specify it for ipv4 anyway, or is link-scope special in some way? fe80::/64 is on every ipv6 enabled interface

Re: [ceph-users] Slow responding OSDs are not OUTed and cause RBD client IO hangs

2015-08-24 Thread Jan Schermer
This can be tuned in the iSCSI initiator on VMware - look in advanced settings on your ESX hosts (at least if you use the software initiator). Jan On 23 Aug 2015, at 21:28, Nick Fisk n...@fisk.me.uk wrote: Hi Alex, Currently RBD+LIO+ESX is broken. The problem is caused by the RBD

Re: [ceph-users] ceph osd debug question / proposal

2015-08-24 Thread Jan Schermer
/2015 07:31 PM, Jan Schermer wrote: Just to clarify - you unmounted the filesystem with umount -l? That almost never a good idea, and it puts the OSD in a very unusual situation where IO will actually work on the open files, but it can't open any new ones. I think this would be enough to confuse

Re: [ceph-users] Bad performances in recovery

2015-08-20 Thread Jan Schermer
Are you sure it was because of configuration changes? Maybe it was restarting the OSDs that fixed it? We often hit an issue with backfill_toofull where the recovery/backfill processes get stuck until we restart the daemons (sometimes setting recovery_max_active helps as well). It still shows

Re: [ceph-users] How to use cgroup to bind ceph-osd to a specific cpu core?

2015-06-30 Thread Jan Schermer
to a NUMA node. Let me know how it works for you! Jan On 30 Jun 2015, at 10:50, Huang Zhiteng winsto...@gmail.com wrote: On Tue, Jun 30, 2015 at 4:25 PM, Jan Schermer j...@schermer.cz wrote: Not having OSDs and KVMs compete against each other is one thing

Re: [ceph-users] dropping old distros: el6, precise 12.04, debian wheezy?

2015-07-30 Thread Jan Schermer
Not at all. We have this: http://ceph.com/docs/master/releases/ I would expect that whatever distribution I install Ceph LTS release on will be supported for the time specified. That means if I install Hammer on CentOS 6 now it will stay supported until 3Q/2016. Of course if in the meantime the

Re: [ceph-users] dropping old distros: el6, precise 12.04, debian wheezy?

2015-07-30 Thread Jan Schermer
I understand your reasons, but dropping support for an LTS release like this is not right. You should lege artis support every distribution the LTS release could have ever been installed on - that’s what the LTS label is for and what we rely on once we build a project on top of it. CentOS 6 in

Re: [ceph-users] dropping old distros: el6, precise 12.04, debian wheezy?

2015-07-30 Thread Jan Schermer
It is possible I misunderstood Sage’s message - I apologize if that’s the case. This is what made me uncertain: - We would probably continue building hammer and firefly packages for future bugfix point releases. Decision for new releases (Infernalis, Jewel, K*) regarding distro support should

Re: [ceph-users] OSD startup causing slow requests - one tip from me

2015-07-31 Thread Jan Schermer
On 31 Jul 2015, at 17:28, Haomai Wang haomaiw...@gmail.com wrote: On Fri, Jul 31, 2015 at 5:47 PM, Jan Schermer j...@schermer.cz wrote: I know a few other people here were battling with the occasional issue of OSD being extremely slow when starting. I personally run OSDs mixed with KVM

Re: [ceph-users] Check networking first?

2015-07-31 Thread Jan Schermer
I remember reading that ScaleIO (I think?) does something like this by regularly sending reports to a multicast group, thus any node with issues (or just overload) is reweighted or avoided automatically on the client. OSD map is the Ceph equivalent I guess. It makes sense to gather metrics and

Re: [ceph-users] Happy SysAdmin Day!

2015-07-31 Thread Jan Schermer
May your bytes stay with you :) Happy bofhday! Jan On 01 Aug 2015, at 00:10, Michael Kuriger mk7...@yp.com wrote: Thanks Mark you too Michael Kuriger Sr. Unix Systems Engineer * mk7...@yp.com |( 818-649-7235 On 7/31/15, 3:02 PM, ceph-users on behalf of Mark Nelson

Re: [ceph-users] Is it safe to increase pg number in a production environment

2015-08-05 Thread Jan Schermer
Hi, comments inline. On 05 Aug 2015, at 05:45, Jevon Qiao qiaojianf...@unitedstack.com wrote: Hi Jan, Thank you for the detailed suggestion. Please see my reply in-line. On 5/8/15 01:23, Jan Schermer wrote: I think I wrote about my experience with this about 3 months ago, including

Re: [ceph-users] Is it safe to increase pg numbers in a production environment

2015-08-05 Thread Jan Schermer
Hi, is adjusting crush weight really a good solution for this? Crush weight out of the box corresponds to OSD capacity in TB and this looks like a good “weight” to me. The issue is not in a bucket having wrong weight, but somewhere else depending on CRUSH. We actually use “osd reweight” for

Re: [ceph-users] Setting up a proper mirror system for Ceph

2015-08-05 Thread Jan Schermer
I remember when ceph.com was down a while ago - it hurts. Thank you for this. Cloudflare works and should be free for the website itself. Not sure how they handle caching of “larger” (not website) objects for repositories etc, might be plug and play or might require integration with their CDN.

[ceph-users] OSD startup causing slow requests - one tip from me

2015-07-31 Thread Jan Schermer
I know a few other people here were battling with the occasional issue of an OSD being extremely slow when starting. I personally run OSDs mixed with KVM guests on the same nodes, and was baffled by this issue occurring mostly on the most idle (empty) machines. Thought it was some kind of race

Re: [ceph-users] Ceph allocator and performance

2015-08-11 Thread Jan Schermer
Hi, if you look in the archive you'll see I posted something similar about 2 months ago. You can try experimenting with 1) stock binaries - tcmalloc 2) LD_PRELOADed jemalloc 3) ceph recompiled with neither (glibc malloc) 4) ceph recompiled with jemalloc (?) We simply recompiled ceph
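A sketch of option 2 (the library path and init mechanism vary by distro; on EL6 the jemalloc shared object typically lives under /usr/lib64):

    # preload jemalloc for the OSD, e.g. from the init script environment
    export LD_PRELOAD=/usr/lib64/libjemalloc.so.1

    # verify which allocator a running OSD actually mapped
    grep -E 'jemalloc|tcmalloc' /proc/$(pidof -s ceph-osd)/maps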

Re: [ceph-users] Is it safe to increase pg number in a production environment

2015-08-11 Thread Jan Schermer
Could someone clarify what the impact of this bug is? We did increase pg_num/pgp_num and we are on dumpling (0.67.12 unofficial snapshot). Most of our clients are likely restarted already, but not all. Should we be worried? Thanks Jan On 11 Aug 2015, at 17:31, Dan van der Ster

Re: [ceph-users] Slow requests during ceph osd boot

2015-08-07 Thread Jan Schermer
suggestions on how to get Ceph to not start the ceph-osd processes on host boot? It does not seem to be as simple as just disabling the service Regards Nathan On 15/07/2015 7:15 PM, Jan Schermer wrote: We have the same problems, we need to start the OSDs slowly. The problem seems

Re: [ceph-users] Direct IO tests on RBD device vary significantly

2015-08-07 Thread Jan Schermer
You're not really testing only a RBD device there - you're testing 1) the O_DIRECT implementation in the kernel version you have (they differ) - try different kernels in guest 2) cache implementation in qemu (and possibly virtio block driver) - if it's enabled - disable it for this test

Re: [ceph-users] Warning regarding LTTng while checking status or restarting service

2015-08-07 Thread Jan Schermer
Well, you could explicitly export HOME=/root then, that should make it go away. I think it's normally only present in a login shell. Jan On 06 Aug 2015, at 17:51, Josh Durgin jdur...@redhat.com wrote: On 08/06/2015 03:10 AM, Daleep Bais wrote: Hi, Whenever I restart or check the logs for

Re: [ceph-users] Is it safe to increase pg number in a production environment

2015-08-04 Thread Jan Schermer
I think I wrote about my experience with this about 3 months ago, including what techniques I used to minimize impact on production. Basically we had to 1) increase pg_num in small increments only, because creating the placement groups themselves caused slow requests on OSDs 2) increase pgp_num in
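A sketch of the incremental approach (pool name, target and step size are placeholders; let the cluster settle and return to HEALTH_OK between iterations, and walk pgp_num up the same way afterwards):

    POOL=rbd
    TARGET=16384
    STEP=256

    CURRENT=$(ceph osd pool get $POOL pg_num | awk '{print $2}')
    while [ "$CURRENT" -lt "$TARGET" ]; do
        CURRENT=$((CURRENT + STEP))
        [ "$CURRENT" -gt "$TARGET" ] && CURRENT=$TARGET
        ceph osd pool set $POOL pg_num $CURRENT
        sleep 600   # placeholder: wait for PG creation and HEALTH_OK instead
    done
    # then repeat the same loop for pgp_num so data actually rebalances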

Re: [ceph-users] НА: Different filesystems on OSD hosts at the samecluster

2015-08-07 Thread Jan Schermer
An interesting benchmark would be to compare Ceph SSD journal + ext4 on spinner versus Ceph without journal + ext4 on spinner with external SSD journal. I won't be surprised if the second outperformed the first - you are actually making the whole setup much simpler and Ceph is mostly CPU bound.
