Re: [ceph-users] osd crash and high server load - ceph-osd crashes with stacktrace

2015-10-25 Thread Jacek Jarosiewicz
We've upgraded ceph to 0.94.4 and kernel to 3.16.0-51-generic but the problem still persists. Lately we see these crashes on a daily basis. I'm leaning toward the conclusion that this is a software problem - this hardware ran stable before and we're seeing all four nodes crash randomly with
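
A minimal sketch for confirming which ceph release and kernel a node is actually running after an upgrade like the one described; it only reports versions and does not diagnose the crash itself:

    # Report the installed ceph release and the running kernel on one node.
    # This is a plain version check, not a crash diagnosis.
    import subprocess

    ceph_version = subprocess.run(["ceph", "--version"],
                                  capture_output=True, text=True).stdout.strip()
    kernel = subprocess.run(["uname", "-r"],
                            capture_output=True, text=True).stdout.strip()
    print(ceph_version)   # e.g. "ceph version 0.94.4 (...)"
    print(kernel)         # e.g. "3.16.0-51-generic"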

Re: [ceph-users] Question about hardware and CPU selection

2015-10-25 Thread Christian Balzer
Hello, There are of course a number of threads in the ML archives about things like this. On Sat, 24 Oct 2015 17:48:35 +0200 Mike Miller wrote: > Hi, > > as I am planning to set up a ceph cluster with 6 OSD nodes with 10 > harddisks in each node, could you please give me some advice about >
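
A back-of-the-envelope sizing sketch for a layout like the one asked about (6 OSD nodes with 10 hard disks each); the per-OSD CPU and per-TB RAM figures are the commonly cited rules of thumb, not numbers from this thread, and the disk size is an assumption:

    # Rough sizing for 6 nodes x 10 HDD OSDs.
    # ~1 GHz of CPU per OSD daemon and ~1 GB RAM per TB of OSD space are the
    # usual rules of thumb; the 4 TB disk size is an assumed value.
    nodes = 6
    osds_per_node = 10
    disk_tb = 4

    total_osds = nodes * osds_per_node
    cpu_ghz_per_node = osds_per_node * 1.0
    ram_gb_per_node = osds_per_node * disk_tb * 1.0

    print(f"total OSDs: {total_osds}")
    print(f"per node: ~{cpu_ghz_per_node:.0f} GHz of CPU, ~{ram_gb_per_node:.0f} GB RAM for OSDs")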

Re: [ceph-users] [performance] rbd kernel module versus qemu librbd

2015-10-25 Thread hzwuli...@gmail.com
Hi, new information: I think the poor performance is due to too many threads in the qemu-system-x86 process. In the normal case it uses only about 200 threads; in the abnormal case it uses about 400 or even 700 threads, and the performance ranking is: 200 threads > 400 threads > 700 threads. Now, I
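
A small sketch of how a thread count like "200 vs. 700" can be read off on the host, assuming the PID of the qemu-system-x86 process is known (the PID below is a placeholder):

    # Count the threads of a qemu-system-x86 process by listing /proc/<pid>/task.
    # The PID is a placeholder; it can be found with e.g. `pgrep -f qemu-system-x86`.
    import os

    pid = 12345  # placeholder PID of the qemu-system-x86 process
    threads = len(os.listdir(f"/proc/{pid}/task"))
    print(f"qemu process {pid} currently has {threads} threads")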

[ceph-users] PG won't stay clean

2015-10-25 Thread Robert LeBlanc
I have a 0.94.4 cluster where, when I repair/deep-scrub a PG, it comes back clean, but as soon as I restart any OSD that hosts it, it goes back to inconsistent. If I deep-scrub that PG it clears up. I determined that the bad copy was not on the
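
For reference, a deep-scrub and (if needed) a repair of a single PG can be driven like this; 9.e3 is the PG id named later in the thread, and this only re-triggers the scrub cycle described above, it does not explain the recurring inconsistency:

    # Trigger a deep-scrub of one PG via the ceph CLI; optionally repair it
    # afterwards. This reproduces the repair/deep-scrub cycle from the post,
    # it is not a fix for the underlying problem.
    import subprocess

    pgid = "9.e3"  # PG id taken from the follow-up message in this thread
    subprocess.run(["ceph", "pg", "deep-scrub", pgid], check=True)
    # After checking the scrub results in the OSD logs, a repair can be issued:
    # subprocess.run(["ceph", "pg", "repair", pgid], check=True)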

Re: [ceph-users] 2-Node Cluster - possible scenario?

2015-10-25 Thread Alan Johnson
Quorum can be achieved with one monitor node (for testing purposes this would be OK, but of course it is a single point of failure). However, the default replication factor is three-way (it can be changed), so it is easier to set up three OSD nodes to start with, plus one monitor node. For your
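
For completeness, the replication factor mentioned above is changed per pool roughly as follows (the pool name is a placeholder, and whether 2x replication is a good idea for the setup in question is a separate discussion):

    # Change a pool's replication factor from the default 3 to 2.
    # "rbd" is a placeholder pool name; lower min_size with care.
    import subprocess

    pool = "rbd"
    subprocess.run(["ceph", "osd", "pool", "set", pool, "size", "2"], check=True)
    subprocess.run(["ceph", "osd", "pool", "set", pool, "min_size", "1"], check=True)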

Re: [ceph-users] PG won't stay clean

2015-10-25 Thread Robert LeBlanc
I set debug_osd = 20/20 and restarted the primary osd. The logs are at http://162.144.87.113/files/ceph-osd.110.log.xz . The PG in question is 9.e3 and it is one of 15 that have this same behavior. The cluster is currently idle.
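
The debug level used here can also be raised at runtime roughly as sketched below (osd.110 matches the log file linked above); since the poster restarted the OSD, the value was presumably set in ceph.conf instead, and injectargs changes do not survive a restart:

    # Raise the OSD debug level at runtime, then lower it again afterwards.
    # injectargs is not persistent across daemon restarts.
    import subprocess

    subprocess.run(["ceph", "tell", "osd.110", "injectargs", "--debug-osd 20/20"],
                   check=True)
    # ... reproduce the problem and collect /var/log/ceph/ceph-osd.110.log ...
    subprocess.run(["ceph", "tell", "osd.110", "injectargs", "--debug-osd 0/5"],
                   check=True)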

Re: [ceph-users] when an osd is started up, IO will be blocked

2015-10-25 Thread wangsongbo
Hi all, when an OSD is started, I see a lot of slow requests in the corresponding OSD log, such as: 2015-10-26 03:42:51.593961 osd.4 [WRN] slow request 3.967808 seconds old, received at 2015-10-26 03:42:47.625968: osd_repop(client.2682003.0:2686048 43.fcf
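
One common way to soften the impact of an OSD (re)joining the cluster, not necessarily what was done in this thread, is to throttle backfill and recovery, roughly:

    # Throttle backfill/recovery so client IO suffers less while an OSD rejoins.
    # This is a common mitigation, not the resolution reached in this thread.
    import subprocess

    args = "--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1"
    subprocess.run(["ceph", "tell", "osd.*", "injectargs", args], check=True)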

[ceph-users] randwrite iops of rbd volume in kvm decrease after several hours with qemu threads and cpu usage on host increasing

2015-10-25 Thread Jackie
Hi experts, when I test the IO performance of an RBD volume in a pure SSD pool with fio in a KVM VM, the IOPS decreased from 15k to 5k, while the number of qemu threads on the host increased from about 200 to about 700 and the CPU usage of the qemu process on the host increased from 600% to 1400%. My testing scenario is as
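
For context, a random-write test of the kind described is typically run inside the VM with fio along these lines; the device path, queue depth and runtime are placeholders, not the exact job from this thread:

    # 4k random-write fio job against a block device backed by the RBD volume.
    # All parameters are placeholders, not the exact job used in the thread.
    import subprocess

    subprocess.run([
        "fio",
        "--name=randwrite",
        "--filename=/dev/vdb",   # placeholder device inside the VM
        "--rw=randwrite",
        "--bs=4k",
        "--ioengine=libaio",
        "--direct=1",
        "--iodepth=32",
        "--numjobs=1",
        "--time_based",
        "--runtime=300",
    ], check=True)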

Re: [ceph-users] 2-Node Cluster - possible scenario?

2015-10-25 Thread Christian Balzer
Hello, On Sun, 25 Oct 2015 16:17:02 +0100 Hermann Himmelbauer wrote: > Hi, > In a little project of mine I plan to start ceph storage with a small > setup and to be able to scale it up later. Perhaps someone can give me > any advice if the following (two nodes with OSDs, third node with >

[ceph-users] locked up cluster while recovering OSD

2015-10-25 Thread Ludovico Cavedon
Hi, we have a Ceph cluster with: - 12 OSDs on 6 physical nodes, 64 GB RAM - each OSD has a 6 TB spinning disk and a 10GB journal in ram (tmpfs) [1] - 3 redundant copies - 25% space usage so far - ceph 0.94.2. - store data via radosgw, using sharded bucket indexes (64 shards). - 500 PGs per node
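
For scale, the raw and usable capacity implied by the figures above works out as below (treating the 25% usage as a fraction of usable space, which is an assumption):

    # Capacity implied by the setup above: 12 OSDs x 6 TB disks, 3 copies.
    osds = 12
    disk_tb = 6
    replicas = 3

    raw_tb = osds * disk_tb          # 72 TB raw
    usable_tb = raw_tb / replicas    # 24 TB usable with 3 replicas
    used_tb = usable_tb * 0.25       # ~6 TB of data if 25% refers to usable space
    print(raw_tb, usable_tb, used_tb)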

[ceph-users] 2-Node Cluster - possible scenario?

2015-10-25 Thread Hermann Himmelbauer
Hi, in a little project of mine I plan to start with a small Ceph storage setup and be able to scale it up later. Perhaps someone can give me some advice on whether the following would work (two nodes with OSDs, a third node with a Monitor only): - 2 Nodes (enough RAM + CPU), 6*3TB Harddisk for OSDs -> 9TB usable
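
The "9TB usable" figure is consistent with 6 x 3TB disks in total and 2x replication, as the small calculation below shows (the replica count of 2 is inferred from the stated usable capacity, not spelled out in the preview):

    # 2-node setup: 6 x 3 TB disks in total, 2 replicas -> 9 TB usable.
    disks = 6
    disk_tb = 3
    replicas = 2   # inferred from the "9TB usable" figure

    raw_tb = disks * disk_tb         # 18 TB raw
    usable_tb = raw_tb / replicas    # 9 TB usable
    print(raw_tb, usable_tb)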