Re: [ceph-users] certificate of `ceph.com' is not trusted!

2015-02-13 Thread Dietmar Maurer
I think the root CA (COMODO RSA Certification Authority) is not available on your Linux host? Connecting to https://ceph.com/ with Google Chrome works fine. No, it's a wget bug. I now switched to LWP::UserAgent and it works perfectly.
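For anyone hitting the same warning: a quick way to check whether the root CA is present and to point wget at the system CA store on Debian Wheezy (a sketch, assuming the stock ca-certificates package and paths):

  # apt-get install ca-certificates
  # openssl s_client -connect ceph.com:443 -CApath /etc/ssl/certs < /dev/null 2>/dev/null | grep 'Verify return code'
  # wget --ca-directory=/etc/ssl/certs "https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc" -O release.asc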

[ceph-users] certificate of `ceph.com' is not trusted!

2015-02-12 Thread Dietmar Maurer
I get the following error on a standard Debian Wheezy: # wget https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc --2015-02-13 07:19:04-- https://ceph.com/git/?p=ceph.git Resolving ceph.com (ceph.com)... 208.113.241.137, 2607:f298:4:147::b05:fe2a Connecting to ceph.com
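Note that in the transcript above wget only requests https://ceph.com/git/?p=ceph.git: the unquoted ';' characters are shell command separators, so the a= and f= parts are cut off before wget ever sees them. Quoting the URL avoids that, independent of the certificate question:

  # wget "https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc" -O release.asc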

Re: [ceph-users] Placement groups stuck inactive after down out of 1/9 OSDs

2014-12-19 Thread Dietmar Maurer
The more I think about this problem, the less I think there'll be an easy answer, and it's more likely that I'll have to reproduce the scenario and actually pause myself next time in order to troubleshoot it? It is even possible to simulate those crush problems. I reported a few examples long

[ceph-users] howto limit snaphot rollback priority

2014-08-13 Thread Dietmar Maurer
Hi all, I just noticed that a snapshot rollback produces very high load on small clusters. It seems all OSDs copy data at full speed, and client access speed drops from 480MB/s to 10MB/s. Is there a way to limit rollback speed/priority?

Re: [ceph-users] Does CEPH rely on any multicasting?

2014-05-16 Thread Dietmar Maurer
Ceph has nothing to do with an HA cluster based on Pacemaker. It has completely different logic built in. The only similarity is that both use a quorum algorithm to detect split-brain situations. I am talking about cluster services like 'corosync', which provide membership and quorum services. For

Re: [ceph-users] Does CEPH rely on any multicasting?

2014-05-15 Thread Dietmar Maurer
Does CEPH rely on any multicasting? Appreciate the feedback.. Nope! All networking is point-to-point. Besides, it would be great if ceph could use existing cluster stacks like corosync, ... Is there any plan to support that?

Re: [ceph-users] slow requests from rados bench with small writes

2014-02-16 Thread Dietmar Maurer
Some projects manually modify PRUNEPATHS in the init script, for example: http://git.openvz.org/?p=vzctl;a=commitdiff;h=47334979b9b5340f84d84639b2d77a8a1f0bb7cf It sounds like what is needed here is for the deb and rpm packages to add /var/lib/ceph to the PRUNEPATHS in /etc/updatedb.conf.
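A minimal sketch of the manual workaround discussed here, assuming the stock PRUNEPATHS line in /etc/updatedb.conf (check the file before editing; the example value shown is illustrative):

  # grep ^PRUNEPATHS /etc/updatedb.conf
  PRUNEPATHS="/tmp /var/spool /media"
  # sed -i 's|^PRUNEPATHS="|PRUNEPATHS="/var/lib/ceph |' /etc/updatedb.conf
  # grep ^PRUNEPATHS /etc/updatedb.conf
  PRUNEPATHS="/var/lib/ceph /tmp /var/spool /media"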

Re: [ceph-users] unable to start OSD

2014-02-12 Thread Dietmar Maurer
I am unable to start my OSDs on one node: osd/PGLog.cc: 672: FAILED assert(last_e.version.version < e.version.version) Does that mean there is something wrong with my journal disk? Or why can such a thing happen? After rebooting other nodes, all my OSDs are offline, showing exactly the same

Re: [ceph-users] unable to start OSD

2014-02-12 Thread Dietmar Maurer
After enabling debugging, I get: ... -4 2014-02-12 09:43:44.739648 7f7f8b848780 20 read_log 6100'1677 (6100'1676) modify 85949a17/rbd_data.dd6592ae8944a.01bd/head//25 by client.890681.0:76884 2014-01-26 16:44:08.412457 -3 2014-02-12 09:43:44.739670 7f7f8b848780 20 read_log

Re: [ceph-users] unable to start OSD

2014-02-12 Thread Dietmar Maurer
This sounds like a bug that introduced an entry into the pg log that is not ordered properly. I don't think I've seen this before... Sam, have you? How many OSDs do you have? 12 OSDs, 3 nodes. Can you set 'debug osd = 20' in your ceph.conf, restart and reproduce the crash, The log I sent
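For reference, a sketch of the two usual ways to raise OSD debug logging for this (option names as used at the time):

  # in /etc/ceph/ceph.conf on the affected node, then restart the OSDs:
  [osd]
      debug osd = 20
      debug filestore = 20

  # or at runtime via injectargs (only works while the daemon is up,
  # so it does not help for a crash during startup):
  # ceph tell osd.4 injectargs '--debug-osd 20 --debug-filestore 20'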

Re: [ceph-users] unable to start OSD

2014-02-12 Thread Dietmar Maurer
OK, I uploaded the log to our server: ftp://download.proxmox.com/tmp/ceph-osd.4.log -Original Message- From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Dietmar Maurer Sent: Wednesday, 12 February 2014 18:41 To: Sage Weil Cc: ceph-users

Re: [ceph-users] unable to start OSD

2014-02-12 Thread Dietmar Maurer
It would be great to get two logs from two different crashing OSDs for comparison purposes. ftp://download.proxmox.com/tmp/ceph-osd.4.log ftp://download.proxmox.com/tmp/ceph-osd.10.log and post the log somewhere? (You can use 'ceph-post-file filename' to send it to us # ceph-post-file
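The tool being referred to is ceph-post-file; a usage sketch (the log path is just an example):

  # ceph-post-file /var/log/ceph/ceph-osd.4.log
  # the command prints an upload tag that can be quoted back in the thread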

Re: [ceph-users] unable to start OSD

2014-02-12 Thread Dietmar Maurer
It would be great to get two logs from two different crashing OSDs for comparison purposes. ftp://download.proxmox.com/tmp/ceph-osd.4.log ftp://download.proxmox.com/tmp/ceph-osd.10.log I guess I should also mention that there was a misconfiguration in the network MTU setting of one
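Since an MTU mismatch on one node can silently break large OSD messages, a quick way to verify it (eth0 and the peer address are placeholders):

  # ip link show eth0 | grep mtu
  # ping -M do -s 8972 <peer-ip>   # 8972 = 9000-byte MTU minus 28 bytes of IP/ICMP headers;
                                   # fails if any hop is configured with a smaller MTU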

Re: [ceph-users] pg is stuck unclean since forever

2014-02-11 Thread Dietmar Maurer
Have you changed/checked the crush map? The crush map is OK (not changed).

[ceph-users] unable to start OSD

2014-02-11 Thread Dietmar Maurer
I am unable to start my OSDs on one node: osd/PGLog.cc: 672: FAILED assert(last_e.version.version < e.version.version) Does that mean there is something wrong with my journal disk? Or why can such a thing happen? Here is the OSD log: ... 2014-02-12 07:04:39.376993 7f8236afe780 0 cls

Re: [ceph-users] pg is stuck unclean since forever

2014-02-10 Thread Dietmar Maurer
On my test cluster, some PGs are stuck unclean forever (pool 24, size=2). Directory /var/lib/ceph/osd/ceph-X/current/24.126_head/ is empty on all OSDs. Any idea what is wrong? And how can I recover from that state? The interesting thing is that all OSDs are up, and those PGs do not list
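For anyone debugging a similar state, the usual commands to look at a stuck PG such as 24.126 (a sketch; output format depends on the release):

  # ceph pg dump_stuck unclean
  # ceph pg map 24.126
  # ceph pg 24.126 query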

Re: [ceph-users] Proxmox VE Ceph Server released (beta)

2014-01-24 Thread Dietmar Maurer
I think it could be great to add some OSD statistics (IO/s, ...); I think it's possible through the Ceph API. You see IO/s in the log. I also added latency stats for OSDs recently. Also maybe an email alerting system if an OSD state changes (up/down/) yes, and SMART,
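The per-OSD latency figures mentioned here can also be read straight from the cluster; a sketch using commands available at the time:

  # ceph osd perf          # commit/apply latency per OSD
  # ceph osd pool stats    # per-pool client and recovery I/O rates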

Re: [ceph-users] 3 node setup with pools size=3

2014-01-14 Thread Dietmar Maurer
When using a pool size of 3, I get the following behavior when one OSD fails: * the affected PGs get marked active+degraded * there is no data movement/backfill Works as designed, if you have the default crush map in place (all replicas must be on DIFFERENT hosts). You need to

Re: [ceph-users] 3 node setup with pools size=3

2014-01-14 Thread Dietmar Maurer
Sorry, it seems as if I had misread your question: only a single OSD fails, not the whole server? Yes, only a single OSD is down and marked out. Then there should definitely be backfilling in place. No, this does not happen. Many PGs stay in a degraded state (I tested this several

Re: [ceph-users] 3 node setup with pools size=3

2014-01-14 Thread Dietmar Maurer
Are you aware of this? http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/ = Stopping w/out Rebalancing What do you think is wrong with my setup? I want to re-balance. The problem is that it does not happen at all! I do exactly the same test with and without 'ceph osd
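To make the distinction in this thread explicit: the noout flag only suppresses automatic out-marking, while actually marking an OSD out is what changes its effective weight and should trigger backfill. A sketch:

  # ceph osd set noout     # stopped OSDs stay 'in'; no rebalancing happens
  # ceph osd unset noout
  # ceph osd out 4         # marks osd.4 out; PGs should remap and backfill
  # ceph -w                # watch for remapped/backfilling states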

Re: [ceph-users] 3 node setup with pools size=3

2014-01-14 Thread Dietmar Maurer
-users-boun...@lists.ceph.com] On Behalf Of Dietmar Maurer Sent: Tuesday, 14 January 2014 10:40 To: Wolfgang Hennerbichler; ceph-users@lists.ceph.com Subject: Re: [ceph-users] 3 node setup with pools size=3 Are you aware of this? http://ceph.com/docs/master/rados/troubleshooting

[ceph-users] crushtool question

2014-01-14 Thread Dietmar Maurer
It seems that marking an OSD as 'out' has effects other than removing the OSD from the crush map. I guess weights are not changed if the OSD is marked out? So how can I test that with crushtool?
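Marking an OSD out sets its osd weight (reweight) to 0 in the OSD map while leaving the CRUSH weight alone, and crushtool can simulate exactly that offline; a sketch (device id 4 and rule 0 are placeholders):

  # ceph osd getcrushmap -o crushmap
  # crushtool -i crushmap --test --rule 0 --num-rep 3 --show-utilization
  # crushtool -i crushmap --test --rule 0 --num-rep 3 --weight 4 0 --show-utilization   # same map, osd.4 treated as out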

[ceph-users] further crush map questions

2014-01-14 Thread Dietmar Maurer
We observe strange behavior with some configurations: PGs stay in a degraded state after a single OSD failure. I can also show the behavior using crushtool with the following map:

--crush map-
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
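To check whether a map like the one above can place all replicas at all, crushtool can list the inputs for which it fails to find enough OSDs; a sketch (rule and replica count are placeholders):

  # crushtool -c crushmap.txt -o crushmap.bin
  # crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-bad-mappings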

[ceph-users] 3 node setup with pools size=3

2014-01-13 Thread Dietmar Maurer
I am still playing around with a small setup using 3 Nodes, each running 4 OSDs (=12 OSDs). When using a pool size of 3, I get the following behavior when one OSD fails: * the affected PGs get marked active+degraded * there is no data movement/backfill Note: using 'ceph osd crush tunables
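For reference, the command family the truncated note refers to; changing tunables on a populated cluster triggers data movement, so treat this as a sketch:

  # ceph osd crush show-tunables
  # ceph osd crush tunables optimal    # or: legacy, argonaut, bobtail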

[ceph-users] crush choose firstn vs. indep

2014-01-12 Thread Dietmar Maurer
From the docs: step [choose|chooseleaf] [firstn|indep] N bucket-type What exactly is the difference between 'firstn' and 'indep'?

Re: [ceph-users] crush choose firstn vs. indep

2014-01-12 Thread Dietmar Maurer
For Ceph releases up to Emperor[1], firstn is used and I'm not aware of a use case requiring indep. As part of the effort to implement erasure coded pools, firstn[2] and indep[3] were separated in two functions. The firstn method is best suited for replicated pools. The indep method tries to
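To make the difference concrete, an erasure-coded pool would use an indep rule so that a failed OSD is replaced in place without shifting the surviving shards to new positions. An illustrative rule only (names and numbers are not from this thread):

  rule ecpool {
      ruleset 1
      type erasure
      min_size 3
      max_size 10
      step take default
      step chooseleaf indep 0 type host
      step emit
  }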

Re: [ceph-users] crush chooseleaf vs. choose

2014-01-06 Thread Dietmar Maurer
the following distribution: device 0: 423, device 1: 453, device 2: 430, device 3: 455, device 4: 657, device 5: 654. The host with only one osd gets too much data. On Fri, 3 Jan 2014, Dietmar Maurer wrote: In both cases, you only get 2 replicas on the remaining 2 hosts

Re: [ceph-users] crush chooseleaf vs. choose

2014-01-06 Thread Dietmar Maurer
Host with only one osd gets too much data. I think this is just fundamentally a problem with distributing 3 replicas over only 4 hosts. Every piece of data in the system needs to include either host 3 or 4 (and thus device 4 or 5) in order to have 3 replicas (on separate hosts). Add

Re: [ceph-users] crush chooseleaf vs. choose

2014-01-06 Thread Dietmar Maurer
I think this is just fundamentally a problem with distributing 3 replicas over only 4 hosts. Every piece of data in the system needs to include either host 3 or 4 (and thus device 4 or 5) in order to have 3 replicas (on separate hosts). Add more hosts or disks and the distribution will

[ceph-users] crush chooseleaf vs. choose

2014-01-02 Thread Dietmar Maurer
I try to understand the default crush rule:

rule data {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

Is this the same as:

rule data {
    ruleset 0
    type
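For comparison, the two-step spelling that chooseleaf is usually contrasted with in the CRUSH documentation (approximately equivalent, modulo the retry behaviour discussed later in this thread):

  rule data {
      ruleset 0
      type replicated
      min_size 1
      max_size 10
      step take default
      step choose firstn 0 type host
      step choose firstn 1 type osd
      step emit
  }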

Re: [ceph-users] rados benchmark question

2014-01-02 Thread Dietmar Maurer
Having your journals on the same disk causes all data to be written twice, i.e. once to the journal and once to the osd store.  Notice that your tested throughput is slightly more than half your expected maximum... But AFAIK OSD bench already considers journal writes. The disk can write

Re: [ceph-users] crush chooseleaf vs. choose

2014-01-02 Thread Dietmar Maurer
IIRC, chooseleaf goes down the tree and descends into multiple leaves to find what you are looking for; choose goes into that leaf and tries to find what you are looking for without going into subtrees. Right. To a first approximation, these rules are equivalent. The difference is

Re: [ceph-users] crush chooseleaf vs. choose

2014-01-02 Thread Dietmar Maurer
The other difference is if you have one of the two OSDs on the host marked out. In the choose case, the remaining OSD will get allocated 2x the data; in the chooseleaf case, usage will remain proportional with the rest of the cluster and the data from the out OSD will be distributed across

Re: [ceph-users] rados benchmark question

2014-01-02 Thread Dietmar Maurer
-Original Message- From: Stefan Priebe [mailto:s.pri...@profihost.ag] Sent: Thursday, 02 January 2014 18:36 To: Dietmar Maurer; Dino Yancey Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] rados benchmark question Hi, On 02.01.2014 17:10, Dietmar Maurer wrote

Re: [ceph-users] rados benchmark question

2014-01-02 Thread Dietmar Maurer
# iostat -x 5 (after about 30 seconds)
Device:  rrqm/s  wrqm/s  r/s   w/s     rkB/s  wkB/s     avgrq-sz  avgqu-sz  await   r_await  w_await  svctm  %util
sdb      0.00    3.80    0.00  187.40  0.00   84663.60  903.56    157.62    796.93  0.00     796.93   5.34   100.00

Re: [ceph-users] rados benchmark question

2014-01-02 Thread Dietmar Maurer
so your disks are completely utilized and can't keep up; see %util and await. But it says it writes at 80MB/s, so that would be about 40MB/s for data? And 40*6=240 (not 190). Did you miss the replication factor? I think it should be: 40MB/s*6/3 = 80MB/s. My test pool uses size=1 (no
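Working through the iostat numbers above as a rough sanity check (assuming size=1 and the journal on the same spindle as the data):

  84663.60 wkB/s to sdb        ~ 83 MB/s of physical writes per OSD disk
  journal + data on one disk   ~ 41 MB/s of client data per OSD
  6 OSDs, pool size = 1        ~ 6 x 41 = ~250 MB/s theoretical aggregate

A plausible contributor to the gap down to the ~190 MB/s actually measured is seek overhead between the journal and the data area on the same disk.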

Re: [ceph-users] rados benchmark question

2014-01-02 Thread Dietmar Maurer
Did you miss the replication factor? I think it should be: 40MB/s*6/3 = 80MB/s. My test pool uses size=1 (no replication). OK, out of ideas... ;-( sorry. What values do you get? (osd bench vs. rados benchmark with pool size=1)

Re: [ceph-users] crush chooseleaf vs. choose

2014-01-02 Thread Dietmar Maurer
In both cases, you only get 2 replicas on the remaining 2 hosts. OK, I was able to reproduce this with crushtool. The difference is if you have 4 hosts with 2 osds. In the choose case, you have some fraction of the data that chose the down host in the first step (most of the attempts,

Re: [ceph-users] crush chooseleaf vs. choose

2014-01-02 Thread Dietmar Maurer
I also don't really understand why crush selects OSDs with weight=0:

host prox-ceph-3 {
    id -4       # do not change unnecessarily
    # weight 3.630
    alg straw
    hash 0      # rjenkins1
    item osd.4 weight 0
}
root default {
    id -1       # do not

[ceph-users] rados benchmark question

2014-01-01 Thread Dietmar Maurer
Hi all, I run 3 nodes connected with a 10Gbit network, each running 2 OSDs. Disks are 4TB Seagate Constellation ST4000NM0033-9ZM (xfs, journal on same disk). # ceph tell osd.0 bench { bytes_written: 1073741824, blocksize: 4194304, bytes_per_sec: 56494242.00} So a single OSD can write
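Converting the bench output above into more familiar units:

  bytes_per_sec = 56494242 B/s ~ 56.5 MB/s (~ 53.9 MiB/s) per OSD
  1073741824 B / 56494242 B/s  ~ 19 s for the 1 GiB test write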