Re: [ceph-users] mds crashing

2015-05-19 Thread Markus Blank-Burian
I am afraid I hit the same bug. Giant worked fine, but after upgrading to hammer (0.94.1) and putting some load on it, the MDSs eventually crashed and now I am stuck in clientreplay most of the time. I am also using the cephfs kernel client (3.18.y). As I didn't find a corresponding tracker

Re: [ceph-users] mds crashing

2015-05-19 Thread Markus Blank-Burian
Here are some logs and the info from the mdsc files. But I am afraid that there might not be much info in the logs, since I had a very low log level. Look for example at 2015-05-18T21:28:33+02:00. The mdsc files are concatenated from all of the clients. Date: Tue, 19 May 2015 16:45:12
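For reference, the per-client mdsc files come from the kernel client's debugfs interface; a minimal sketch of collecting them from several client nodes (hostnames and the output file are placeholders, and debugfs must be mounted on the clients):

    # concatenate the pending-MDS-request list from each client node
    for host in client01 client02 client03; do
        ssh "$host" 'hostname; cat /sys/kernel/debug/ceph/*/mdsc' >> all-mdsc.txt
    done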

Re: [ceph-users] mds crashing

2015-05-19 Thread Yan, Zheng
On Tue, May 19, 2015 at 4:31 PM, Markus Blank-Burian bur...@muenster.de wrote: I am afraid I hit the same bug. Giant worked fine, but after upgrading to hammer (0.94.1) and putting some load on it, the MDSs eventually crashed and now I am stuck in clientreplay most of the time. I am also using

Re: [ceph-users] mds crashing

2015-05-19 Thread flisky
On 2015-05-19 17:07, Markus Blank-Burian wrote: Here are some logs and the info from the mdsc files. But I am afraid that there might not be much info in the logs, since I had a very low log level. Look for example at 2015-05-18T21:28:33+02:00. The mdsc files are concatenated from all of the

Re: [ceph-users] mds crashing

2015-05-19 Thread Yan, Zheng
Could you try the attached patch? On Tue, May 19, 2015 at 5:10 PM, Markus Blank-Burian bur...@muenster.de wrote: Forgot the attachments. Besides, is there any way to get the cluster running again without restarting all client nodes? On Tue, May 19, 2015 at 10:45 AM, Yan, Zheng

[ceph-users] OSD crashing over and over, taking cluster down

2015-05-19 Thread Daniel Schneller
Last night our Hammer cluster suffered a series of OSD crashes on all cluster nodes. We were running Hammer (0.94.1-98-g7df3eb5, a custom build made because we had a major problem a week ago which we suspected to be related to bugs we found in the tracker whose fixes were not yet in 0.94.1). In the meantime we

Re: [ceph-users] QEMU Venom Vulnerability

2015-05-19 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 I've downloaded the new tarball and placed it in rpmbuild/SOURCES; then, with the extracted spec file in rpmbuild/SPECS, I update it to the new version and run rpmbuild -ba program.spec. If you install the SRPM then it will install the RH patches that
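A minimal sketch of that workflow, assuming the stock ~/rpmbuild tree; package and version names are placeholders:

    rpm -ivh qemu-kvm-<old-version>.src.rpm       # unpacks sources, patches and the spec into ~/rpmbuild
    cp qemu-<new-version>.tar.xz ~/rpmbuild/SOURCES/
    vi ~/rpmbuild/SPECS/qemu-kvm.spec             # bump Version:/Release: to match the new tarball
    rpmbuild -ba ~/rpmbuild/SPECS/qemu-kvm.spec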

Re: [ceph-users] OSD unable to start (giant - hammer)

2015-05-19 Thread Berant Lemmenes
Sam, It is for a valid pool; however, the up and acting sets for 2.14 both show OSDs 8 and 7. I'll take a look at 7 and 8 and see if they are good. If so, it seems like it being present on osd.3 could be an artifact from previous topologies and I could mv it off of osd.3. Thanks very much for the

[ceph-users] replication over slow uplink

2015-05-19 Thread John Peebles
Hi, I'm hoping for advice on whether Ceph could be used in an atypical use case. Specifically, I have about ~20TB of files that need to be replicated to 2 different sites. Each site has its own internal gigabit ethernet network. However, the connection between the sites is only ~320 kbit/s. I'm trying to

Re: [ceph-users] mds crashing

2015-05-19 Thread Markus Blank-Burian
I actually managed to reboot everything today and it has run smoothly for the last few minutes. MDS failover also worked without problems. If anything bad happens in the next few days, I will let you know. Markus On Tue, May 19, 2015 at 1:12 PM, Markus Blank-Burian bur...@muenster.de wrote: Thanks for

[ceph-users] How to improve latencies and per-VM performance and latencies

2015-05-19 Thread Межов Игорь Александрович
Hi! Seeking performance improvement in our cluster (Firefly 0.80.7 on Wheezy, 5 nodes, 58 OSDs), I wrote a small Python script that walks through the Ceph nodes and issues the 'perf dump' command on the OSD admin sockets. It extracts the *_latency tuples, calculates min/max/avg, and compares OSD perf metrics with
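For reference, the per-OSD query such a script automates can also be run by hand on an OSD host (the OSD id here is an example):

    ceph daemon osd.12 perf dump | python -m json.tool | grep -A 3 '_latency'
    # or, talking to the admin socket directly:
    ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok perf dump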

[ceph-users] [Calamari] Build Calamari for Centos 7 nodes

2015-05-19 Thread Ignacio Bravo
All, I have looked at the various guides on ceph.com related to building and deploying Calamari, and they will build the RPMs based on CentOS 6 or Red Hat 7, but not CentOS 7. I have no problem with the OS of the Calamari server, as I am thinking of creating a VM for this, and Ubuntu or

[ceph-users] Snap operation throttling (again)

2015-05-19 Thread Andrey Korolyov
Hello, this question was brought up many times before, and also solved in various ways - snap trimmer, scheduler priorities and a persistent fix (for a ReplicatedPG issue), but it seems that the current Ceph versions may suffer as well during rollback operations on large images and on large
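One throttle knob available in this era is the snap-trim sleep; a hedged example (the value is purely illustrative, check your version's defaults first):

    ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.05'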

Re: [ceph-users] OSD crashing over and over, taking cluster down

2015-05-19 Thread Samuel Just
You appear to be using pool snapshots with radosgw; I suspect that's what is causing the issue. Can you post a longer log? Preferably with debug osd = 20, debug filestore = 20, debug ms = 1, from startup to crash on an OSD? -Sam - Original Message - From: Daniel Schneller
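The requested levels, as they would typically go into ceph.conf on the affected host before restarting the OSD so the log covers startup:

    [osd]
        debug osd = 20
        debug filestore = 20
        debug ms = 1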

Re: [ceph-users] QEMU Venom Vulnerability

2015-05-19 Thread Georgios Dimitrakakis
I am trying to build the packages manually and I was wondering: is the flag --enable-rbd enough to get full Ceph functionality? Does anybody know what other flags I should include in order to have the same functionality as the original CentOS package, plus RBD support? Regards, George On
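One way to answer that is to copy the configure invocation from the distro's own spec file and just add RBD; a rough sketch (the SRPM filename is an example):

    rpm2cpio qemu-kvm-<version>.el6.src.rpm | cpio -idmv '*.spec'
    grep -n 'configure' qemu-kvm.spec      # shows the flags the original package was built with
    ./configure --enable-rbd <flags copied from the spec>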

Re: [ceph-users] client.radosgw.gateway for 2 radosgw servers

2015-05-19 Thread Michael Kuriger
I have 3 GW servers, but they are defined like this: [client.radosgw.ceph-gw1] rgw_ops_log_data_backlog = 4096 rgw_enable_ops_log = true keyring = /etc/ceph/ceph.client.radosgw.keyring rgw_print_continue = true rgw_ops_log_rados = true host = ceph-gw1 rgw_frontends = civetweb port=80
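Laid out as it would appear in ceph.conf, the quoted section is (the preview cuts off after the frontends line):

    [client.radosgw.ceph-gw1]
        rgw_ops_log_data_backlog = 4096
        rgw_enable_ops_log = true
        keyring = /etc/ceph/ceph.client.radosgw.keyring
        rgw_print_continue = true
        rgw_ops_log_rados = true
        host = ceph-gw1
        rgw_frontends = civetweb port=80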

Re: [ceph-users] OSD unable to start (giant - hammer)

2015-05-19 Thread Samuel Just
If 2.14 is part of a non-existent pool, you should be able to rename it out of current/ in the osd directory to prevent the osd from seeing it on startup. -Sam - Original Message - From: Berant Lemmenes ber...@lemmenes.com To: Samuel Just sj...@redhat.com Cc: ceph-users@lists.ceph.com
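A hedged sketch of what that rename might look like (paths, OSD id and PG name are examples; the OSD must be stopped first, and the init command depends on the distro):

    service ceph stop osd.3
    mkdir -p /root/removed-pgs
    mv /var/lib/ceph/osd/ceph-3/current/2.14_head /root/removed-pgs/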

Re: [ceph-users] QEMU Venom Vulnerability

2015-05-19 Thread Georgios Dimitrakakis
Erik, are you talking about the ones here: http://ftp.redhat.com/redhat/linux/enterprise/6Server/en/RHEV/SRPMS/ ??? From what I see the version is rather low (0.12.1.2-2.448). How can one verify that it has been patched against the VENOM vulnerability? Additionally I only see the qemu-kvm
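One common way to check whether a given build carries the VENOM (CVE-2015-3456) fix is to look for the CVE in the RPM changelog, e.g.:

    rpm -q --changelog qemu-kvm | grep -i 'CVE-2015-3456'
    # or, for a downloaded package that is not installed yet:
    rpm -qp --changelog <qemu-kvm rpm file> | grep -i 'CVE-2015-3456'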

Re: [ceph-users] QEMU Venom Vulnerability

2015-05-19 Thread Georgios Dimitrakakis
Erik, thanks for the feedback. I am still on 6, so if someone else has a proposal please come forward... Best, George Sorry, I made the assumption you were on 7. If you're on 6 then I defer to someone else ;) If you're on 7, go here.

Re: [ceph-users] OSD unable to start (giant - hammer)

2015-05-19 Thread Berant Lemmenes
Hello, so here are the steps I performed and where I sit now. Step 1) I used 'ceph-objectstore-tool list' to create a list of all PGs not associated with the 3 pools (rbd, data, metadata) that are actually in use on this cluster. Step 2) I then did a 'ceph-objectstore-tool remove' of those PGs
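A rough sketch of those two steps as they would look on an individual OSD, using the tool's PG-listing and removal operations (paths and the PG id are examples; the OSD must be stopped while the tool runs):

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
        --journal-path /var/lib/ceph/osd/ceph-3/journal --op list-pgs > pgs.txt
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
        --journal-path /var/lib/ceph/osd/ceph-3/journal --pgid 2.14 --op remove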

Re: [ceph-users] QEMU Venom Vulnerability

2015-05-19 Thread Erik McCormick
Sorry, I made the assumption you were on 7. If you're on 6 then I defer to someone else ;) If you're on 7, go here. http://ftp.redhat.com/pub/redhat/linux/enterprise/7Server/en/RHEV/SRPMS/ On May 19, 2015 2:47 PM, Georgios Dimitrakakis gior...@acmac.uoc.gr wrote: Erik, are you talking about

[ceph-users] radosgw performance with small files

2015-05-19 Thread Srikanth Madugundi
Hi, I am seeing a write performance hit with small files (60K) using radosgw. The radosgw is configured to run with 600 threads. Here is the write speed I get with file sizes of 60K: # sudo ceph -s cluster e445e46e-4d84-4606-9923-16fff64446dc health HEALTH_OK monmap e1: 1 mons at
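For reference, the thread count mentioned above is normally set via the rgw thread pool option in the gateway's ceph.conf section (the section name is an example):

    [client.radosgw.gateway]
        rgw thread pool size = 600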

[ceph-users] fix active+clean+inconsistent on cephfs when digest != digest

2015-05-19 Thread core
Hi list, I was struggling for quite a while with the problem that on my cephfs data pool some PGs stayed inconsistent and could not be repaired. The message in the OSD's log was like repair 11.23a 57b4363a/2015b67.06e1/head//11 on disk data digest 0x325d0322 != 0xe8c0243 and then the
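For anyone hitting the same thing: the usual first step is ceph pg repair, and when that keeps failing on a data-digest mismatch, one workaround that has been reported to work is rewriting the object so a fresh digest is recorded (pool and object names below are placeholders; the PG id is the one from the quoted log):

    ceph pg repair 11.23a
    # if repair alone does not clear it, rewrite the affected object:
    rados -p <cephfs data pool> get <object name from the log> /tmp/obj
    rados -p <cephfs data pool> put <object name from the log> /tmp/obj
    ceph pg repair 11.23a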

Re: [ceph-users] Cache Pool Flush/Eviction Limits - Hard of Soft?

2015-05-19 Thread Nick Fisk
I've been doing some more digging. I'm getting messages in the OSD logs like these; I don't know if they are normal or a clue to something not being right: 2015-05-19 18:36:27.664698 7f58b91dd700 0 log_channel(cluster) log [WRN] : slow request 30.346117 seconds old, received at 2015-05-19 18:35:57.318208:
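When chasing slow request warnings like these, the admin socket can show what the OSD is actually stuck on (the OSD id is an example):

    ceph daemon osd.12 dump_ops_in_flight
    ceph daemon osd.12 dump_historic_ops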

Re: [ceph-users] QEMU Venom Vulnerability

2015-05-19 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 You should be able to get the SRPM, extract the SPEC file and use that to build a new package. You should be able to tweak all the compile options as well. I'm still really new to building/rebuilding RPMs but I've been able to do this for a couple