Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)
On 06/02/2015 01:42 AM, cameron.scr...@solnet.co.nz wrote: I am trying to deploy a new ceph cluster and my monitors are not reaching quorum. SELinux is off, the firewalls are off, and I can see traffic between the nodes on port 6789, but when I use the admin socket to force a re-election, only the monitor I send the request to shows the new election in its logs. My logs are filled entirely with the following two lines:

2015-06-02 11:31:56.447975 7f795b17a700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd='mon_status' args=[]: dispatch
2015-06-02 11:31:56.448272 7f795b17a700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd=mon_status args=[]: finished

You are running on default debug levels, so you'll hardly get anything more than that. I suggest setting 'debug mon = 10' and 'debug ms = 1' for added verbosity and coming back to us with the logs. There are many reasons for this, but the most common is the monitors not being able to communicate with each other. Given that you see traffic between the monitors, I'm inclined to assume that the other two monitors do not have each other in the monmap or, if they do know each other, either 1) the monitors' auth keys do not match, or 2) the probe timeout is being triggered before they successfully manage to find enough monitors to trigger an election -- which may be due to latency. Logs will tell us more. -Joao

Querying the admin socket with mon_status (the other two are similar, but with their hostnames and ranks):

{ "name": "wcm1", "rank": 0, "state": "probing", "election_epoch": 1, "quorum": [], "outside_quorum": [ "wcm1" ], "extra_probe_peers": [], "sync_provider": [], "monmap": { "epoch": 0, "fsid": "adb8c500-122e-49fd-9c1e-a99af7832307", "modified": "2015-06-02 10:43:41.467811", "created": "2015-06-02 10:43:41.467811", "mons": [ { "rank": 0, "name": "wcm1", "addr": "10.1.226.64:6789\/0" }, { "rank": 1, "name": "wcm2", "addr": "10.1.226.65:6789\/0" }, { "rank": 2, "name": "wcm3", "addr": "10.1.226.66:6789\/0" } ] } }

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
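As a side note, a minimal sketch of how the suggested debug levels can be raised, either persistently in ceph.conf or on a running monitor through its local admin socket (mon.wcm1 is just the example name from the post; repeat for each monitor):

    # ceph.conf on each monitor host
    [mon]
        debug mon = 10
        debug ms = 1

    # or, without restarting, via the local admin socket
    ceph daemon mon.wcm1 config set debug_mon 10
    ceph daemon mon.wcm1 config set debug_ms 1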
[ceph-users] active+clean+scrubbing+deep
Hi! I have ceph version 0.94.1.

root@ceph-node1:~# ceph -s
    cluster 3e0d58cd-d441-4d44-b49b-6cff08c20abf
     health HEALTH_OK
     monmap e2: 3 mons at {ceph-mon=10.10.100.3:6789/0,ceph-node1=10.10.100.1:6789/0,ceph-node2=10.10.100.2:6789/0} election epoch 428, quorum 0,1,2 ceph-node1,ceph-node2,ceph-mon
     osdmap e978: 16 osds: 16 up, 16 in
      pgmap v6735569: 2012 pgs, 8 pools, 2801 GB data, 703 kobjects 5617 GB used, 33399 GB / 39016 GB avail 2011 active+clean 1 active+clean+scrubbing+deep
  client io 174 kB/s rd, 30641 kB/s wr, 80 op/s

root@ceph-node1:~# ceph pg dump | grep -i deep | cut -f 1
dumped all in format plain
pg_stat
19.b3

In the log file I see "2015-05-14 03:23:51.556876 7fc708a37700 0 log_channel(cluster) log [INF] : 19.b3 deep-scrub starts" but no "19.b3 deep-scrub ok". Then I run "ceph pg deep-scrub 19.b3", nothing happens, and there are no records about it in the log file. What can I do to return the PG to the active+clean state? Does it make sense to restart the OSD, or the entire server hosting the OSD? Thanks.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
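A hedged sketch of the usual way to chase a PG stuck in scrubbing+deep (the PG id is taken from the post; the restart command is a placeholder that depends on the init system, and osd.N stands for whatever primary 'ceph pg map' reports):

    # which OSDs serve pg 19.b3, and which one is the primary
    ceph pg map 19.b3

    # detailed state of the PG, including the last scrub stamps
    ceph pg 19.b3 query

    # if the deep-scrub is genuinely hung, restarting just the primary OSD is
    # normally enough to clear it; restarting the whole server should not be needed
    /etc/init.d/ceph restart osd.N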
Re: [ceph-users] Ceph on RHEL7.0
Hi Ken,

Are these packages compatible with Giant or Hammer? We are currently running Hammer - can we use the RBD kernel module from RHEL 7.1, and is the ELRepo version of CephFS compatible with Hammer?

Regards
Paul

On 01/06/2015 17:57, Ken Dreyer kdre...@redhat.com wrote: For the sake of providing more clarity regarding the Ceph kernel module situation on RHEL 7.0, I've removed all the files at https://github.com/ceph/ceph-kmod-rpm and updated the README there. The summary is that if you want to use Ceph's RBD kernel module on RHEL 7, you should use RHEL 7.1 or later. And if you want to use the kernel CephFS client on RHEL 7, you should use the latest upstream kernel packages from ELRepo. Hope that clarifies things from a RHEL 7 kernel perspective. - Ken

On 05/28/2015 09:16 PM, Luke Kao wrote: Hi Bruce, the RHEL 7.0 kernel has many issues in its filesystem submodules, and most of them are fixed only in RHEL 7.1. So you should consider going to RHEL 7.1 directly and upgrading to at least kernel 3.10.0-229.1.2. BR, Luke

*From:* ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Bruce McFarland [bruce.mcfarl...@taec.toshiba.com] *Sent:* Friday, May 29, 2015 5:13 AM *To:* ceph-users@lists.ceph.com *Subject:* [ceph-users] Ceph on RHEL7.0 We're planning on moving from CentOS 6.5 to RHEL 7.0 for Ceph storage and monitor nodes. Are there any known issues using RHEL 7.0? Thanks

This electronic message contains information from Mycom which may be privileged or confidential. The information is intended to be for the use of the individual(s) or entity named above. If you are not the intended recipient, be aware that any disclosure, copying, distribution or any other use of the contents of this information is prohibited. If you have received this electronic message in error, please notify us by post or telephone (to the numbers or correspondence address above) or by email (at the email address above) immediately.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
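For what it's worth, a quick sketch for checking what the running kernel actually provides before deciding on an upgrade path (plain standard tools, nothing Ceph-specific):

    # kernel version; the thread above recommends at least 3.10.0-229.1.2 (RHEL 7.1)
    uname -r

    # are the rbd and ceph (CephFS) kernel modules available for this kernel?
    modinfo rbd | head -n 3
    modinfo ceph | head -n 3
    modprobe rbd && lsmod | grep '^rbd'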
Re: [ceph-users] SLES Packages
On 2015-06-01T15:41:05, Steffen Weißgerber weissgerb...@ksnb.de wrote: Hi, I'm searching for up-to-date packages for SLES 11 SP3. Via the SMT update server it seems that only version 0.80.8 is available. Are there other package sources available (at least for Giant)?

Hi Steffen, we have only released the client-side enablement for SLES 11 SP3. There is no Ceph server-side code available for this platform (at least not from SUSE). Our server-side offering is based on SLES 12 (SUSE Enterprise Storage), currently based on firefly 0.80.9, though as always, the next upgrade is in the works ;-) (probably going directly to 0.80.11). Only our next product release will be based on Hammer++. A more community-oriented version, including more recent packages, is available for openSUSE (via build.opensuse.org).

What I want to do is mount ceph via rbd map natively instead of mounting NFS from another host on which I have up-to-date packages available.

That should be possible with the SLES 11 SP3 packages that you have access to. The rbd client code is included there.

Regards,
Lars

--
Architect Storage/HA
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
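For the use case described above (mapping RBD natively instead of re-exporting over NFS), a minimal sketch with the SLES client packages, assuming a working /etc/ceph/ceph.conf plus keyring on the client and an existing image rbd/backup1 (pool and image names are placeholders):

    # map the image through the kernel rbd driver
    rbd map rbd/backup1 --id admin

    # the device appears as /dev/rbd0 (or similar); create a filesystem once, then mount
    mkfs.xfs /dev/rbd0
    mount /dev/rbd0 /mnt/backup1

    # clean up
    umount /mnt/backup1
    rbd unmap /dev/rbd0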
Re: [ceph-users] ceph-mon logging like crazy because....?
Actually looks like it stopped, but here’s a more representative sample (notice how often it logged this!) v0 lc 36602135 2015-06-02 17:39:59.865833 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865860 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865886 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865944 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865989 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866025 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866027 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866072 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866074 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866121 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866123 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866164 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866166 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866205 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866207 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866244 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866246 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866285 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866287 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866325 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866327 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 On 02 Jun 2015, at 20:14, Jan Schermer j...@schermer.cz wrote: Our mons just went into a logging frenzy. 
We have 3 mons in the cluster, and they mostly log stuff like this 2015-06-02 18:00:48.749386 7f1c08c0d700 1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:48.749389 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063 2015-06-02 18:00:49.025179 7f1c08c0d700 1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025187 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063 2015-06-02 18:00:49.025640 7f1c08c0d700 1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025642 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063 2015-06-02 18:00:49.026132 7f1c08c0d700 1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.026134 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063 2015-06-02 18:00:49.028388 7f1c08c0d700 1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.028393 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063 There are few lines every second, sometimes more, sometimes less (tell me if that’s normal. I’m not sure) Two of them went completely haywire, one log is 17GB now and rising. It’s still mostly the same content, just more frequent: 2015-06-02 18:09:00.879950 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879956 lease_expire=0.00 has v0 lc 36604772 2015-06-02 18:09:00.879968 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879969 lease_expire=0.00 has v0 lc 36604772 2015-06-02 18:09:00.954835 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.954843 lease_expire=0.00 has v0 lc 36604772 2015-06-02 18:09:00.954860 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos updating c
Re: [ceph-users] ceph-mon logging like crazy because....?
I think with the latest version of code it is printing only for log level 5, earlier it was 1. Here is the link where I had some conversation about this earlier with Sage. http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881 So, IMO nothing to worry about other than log spam here which is fixed in the latest build or you can fix it with debug mon = 0/0 Thanks Regards Somnath -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan Schermer Sent: Tuesday, June 02, 2015 11:33 AM To: ceph-users Subject: Re: [ceph-users] ceph-mon logging like crazy because? Another follow-up. The whole madness started with “mon compact” which we run from cron (else leveldb eats all space). It’s been running for about 14 days now with no incident. 2015-06-02 16:40:01.624804 7f4309d45700 0 mon.node-14@2(peon) e3 handle_command mon_command({prefix: compact} v 0) v1 2015-06-02 16:40:23.646514 7f430a746700 1 mon.node-14@2(peon).paxos(paxos updating c 36596805..36597321) lease_timeout -- calling new election 2015-06-02 16:40:23.646947 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646947 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646953 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646954 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646960 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646961 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646963 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646964 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646968 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646969 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646971 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646972 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646976 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646977 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646979 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646980 lease_expire=0.00 has v0 lc 3659 7321 The sequence that follows is probing recovering electing recovering peon recovering peon active (and this is the madness) It logs much less now, but the issue is still here… Jan On 02 Jun 2015, at 20:17, Jan Schermer j...@schermer.cz wrote: Actually looks like it stopped, but here’s a more representative sample (notice how often it logged this!) 
v0 lc 36602135 2015-06-02 17:39:59.865833 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865860 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865886 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865944 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865989 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866025 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866027 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866072 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866074 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866121 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866123 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866164 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866166 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866205 7f4309d45700 1
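If the spam has to be silenced on a running cluster without waiting for a fixed build, a sketch of the 'debug mon = 0/0' workaround mentioned above (0/0 sets both the log level and the in-memory gather level to zero):

    # persistent, in ceph.conf on the monitor hosts
    [mon]
        debug mon = 0/0

    # or injected into the running monitors
    ceph tell mon.* injectargs '--debug-mon 0/0'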
Re: [ceph-users] ceph-mon logging like crazy because....?
Dumpling ceph-0.67.9-16.g69a99e6 I guess it shouldn’t be logging it at all? Thanks Jan On 02 Jun 2015, at 20:42, Somnath Roy somnath@sandisk.com wrote: Which code base are you using ? -Original Message- From: Jan Schermer [mailto:j...@schermer.cz] Sent: Tuesday, June 02, 2015 11:41 AM To: Somnath Roy Cc: ceph-users Subject: Re: [ceph-users] ceph-mon logging like crazy because? We actually have debug mon = 0” It was always spammy, but this is too spammy - on one mon the log size is 500MB since morning. on other node it’s 17GB and about 16.5GB of that is within one hour - something’s not right there and this is likely just a symptom… Jan On 02 Jun 2015, at 20:36, Somnath Roy somnath@sandisk.com wrote: I think with the latest version of code it is printing only for log level 5, earlier it was 1. Here is the link where I had some conversation about this earlier with Sage. http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881 So, IMO nothing to worry about other than log spam here which is fixed in the latest build or you can fix it with debug mon = 0/0 Thanks Regards Somnath -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan Schermer Sent: Tuesday, June 02, 2015 11:33 AM To: ceph-users Subject: Re: [ceph-users] ceph-mon logging like crazy because? Another follow-up. The whole madness started with “mon compact” which we run from cron (else leveldb eats all space). It’s been running for about 14 days now with no incident. 2015-06-02 16:40:01.624804 7f4309d45700 0 mon.node-14@2(peon) e3 handle_command mon_command({prefix: compact} v 0) v1 2015-06-02 16:40:23.646514 7f430a746700 1 mon.node-14@2(peon).paxos(paxos updating c 36596805..36597321) lease_timeout -- calling new election 2015-06-02 16:40:23.646947 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646947 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646953 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646954 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646960 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646961 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646963 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646964 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646968 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646969 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646971 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646972 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646976 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646977 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646979 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646980 lease_expire=0.00 has v0 lc 3659 7321 The sequence that follows is probing recovering electing recovering peon recovering peon active (and this is the madness) It logs much less now, but the issue is still here… Jan On 02 Jun 2015, at 20:17, Jan Schermer j...@schermer.cz wrote: Actually looks like it 
stopped, but here’s a more representative sample (notice how often it logged this!) v0 lc 36602135 2015-06-02 17:39:59.865833 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865860 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865886 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865944 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865989 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866025
Re: [ceph-users] ceph-mon logging like crazy because....?
Another follow-up. The whole madness started with “mon compact” which we run from cron (else leveldb eats all space). It’s been running for about 14 days now with no incident. 2015-06-02 16:40:01.624804 7f4309d45700 0 mon.node-14@2(peon) e3 handle_command mon_command({prefix: compact} v 0) v1 2015-06-02 16:40:23.646514 7f430a746700 1 mon.node-14@2(peon).paxos(paxos updating c 36596805..36597321) lease_timeout -- calling new election 2015-06-02 16:40:23.646947 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646947 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646953 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646954 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646960 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646961 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646963 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646964 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646968 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646969 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646971 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646972 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646976 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646977 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646979 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646980 lease_expire=0.00 has v0 lc 3659 7321 The sequence that follows is probing recovering electing recovering peon recovering peon active (and this is the madness) It logs much less now, but the issue is still here… Jan On 02 Jun 2015, at 20:17, Jan Schermer j...@schermer.cz wrote: Actually looks like it stopped, but here’s a more representative sample (notice how often it logged this!) 
v0 lc 36602135 2015-06-02 17:39:59.865833 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865860 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865886 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865944 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865989 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866025 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866027 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866072 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866074 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866121 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866123 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866164 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866166 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866205 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866207 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866244 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866246 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866285 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866287 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866325 7f4309d45700 1
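For context, the cron-driven compaction described above boils down to something like the following; as an alternative there is also a config option to compact the store every time the monitor starts (node-14 is the mon id from the logs):

    # one-off compaction of a monitor's leveldb store
    ceph tell mon.node-14 compact

    # alternative: compact automatically at daemon startup (ceph.conf)
    [mon]
        mon compact on start = true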
Re: [ceph-users] PG size distribution
Hello,

Thank you for the feedback Jan, much appreciated! I won't post the whole tree as it is rather long, but here is an example of one of our hosts. All of the OSDs and hosts are weighted the same, with the exception of a host that is missing an OSD due to a broken backplane. We are only using hosts for buckets, so no rack/DC levels. We have not manually adjusted the CRUSH map at all for this cluster.

-1   302.26959  root default
-24   14.47998      host osd23
192     1.81000          osd.192  up  1.0  1.0
193     1.81000          osd.193  up  1.0  1.0
194     1.81000          osd.194  up  1.0  1.0
195     1.81000          osd.195  up  1.0  1.0
199     1.81000          osd.199  up  1.0  1.0
200     1.81000          osd.200  up  1.0  1.0
201     1.81000          osd.201  up  1.0  1.0
202     1.81000          osd.202  up  1.0  1.0

I appreciate your input and will likely follow the same path you have, slowly increasing the PGs and adjusting the weights as necessary. If anyone else has any further suggestions I'd love to hear them as well!

- Daniel

On 06/02/2015 01:33 PM, Jan Schermer wrote: Post the output from your "ceph osd tree". We were in a similar situation: some of the OSDs were quite full while others had 50% free. This is exactly why we increased the number of PGs, and it helped to some degree. Are all your hosts the same size? Does your CRUSH map select a host in the end? If you have few hosts with differing numbers of OSDs, the distribution will be poor (IMHO).

Anyway, when we started increasing the PG numbers we first generated the PGs themselves (pg_num) in small increments, since that put a lot of load on the OSDs and we were seeing slow requests with large increases. So something like this:

for i in `seq 4096 64 8192` ; do ceph osd pool set poolname pg_num $i ; done

This ate a few gigs from the drives (1-2GB if I remember correctly). Once that was finished we increased pgp_num in larger and larger increments - at first 64 at a time and then 512 at a time as we were reaching the target (16384 in our case). This does allocate more space temporarily, and it seems to just randomly move data around - one minute an OSD is fine, the next it is nearing full. One of us basically had to watch the process all the time, reweighting the devices that were almost full. With an increasing number of PGs it became much simpler, as the overhead was smaller, every bit of work was smaller, and all the management operations were a lot smoother.

YMMV - our data distribution was poor from the start, hosts had differing weights due to differing numbers of OSDs, and there were some historical remnants from when we tried to load-balance the data by hand; we ended up in a much better state, but not perfect - some OSDs still have much more free space than others. We haven't touched the CRUSH map at all during this process; once we do, and set newer tunables, the data distribution should be much more even. I'd love to hear the others' input, since we are not sure why exactly this problem is present at all - I'd expect it to fill all the OSDs to the same or close-enough level, but in reality we have OSDs with weight 1.0 which are almost empty and others with weight 0.5 which are nearly full… When adding data it seems to (subjectively) distribute it evenly...

Jan

On 02 Jun 2015, at 18:52, Daniel Maraio dmar...@choopa.com wrote: Hello, I have some questions about the size of my placement groups and how I can get a more even distribution. We currently have 160 2TB OSDs across 20 chassis. We have 133TB used in our radosgw pool with a replica size of 2. We want to move to 3 replicas but are concerned we may fill up some of our OSDs.
Some OSDs have ~1.1TB free while others only have ~600GB free. The radosgw pool has 4096 pgs, looking at the documentation I probably want to increase this up to 8192, but we have decided to hold off on that for now. So, now for the pg usage. I dumped out the PG stats and noticed that there are two groups of PG sizes in my cluster. There are about 1024 PGs that are each around 17-18GB in size. The rest of the PGs are all around 34-36GB in size. Any idea why there are two distinct groups? We only have the one pool with data in it, though there are several different buckets in the radosgw pool. The data in the pool ranges from small images to 4-6mb audio files. Will increasing the number of PGs on this pool provide a more even distribution? Another thing to note is that the initial cluster was built lopsided, with some 4TB OSDs and some 2TB, we have removed all the 4TB disks and are only using 2TBs across the entire cluster. Not sure if this would have had any impact. Thank you for your time and I would appreciate any insight the community can offer. - Daniel
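The incremental procedure Jan describes above can be sketched as a small script; the pool name, step sizes and the target of 8192 are placeholders to adapt, and the waits are deliberately crude:

    POOL=poolname
    # 1) create the new placement groups in small steps (causes peering load)
    for i in $(seq 4096 64 8192); do
        ceph osd pool set $POOL pg_num $i
        sleep 30                 # let peering settle; watch 'ceph -s' for slow requests
    done

    # 2) then let data actually move by raising pgp_num, in larger steps
    for i in $(seq 4096 512 8192); do
        ceph osd pool set $POOL pgp_num $i
        while ceph -s | grep -q backfill; do sleep 60; done   # wait for rebalancing
    done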
[ceph-users] ceph-mon logging like crazy because....?
Our mons just went into a logging frenzy. We have 3 mons in the cluster, and they mostly log stuff like this 2015-06-02 18:00:48.749386 7f1c08c0d700 1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:48.749389 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063 2015-06-02 18:00:49.025179 7f1c08c0d700 1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025187 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063 2015-06-02 18:00:49.025640 7f1c08c0d700 1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025642 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063 2015-06-02 18:00:49.026132 7f1c08c0d700 1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.026134 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063 2015-06-02 18:00:49.028388 7f1c08c0d700 1 mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.028393 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063 There are few lines every second, sometimes more, sometimes less (tell me if that’s normal. I’m not sure) Two of them went completely haywire, one log is 17GB now and rising. It’s still mostly the same content, just more frequent: 2015-06-02 18:09:00.879950 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879956 lease_expire=0.00 has v0 lc 36604772 2015-06-02 18:09:00.879968 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879969 lease_expire=0.00 has v0 lc 36604772 2015-06-02 18:09:00.954835 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.954843 lease_expire=0.00 has v0 lc 36604772 2015-06-02 18:09:00.954860 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.954861 lease_expire=0.00 has v0 lc 36604772 2015-06-02 18:09:01.249648 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249668 lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773 2015-06-02 18:09:01.249697 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249699 lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773 2015-06-02 18:09:01.249708 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249709 lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773 2015-06-02 18:09:01.249736 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249736 lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773 Any idea what it might be? Clocks look synced, no other aparent problem that I can see, the cluster is working. I’d like to know why this happened before I restart the unhealthy mons which (I hope) will fix this. Thanks Jan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-mon logging like crazy because....?
Which code base are you using ? -Original Message- From: Jan Schermer [mailto:j...@schermer.cz] Sent: Tuesday, June 02, 2015 11:41 AM To: Somnath Roy Cc: ceph-users Subject: Re: [ceph-users] ceph-mon logging like crazy because? We actually have debug mon = 0” It was always spammy, but this is too spammy - on one mon the log size is 500MB since morning. on other node it’s 17GB and about 16.5GB of that is within one hour - something’s not right there and this is likely just a symptom… Jan On 02 Jun 2015, at 20:36, Somnath Roy somnath@sandisk.com wrote: I think with the latest version of code it is printing only for log level 5, earlier it was 1. Here is the link where I had some conversation about this earlier with Sage. http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881 So, IMO nothing to worry about other than log spam here which is fixed in the latest build or you can fix it with debug mon = 0/0 Thanks Regards Somnath -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan Schermer Sent: Tuesday, June 02, 2015 11:33 AM To: ceph-users Subject: Re: [ceph-users] ceph-mon logging like crazy because? Another follow-up. The whole madness started with “mon compact” which we run from cron (else leveldb eats all space). It’s been running for about 14 days now with no incident. 2015-06-02 16:40:01.624804 7f4309d45700 0 mon.node-14@2(peon) e3 handle_command mon_command({prefix: compact} v 0) v1 2015-06-02 16:40:23.646514 7f430a746700 1 mon.node-14@2(peon).paxos(paxos updating c 36596805..36597321) lease_timeout -- calling new election 2015-06-02 16:40:23.646947 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646947 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646953 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646954 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646960 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646961 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646963 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646964 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646968 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646969 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646971 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646972 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646976 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646977 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646979 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646980 lease_expire=0.00 has v0 lc 3659 7321 The sequence that follows is probing recovering electing recovering peon recovering peon active (and this is the madness) It logs much less now, but the issue is still here… Jan On 02 Jun 2015, at 20:17, Jan Schermer j...@schermer.cz wrote: Actually looks like it stopped, but here’s a more representative sample (notice how often it logged this!) 
v0 lc 36602135 2015-06-02 17:39:59.865833 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865860 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865886 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865944 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865989 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866025 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866027 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135
[ceph-users] Error while installing ceph built from source
Hello, I am trying to deploy a Ceph cluster built from source, and I get an error with these messages:

dpkg: dependency problems prevent configuration of ceph:
 ceph depends on ceph-common (= 9.0.0-943); however:
  Version of ceph-common on system is 9.0.0-1.
 ceph-common (9.0.0-1) breaks ceph ( 9.0.0-943) and is unpacked but not configured.
  Version of ceph to be configured is 9.0.0-1.
. . . .
Errors were encountered while processing: ceph ceph-dbg ceph-mds ceph-mds-dbg ceph-resource-agents

I followed the steps from the documentation to build the Ceph packages from source:

1. ./autogen.sh
2. ./configure
3. make -j6
4. sudo dpkg-buildpackage

Now I am trying to deploy using the same procedure as with ceph-deploy (with the exception of the "ceph-deploy install ceph" step):

1. ceph-deploy new hostname
2. sudo dpkg -i * (in the folder containing the .deb files)

At this step I get the error pasted above. I was able to follow the same procedure without any issues on 2 other machines, but I am not able to identify the root cause. Any help from the community is appreciated!

Thanks,
Aakanksha

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-mon logging like crazy because....?
We actually have debug mon = 0” It was always spammy, but this is too spammy - on one mon the log size is 500MB since morning. on other node it’s 17GB and about 16.5GB of that is within one hour - something’s not right there and this is likely just a symptom… Jan On 02 Jun 2015, at 20:36, Somnath Roy somnath@sandisk.com wrote: I think with the latest version of code it is printing only for log level 5, earlier it was 1. Here is the link where I had some conversation about this earlier with Sage. http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881 So, IMO nothing to worry about other than log spam here which is fixed in the latest build or you can fix it with debug mon = 0/0 Thanks Regards Somnath -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan Schermer Sent: Tuesday, June 02, 2015 11:33 AM To: ceph-users Subject: Re: [ceph-users] ceph-mon logging like crazy because? Another follow-up. The whole madness started with “mon compact” which we run from cron (else leveldb eats all space). It’s been running for about 14 days now with no incident. 2015-06-02 16:40:01.624804 7f4309d45700 0 mon.node-14@2(peon) e3 handle_command mon_command({prefix: compact} v 0) v1 2015-06-02 16:40:23.646514 7f430a746700 1 mon.node-14@2(peon).paxos(paxos updating c 36596805..36597321) lease_timeout -- calling new election 2015-06-02 16:40:23.646947 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646947 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646953 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646954 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646960 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646961 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646963 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646964 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646968 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646969 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646971 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646972 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646976 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646977 lease_expire=0.00 has v0 lc 3659 7321 2015-06-02 16:40:23.646979 7f4309d45700 1 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646980 lease_expire=0.00 has v0 lc 3659 7321 The sequence that follows is probing recovering electing recovering peon recovering peon active (and this is the madness) It logs much less now, but the issue is still here… Jan On 02 Jun 2015, at 20:17, Jan Schermer j...@schermer.cz wrote: Actually looks like it stopped, but here’s a more representative sample (notice how often it logged this!) 
v0 lc 36602135 2015-06-02 17:39:59.865833 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865860 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865886 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865944 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.865989 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866025 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866027 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866072 7f4309d45700 1 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866074 lease_expire=2015-06-02 17:40:04.221316 has v0 lc 36602135 2015-06-02 17:39:59.866121 7f4309d45700 1
Re: [ceph-users] Error while installing ceph built from source
You need to run ceph-deploy purge first..Even after that if you see those old packages are still there, you need to manually remove those before installation. Thanks Regards Somnath -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Aakanksha Pudipeddi-SSI Sent: Tuesday, June 02, 2015 12:45 PM To: ceph-users@lists.ceph.com Subject: [ceph-users] Error while installing ceph built from source Hello, I am trying to deploy a ceph cluster by compile from sources and I get an error with these messages: dpkg: dependency problems prevent configuration of ceph: ceph depends on ceph-common (= 9.0.0-943); however: Version of ceph-common on system is 9.0.0-1. ceph-common (9.0.0-1) breaks ceph ( 9.0.0-943) and is unpacked but not configured. Version of ceph to be configured is 9.0.0-1. . . . . Errors were encountered while processing: ceph ceph-dbg ceph-mds ceph-mds-dbg ceph-resource-agents I followed the steps from the documentation to build Ceph packages from source: 1. ./autogen.sh 2. ./configure 3. make -j6 4. sudo dpkg-buildpackage Now I am trying to deploy using the same procedure mentioned in ceph-deploy (with the exception of the ceph-deploy install ceph step): 1. ceph-deploy new hostname 2. sudo dpkg -i * (in the folder containing the .deb files) At this step I get the error that is pasted above. I was able to follow the same procedure without any issues on 2 other machines but I am not able to identify the root cause. Any help from the community is appreciated! Thanks, Aakanksha ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies). ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
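A sketch of the cleanup described above before re-installing the locally built packages (hostnames are placeholders, and purge/purgedata remove packages, configuration and data, so only use them on a node you are happy to wipe):

    # remove anything left over from a previous install attempt
    ceph-deploy purge myhost
    ceph-deploy purgedata myhost

    # if dpkg still reports conflicting versions, remove them by hand
    dpkg -l | grep ceph
    sudo apt-get remove --purge ceph ceph-common librados2 librbd1

    # then install the freshly built .debs again and resolve dependencies
    sudo dpkg -i *.deb
    sudo apt-get -f install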
Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)
Thanks for the links, Jumbo frames are definitely working. Although we had to set the MTU to 8192 because one of the components doesn't support an MTU higher than that. Thanks for the help. Looks like we may just have to deal with jumbo frames being off. Cameron Scrace Infrastructure Engineer Mobile +64 22 610 4629 Phone +64 4 462 5085 Email cameron.scr...@solnet.co.nz Solnet Solutions Limited Level 12, Solnet House 70 The Terrace, Wellington 6011 PO Box 397, Wellington 6140 www.solnet.co.nz From: Somnath Roy somnath@sandisk.com To: cameron.scr...@solnet.co.nz cameron.scr...@solnet.co.nz Cc: ceph-users@lists.ceph.com ceph-users@lists.ceph.com, ceph-users ceph-users-boun...@lists.ceph.com, Joao Eduardo Luis j...@suse.de Date: 03/06/2015 11:49 a.m. Subject:RE: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) I doubt it is anything to do with Ceph, hope you checked your switch is supporting Jumbo frames and you have set MTU 9000 to all the devices in between. It‘s better to ping your devices (all the devices participating in the cluster) like the way it mentioned in the following articles , just in case you are not sure. http://www.mylesgray.com/hardware/test-jumbo-frames-working/ http://serverfault.com/questions/234311/testing-whether-jumbo-frames-are-actually-working Hope this helps, Thanks Regards Somnath From: cameron.scr...@solnet.co.nz [mailto:cameron.scr...@solnet.co.nz] Sent: Tuesday, June 02, 2015 4:32 PM To: Somnath Roy Cc: ceph-users@lists.ceph.com; ceph-users; Joao Eduardo Luis Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) Setting the MTU to 1500 worked, monitors reach quorum right away. Unfortunately we really want Jumbo Frames to be on, any ideas on how to get ceph to work with them on? Thanks! Cameron Scrace Infrastructure Engineer Mobile +64 22 610 4629 Phone +64 4 462 5085 Email cameron.scr...@solnet.co.nz Solnet Solutions Limited Level 12, Solnet House 70 The Terrace, Wellington 6011 PO Box 397, Wellington 6140 www.solnet.co.nz From:Somnath Roy somnath@sandisk.com To:cameron.scr...@solnet.co.nz cameron.scr...@solnet.co.nz Cc:ceph-users@lists.ceph.com ceph-users@lists.ceph.com, ceph-users ceph-users-boun...@lists.ceph.com, Joao Eduardo Luis j...@suse.de Date:03/06/2015 10:34 a.m. Subject:RE: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) We have seen some communication issue with that, try to make all the server MTU 1500 and try out… From: cameron.scr...@solnet.co.nz [mailto:cameron.scr...@solnet.co.nz] Sent: Tuesday, June 02, 2015 3:31 PM To: Somnath Roy Cc: ceph-users@lists.ceph.com; ceph-users; Joao Eduardo Luis Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) We are running with Jumbo Frames turned on. Is that likely to be the issue? Do I need to configure something in ceph? The mon maps are fine and after setting debug to 10 and debug ms to 1, I see probe timeouts in the logs: http://pastebin.com/44M1uJZc I just set probe timeout to 10 (up from 2) and it still times out. Thanks! Cameron Scrace Infrastructure Engineer Mobile +64 22 610 4629 Phone +64 4 462 5085 Email cameron.scr...@solnet.co.nz Solnet Solutions Limited Level 12, Solnet House 70 The Terrace, Wellington 6011 PO Box 397, Wellington 6140 www.solnet.co.nz From:Somnath Roy somnath@sandisk.com To:Joao Eduardo Luis j...@suse.de, ceph-users@lists.ceph.com ceph-users@lists.ceph.com Date:03/06/2015 03:49 a.m. 
Subject:Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) Sent by:ceph-users ceph-users-boun...@lists.ceph.com By any chance are you running with jumbo frame turned on ? Thanks Regards Somnath -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Joao Eduardo Luis Sent: Tuesday, June 02, 2015 12:52 AM To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) On 06/02/2015 01:42 AM, cameron.scr...@solnet.co.nz wrote: I am trying to deploy a new ceph cluster and my monitors are not reaching quorum. SELinux is off, firewalls are off, I can see traffic between the nodes on port 6789 but when I use the admin socket to force a re-election only the monitor I send the request to shows the new election in its logs. My logs are filled entirely of the following two lines: 2015-06-02 11:31:56.447975 7f795b17a700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd='mon_status' args=[]: dispatch 2015-06-02 11:31:56.448272 7f795b17a700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket'
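The linked articles boil down to a do-not-fragment ping sized for the MTU in use; a sketch (the payload is the MTU minus 28 bytes of IP/ICMP headers, so 8972 for MTU 9000 and 8164 for the 8192 MTU mentioned above):

    ping -M do -s 8972 <other-mon-ip>     # 9000-byte MTU path
    ping -M do -s 8164 <other-mon-ip>     # 8192-byte MTU path

    # if these fail while ordinary pings succeed, some device in the path
    # is not passing jumbo frames end-to-end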
Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)
Setting the MTU to 1500 worked, monitors reach quorum right away. Unfortunately we really want Jumbo Frames to be on, any ideas on how to get ceph to work with them on? Thanks! Cameron Scrace Infrastructure Engineer Mobile +64 22 610 4629 Phone +64 4 462 5085 Email cameron.scr...@solnet.co.nz Solnet Solutions Limited Level 12, Solnet House 70 The Terrace, Wellington 6011 PO Box 397, Wellington 6140 www.solnet.co.nz From: Somnath Roy somnath@sandisk.com To: cameron.scr...@solnet.co.nz cameron.scr...@solnet.co.nz Cc: ceph-users@lists.ceph.com ceph-users@lists.ceph.com, ceph-users ceph-users-boun...@lists.ceph.com, Joao Eduardo Luis j...@suse.de Date: 03/06/2015 10:34 a.m. Subject:RE: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) We have seen some communication issue with that, try to make all the server MTU 1500 and try out… From: cameron.scr...@solnet.co.nz [mailto:cameron.scr...@solnet.co.nz] Sent: Tuesday, June 02, 2015 3:31 PM To: Somnath Roy Cc: ceph-users@lists.ceph.com; ceph-users; Joao Eduardo Luis Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) We are running with Jumbo Frames turned on. Is that likely to be the issue? Do I need to configure something in ceph? The mon maps are fine and after setting debug to 10 and debug ms to 1, I see probe timeouts in the logs: http://pastebin.com/44M1uJZc I just set probe timeout to 10 (up from 2) and it still times out. Thanks! Cameron Scrace Infrastructure Engineer Mobile +64 22 610 4629 Phone +64 4 462 5085 Email cameron.scr...@solnet.co.nz Solnet Solutions Limited Level 12, Solnet House 70 The Terrace, Wellington 6011 PO Box 397, Wellington 6140 www.solnet.co.nz From:Somnath Roy somnath@sandisk.com To:Joao Eduardo Luis j...@suse.de, ceph-users@lists.ceph.com ceph-users@lists.ceph.com Date:03/06/2015 03:49 a.m. Subject:Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) Sent by:ceph-users ceph-users-boun...@lists.ceph.com By any chance are you running with jumbo frame turned on ? Thanks Regards Somnath -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Joao Eduardo Luis Sent: Tuesday, June 02, 2015 12:52 AM To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic) On 06/02/2015 01:42 AM, cameron.scr...@solnet.co.nz wrote: I am trying to deploy a new ceph cluster and my monitors are not reaching quorum. SELinux is off, firewalls are off, I can see traffic between the nodes on port 6789 but when I use the admin socket to force a re-election only the monitor I send the request to shows the new election in its logs. My logs are filled entirely of the following two lines: 2015-06-02 11:31:56.447975 7f795b17a700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd='mon_status' args=[]: dispatch 2015-06-02 11:31:56.448272 7f795b17a700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd=mon_status args=[]: finished You are running on default debug levels, so you'll hardly get anything more than that. I suggest setting 'debug mon = 10' and 'debug ms = 1' for added verbosity and come back to us with the logs. There are many reasons for this, but the more common are due to the monitors not being able to communicate with each other. 
Given you see traffic between the monitors, I'm inclined to assume that the other two monitors do not have each other on the monmap or, if they do know each other, either 1) the monitor's auth keys do not match, or 2) the probe timeout is being triggered before they successfully manage to find enough monitors to trigger an election -- which may be due to latency. Logs will tells us more. -Joao Querying the admin socket with mon_status (the other two are the similar but with their hostnames and rank): { name: wcm1, rank: 0, state: probing, election_epoch: 1, quorum: [], outside_quorum: [ wcm1 ], extra_probe_peers: [], sync_provider: [], monmap: { epoch: 0, fsid: adb8c500-122e-49fd-9c1e-a99af7832307, modified: 2015-06-02 10:43:41.467811, created: 2015-06-02 10:43:41.467811, mons: [ { rank: 0, name: wcm1, addr: 10.1.226.64:6789\/0 }, { rank: 1, name: wcm2, addr: 10.1.226.65:6789\/0 }, { rank: 2, name: wcm3, addr: 10.1.226.66:6789\/0 } ] } } ___ ceph-users mailing list
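To check the first hypothesis above (whether each monitor really has all three peers in its monmap), one option is to inspect the on-disk store of a stopped monitor; a sketch, using the mon name from the post:

    # stop the monitor first (command depends on the init system)
    service ceph stop mon.wcm1

    # extract and print its current monmap
    ceph-mon -i wcm1 --extract-monmap /tmp/monmap.wcm1
    monmaptool --print /tmp/monmap.wcm1

    service ceph start mon.wcm1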
Re: [ceph-users] Synchronous writes - tuning and some thoughts about them?
On 06/01/2015 03:41 AM, Jan Schermer wrote: Thanks, that’s it exactly. But I think that’s really too much work for now, that’s why I really would like to see a quick-win by using the local RBD cache for now - that would suffice for most workloads (not too many people run big databases on CEPH now, those who do must be aware of this). The issue is - and I have not yet seen an answer to that - would it be safe as it is now if the flushes were ignored (rbd cache = unsafe) or will it completely b0rk the filesystem when not flushed properly? Generally the latter. Right now flushes are the only thing enforcing ordering for rbd. As a block device it doesn't guarantee that e.g. the extent at offset 0 is written before the extent at offset 4096 unless it sees a flush between the writes. As suggested earlier in this thread, maintaining order during writeback would make not sending flushes (via mount -o nobarrier in the guest or cache=unsafe for qemu) safer from a crash-consistency point of view. An fs or database on top of rbd would still have to replay their internal journal, and could lose some writes, but should be able to end up in a consistent state that way. This would make larger caches more useful, and would be a simple way to use a large local cache devices as an rbd cache backend. Live migration should still work in such a system because qemu will still tell rbd to flush data at that point. A distributed local cache like [1] might be better long term, but much more complicated to implement. Josh [1] https://www.usenix.org/conference/fast15/technical-sessions/presentation/bhagwat ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
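For anyone experimenting with the tradeoff Josh describes, the relevant knobs look roughly like the sketch below. The pool/image name and cache size are illustrative only, and cache=unsafe is exactly the flush-dropping mode discussed above, so use it only where losing recent writes is acceptable.

# ceph.conf on the client -- enable the RBD cache and make it larger than the 32 MB default
[client]
rbd cache = true
rbd cache size = 268435456
rbd cache writethrough until flush = true   # stays in writethrough until the guest sends its first flush

# qemu: writeback honours guest flushes; unsafe ignores them (the risky case above)
qemu-system-x86_64 ... -drive format=raw,file=rbd:rbd/vm-disk,cache=writeback
qemu-system-x86_64 ... -drive format=raw,file=rbd:rbd/vm-disk,cache=unsafe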
Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)
I doubt it is anything to do with Ceph; hope you checked that your switch supports Jumbo frames and that you have set MTU 9000 on all the devices in between. It's better to ping your devices (all the devices participating in the cluster) the way it is mentioned in the following articles, just in case you are not sure. http://www.mylesgray.com/hardware/test-jumbo-frames-working/ http://serverfault.com/questions/234311/testing-whether-jumbo-frames-are-actually-working Hope this helps, Thanks & Regards Somnath
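A quick end-to-end check of the jumbo path (NIC, switch and peer) is a do-not-fragment ping sized just under the jumbo MTU; the host name below is a placeholder for any other node in the cluster:

# 8972 = 9000 - 20 (IP header) - 8 (ICMP header); -M do prohibits fragmentation
ping -M do -s 8972 -c 3 wcm2
# if the large ping fails while a normal one works, jumbo frames are not passing cleanly
ping -c 3 wcm2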
Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)
We have seen some communication issue with that, try to make all the servers MTU 1500 and try it out... Thanks & Regards Somnath
Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)
On Wed, Jun 3, 2015 at 8:30 AM, cameron.scr...@solnet.co.nz wrote: We are running with Jumbo Frames turned on. Is that likely to be the issue? I got caught by this previously: http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-October/043955.html The problem is Ceph almost-but-not-quite works, leading you down lots of fruitless paths. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)
We are running with Jumbo Frames turned on. Is that likely to be the issue? Do I need to configure something in ceph? The mon maps are fine and after setting debug to 10 and debug ms to 1, I see probe timeouts in the logs: http://pastebin.com/44M1uJZc I just set probe timeout to 10 (up from 2) and it still times out. Thanks! Cameron Scrace Infrastructure Engineer Mobile +64 22 610 4629 Phone +64 4 462 5085 Email cameron.scr...@solnet.co.nz Solnet Solutions Limited Level 12, Solnet House 70 The Terrace, Wellington 6011 PO Box 397, Wellington 6140 www.solnet.co.nz ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)
Seems to be something to do with our switch. If the interface MTU is too close to the switch MTU it stops working. Thanks for all your help :) Cameron Scrace Infrastructure Engineer Mobile +64 22 610 4629 Phone +64 4 462 5085 Email cameron.scr...@solnet.co.nz Solnet Solutions Limited Level 12, Solnet House 70 The Terrace, Wellington 6011 PO Box 397, Wellington 6140 www.solnet.co.nz ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] active+clean+scrubbing+deep
Thanks Irek, it really worked. 02.06.2015, 15:58, "Irek Fasikhov" malm...@gmail.com: Hi. Restart the OSD. :) -- Regards, Irek Fasikhov Mob.: +79229045757 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Ceph RBD and Cephfuse
Hi Team, We are newly using ceph with two OSDs and two clients. Our requirement is that when we write data through one client it should be visible on the other client as well. The storage is mounted using rbd because we run git clone with a large number of small files, and it is fast when using the rbd mount, but the data does not sync between the two clients. Another option we tried is mounting with ceph-fuse; there I can see the data on both clients, but it is too slow for a git clone with a large number of small files (202M). We also tried NFS etc. but it is slow. Kindly share a solution to achieve our requirements. Git clone using the RBD mount (but data does not sync between the two clients sharing the same image partition): time git clone https://github.com/elastic/elasticsearch.git Initialized empty Git repository in /home/sas/cide/elasticsearch/.git/ remote: Counting objects: 359724, done. remote: Compressing objects: 100% (55/55), done. remote: Total 359724 (delta 59), reused 20 (delta 20), pack-reused 359649 Receiving objects: 100% (359724/359724), 129.04 MiB | 8.04 MiB/s, done. Resolving deltas: 100% (203986/203986), done. real 0m49.255s user 0m19.371s sys 0m3.762s Git clone using the ceph-fuse mount; I can see the data on both clients but it takes 11m: time git clone https://github.com/elastic/elasticsearch.git Initialized empty Git repository in /home/sas/cide1/elasticsearch/.git/ remote: Counting objects: 359724, done. remote: Compressing objects: 100% (55/55), done. remote: Total 359724 (delta 59), reused 20 (delta 20), pack-reused 359649 Receiving objects: 100% (359724/359724), 129.04 MiB | 473 KiB/s, done. Resolving deltas: 100% (203986/203986), done. real 11m16.371s user 0m35.235s sys 1m59.389s Regards Prabu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph RBD and Cephfuse
On 2015-06-02T15:40:54, gjprabu gjpr...@zohocorp.com wrote: Hi Team, We are newly using ceph with two OSD and two clients, our requirement is when we write date through clients it should see in another client also, storage is mounted using rbd because we running git clone with large amount of small file and it is fast when use rbd mount, but data not sync in both the clients. What file system are you using on top of RBD for this purpose? To achieve this goal, you'd need to use a cluster-aware file system (with all the complexity that entails) like OCFS2 or GFS2. You cannot mount something like XFS/btrfs/ext4 multiple times; that will, in fact, corrupt your data and likely crash the client's kernels. Regards, Lars -- Architect Storage/HA SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nürnberg) Experience is the name everyone gives to their mistakes. -- Oscar Wilde ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Recommendations for a driver situation
Hello, We have recently acquired new servers for a new ceph cluster and we want to run Debian on those servers. Unfortunately the drivers needed for the raid controller are only available in newer kernels than what Debian Wheezy provides. We need to run the dumpling release of Ceph. Since the Ceph repo does not have packages for Debian Jessie, I see 3 alternatives for us: 1. Wait for the Ceph repo to add packages for Debian Jessie. Number 1 is not really an option for us. But, is there an approximate ETA on this? 2. Run Debian Wheezy with backported drivers. 3. Build the Ceph dumpling packages for Debian Jessie. Number 3, is this possible? Cloning the master branch from git gives you the install_debs.sh script which can be used to build Ceph 9.0 packages (we need Dumpling), and in the Dumpling branch there is no Debian package building script. Which one of these would you recommend? Also, will Dumpling be released for Debian Jessie? Pontus Lindgren ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
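For option 3, a rough sketch of building packages directly from the dumpling branch with the stock Debian tooling rather than Ceph's own scripts -- this assumes the dumpling tree still carries a debian/ directory and that its build-dependencies resolve on Jessie, so treat it as a starting point only:

sudo apt-get install build-essential devscripts equivs git
git clone -b dumpling https://github.com/ceph/ceph.git
cd ceph && git submodule update --init --recursive
# install the build-dependencies declared in debian/control
sudo mk-build-deps -i -r debian/control
# build unsigned binary packages
dpkg-buildpackage -b -us -uc -j4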
[ceph-users] Installation Issues
Hi, I'm having some difficulty installing the Hammer release on CentOS 6.6 following the instructions here: http://docs.ceph.com/docs/master/start/quick-ceph-deploy/. The initial problem was with the install.py and uninstall.py scripts referencing radosgw instead of ceph-radosgw in the packages lists. Swapping these out enabled the installation of the radosgw packages on the cluster nodes. However, the execution of ceph-deploy rgw create [node] fails with a no such file or directory error. Any suggestions? I've copied the log file below. Thanks, [ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph_admin/.cephdeploy.conf [ceph_deploy.cli][INFO ] Invoked (1.5.23): /usr/bin/ceph-deploy rgw create ceph-node01 [ceph_deploy.rgw][DEBUG ] Deploying rgw, cluster ceph hosts ceph-node01:rgw.ceph-node01 [ceph-node01][DEBUG ] connection detected need for sudo [ceph-node01][DEBUG ] connected to host: ceph-node01 [ceph-node01][DEBUG ] detect platform information from remote host [ceph-node01][DEBUG ] detect machine type [ceph_deploy.rgw][INFO ] Distro info: CentOS 6.6 Final [ceph_deploy.rgw][DEBUG ] remote host will use sysvinit [ceph_deploy.rgw][DEBUG ] deploying rgw bootstrap to ceph-node01 [ceph-node01][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf [ceph-node01][DEBUG ] create path if it doesn't exist [ceph_deploy.rgw][ERROR ] OSError: [Errno 2] No such file or directory: '/var/lib/ceph/radosgw/ceph-rgw.ceph-node01' [ceph_deploy][ERROR ] GenericError: Failed to create 1 RGWs Alex Dacre Systems Engineer ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph RBD and Cephfuse
Hi Lars, We installed CentOS on the client machines with kernel version 3.10, which has the rbd modules. We then installed ocfs2-tools and formatted the device, but the mount throws an error. Please check below. mount -t ocfs2 /dev/rbd/rbd/newinteg /home/test/cide mount.ocfs2: Unable to access cluster service while trying initialize cluster mkfs.ocfs2 /dev/rbd/rbd/newinteg mkfs.ocfs2 1.6.4 Cluster stack: classic o2cb Label: Features: sparse backup-super unwritten inline-data strict-journal-super xattr Block size: 4096 (12 bits) Cluster size: 4096 (12 bits) Volume size: 7340032 (1792 clusters) (1792 blocks) Cluster groups: 556 (tail covers 17920 clusters, rest cover 32256 clusters) Extent allocator size: 12582912 (3 groups) Journal size: 268435456 Node slots: 8 Creating bitmaps: done Initializing superblock: done Writing system files: done Writing superblock: done Writing backup superblock: 4 block(s) Formatting Journals: done Growing extent allocator: done Formatting slot map: done Formatting quota files: done Writing lost+found: done mkfs.ocfs2 successful Regards Prabu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Best setup for SSD
Hi, On 02/06/15 16:18, Mark Nelson wrote: On 06/02/2015 09:02 AM, Phil Schwarz wrote: My main problem is doing something not advised... Running VMs on Ceph nodes... No choice, but it seems that i'll have to do that. Hope i won't peg the CPU too quickly.. I'm doing it in 3 different Proxmox clusters. They're not very busy clusters, but it works very well. You might want to consider using cgroups or some other mechanism to segment what runs on what cores. While not ideal, dedicating 2-3 of the cores to ceph and leaving the other(s) for VMs might be a reasonable way to go. I think this may be a must if you set up a dedicated SSD pool. A single DC S3700 should suffice for journals for 4 OSDs. I wouldn't recommend using the other one for a cache tier unless you have a very highly skewed hot/cold workload. Perhaps instead make a dedicated SSD pool that could be used for high IOPS workloads. In fact you might consider skipping SSD journals and just making a dedicated SSD pool with all of the SSDs depending on how much write workload your main pool sees and if you could make good use of a dedicated SSD pool. Be warned that running SSD and HD based OSDs in the same server is not recommended. If you need the storage capacity, I'd stick to the journals on SSDs plan. Cheers Eneko -- Zuzendari Teknikoa / Director Técnico Binovo IT Human Project, S.L. Telf. 943575997 943493611 Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa) www.binovo.es ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Best setup for SSD
Le 02/06/2015 15:33, Eneko Lacunza a écrit : This should be enough for 3 OSDs I think, I used to have a Dell T20/Intel G3230 with 2x1TB OSDs with only 4 GB running OK. Cheers Eneko Yes, indeed. My main problem is doing something not advised... Running VMs on Ceph nodes... No choice, but it seems that i'll have to do that. Hope i won't peg the CPU too quickly.. Best regards ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] active+clean+scrubbing+deep
that's a normal process running... for more information http://ceph.com/docs/master/rados/configuration/osd-config-ref/#scrubbing ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
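If the periodic deep scrubs become intrusive, the page linked above describes the knobs involved; a hedged sketch of the sort of ceph.conf settings and checks people use (values are illustrative, not recommendations):

[osd]
osd max scrubs = 1                 # concurrent scrub operations per OSD (the default)
osd scrub min interval = 86400     # don't scrub a clean PG more than once a day
osd scrub max interval = 604800    # but force one at least weekly
osd deep scrub interval = 604800   # deep scrub roughly weekly

# see which PGs are scrubbing right now
ceph pg dump | grep -i scrub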
Re: [ceph-users] What do internal_safe_to_start_threads and leveldb_compression do?
On Tue, Jun 2, 2015 at 6:47 AM, Erik Logtenberg e...@logtenberg.eu wrote: What does this do? - leveldb_compression: false (default: true) - leveldb_block/cache/write_buffer_size (all bigger than default) I take it you're running these commands on a monitor (from I think the Dumpling timeframe, or maybe even Firefly)? These are hitting specific settings in LevelDB which we tune differently for the monitor and OSD, but which were shared config options in older releases. They have their own settings in newer code. -Greg You are correct. I started out with Firefly and gradually upgraded the cluster as new releases came out. I am on Hammer (0.94.1) now. The current settings are different from the default. Does this mean that the settings are still Firefly-like and should be changed to the new default; or does this mean that the defaults are still Firefly-like but the settings are actually Hammer-style ;) and thus right. Hmm, I think you must be setting them in your config file for them to be different now, but I don't really remember...Joao? :) -Greg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
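A quick way to see what a running daemon actually uses, rather than what the config file says, is the admin socket (assuming the default socket path; substitute your monitor's id):

# dump the effective configuration of the local monitor and filter the leveldb options
ceph daemon mon.$(hostname -s) config show | grep leveldb
# or query one option at a time
ceph daemon mon.$(hostname -s) config get leveldb_compression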
Re: [ceph-users] Best setup for SSD
On 06/02/2015 09:02 AM, Phil Schwarz wrote: My main problem is doing something not advised... Running VMs on Ceph nodes... No choice, but it seems that i'll have to do that. Hope i won't peg the CPU too quickly.. You might want to consider using cgroups or some other mechanism to segment what runs on what cores. While not ideal, dedicating 2-3 of the cores to ceph and leaving the other(s) for VMs might be a reasonable way to go. A single DC S3700 should suffice for journals for 4 OSDs. I wouldn't recommend using the other one for a cache tier unless you have a very highly skewed hot/cold workload. Perhaps instead make a dedicated SSD pool that could be used for high IOPS workloads. In fact you might consider skipping SSD journals and just making a dedicated SSD pool with all of the SSDs depending on how much write workload your main pool sees and if you could make good use of a dedicated SSD pool. Things to think about! Best regards ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
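If you go the core-segmentation route, a minimal sketch using the libcgroup tools (the package name, core numbers and the 'ceph' group name are assumptions to adapt to your distro and CPU layout):

sudo apt-get install cgroup-bin     # RHEL/CentOS 7: yum install libcgroup-tools
# create a cpuset group pinned to cores 0-2 and move the running OSDs into it
sudo cgcreate -g cpuset:ceph
sudo cgset -r cpuset.cpus=0-2 ceph
sudo cgset -r cpuset.mems=0 ceph
for pid in $(pgrep ceph-osd); do sudo cgclassify -g cpuset:ceph $pid; done
# the remaining core(s) stay free for the VMs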
[ceph-users] Best setup for SSD
Hi, i'm gonna have to setup a 4-node Ceph (Proxmox+Ceph in fact) cluster. - 1 node is a little HP Microserver N54L with 1x Opteron + 2 SSDs + 3x 4TB SATA. It'll be used as OSD+Mon server only. - 3 nodes are set up on Dell R730 + 1x Xeon 2603, 48 GB RAM, 1x 1TB SAS for OS, 4x 4TB SATA for OSD and 2x DC S3700 200GB Intel SSD. I can't change the hardware, especially the poor cpu... Everything will be connected through Intel X520 + Netgear XS708E, as a 10GbE storage network. This cluster will support VMs (mostly KVM) on the 3 R730 nodes. I'm already aware of the CPU pegging all the time... But can't change it for the moment. The VMs will be file-sharing servers and light-usage services (DNS, DHCP, AD or OpenLDAP). One proxy cache (Squid) will serve a 100Mb optical fiber link with 500+ clients. My question is: is it recommended to set up the 2 SSDs as: one SSD as journal for 2 (up to 3 in the future) OSDs each, or one SSD as journal for the 4 (up to 6 in the future) OSDs and the remaining SSD as a cache tier for the previous SSD+4-OSD pool? The SSDs should be rock solid enough in terms of both bandwidth and lifetime, given the low amount of data that will be written to them (a few hundred GB per day as a rule of thumb). Thanks Best regards. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph RBD and Cephfuse
On 2015-06-02T17:23:58, gjprabu gjpr...@zohocorp.com wrote: Hi Lars, We installed centos in client machines with kernel version is 3.10 which is rbd supported modules. Now installed ocsfs2-tools and formated but mount through error. Please check below. You need to configure the ocfs2 cluster properly as well. You can use either o2cb (which I'm not familiar with anymore), or the pacemaker-integrated version: https://www.suse.com/documentation/sle_ha/book_sleha/data/sec_ha_ocfs2_create_service.html (should pretty much apply to CentOS as well). From this point on, rbd is really just a shared block device, and you may have better success if you use the us...@clusterlabs.org mailing list if you wish to pursue this route. Regards, Lars -- Architect Storage/HA SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nürnberg) Experience is the name everyone gives to their mistakes. -- Oscar Wilde ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
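For the o2cb route, a minimal sketch of the cluster definition -- node names must match the hostnames, and the IPs, port and cluster name below are placeholders; the same file has to exist on every node before the o2cb service is brought online:

# /etc/ocfs2/cluster.conf (identical on both clients)
cluster:
        node_count = 2
        name = rbdcluster
node:
        ip_port = 7777
        ip_address = 192.168.0.11
        number = 0
        name = client1
        cluster = rbdcluster
node:
        ip_port = 7777
        ip_address = 192.168.0.12
        number = 1
        name = client2
        cluster = rbdcluster

# then on each node (o2cb init script from ocfs2-tools):
service o2cb configure
service o2cb online rbdcluster
mount -t ocfs2 /dev/rbd/rbd/newinteg /home/test/cide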
Re: [ceph-users] Recommendations for a driver situation
Hi, On 02/06/15 14:18, Pontus Lindgren wrote: We have recently acquired new servers for a new ceph cluster and we want to run Debian on those servers. Unfortunately drivers needed for the raid controller are only available in newer kernels than what Debian Wheezy provides. We need to run the dumpling release of Ceph. Since the Ceph repo does not have packages for Debian Jessie I see 3 alternatives for us: 1. Wait for the Ceph repo to add packages for Debian Jessie. Number 1 is not really an option for us. But, is there an approximate ETA on this? Why is this the case? At least Alexandre Derumier is working on this: (check an email from him in this list on 12th May) http://odisoweb1.odiso.net/ceph-jessie/ 2. Run Debian Wheezy with backported drivers. I haven't used them lately, but linux kernel in wheezy-backport is 3.16, is this enough? What kernel version do you require for the drivers? Cheers Eneko -- Zuzendari Teknikoa / Director Técnico Binovo IT Human Project, S.L. Telf. 943575997 943493611 Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa) www.binovo.es ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Best setup for SSD
Hi, On 02/06/15 14:51, Phil Schwarz wrote: i'm gonna have to setup a 4-nodes Ceph(Proxmox+Ceph in fact) cluster. -1 node is a little HP Microserver N54L with 1X opteron + 2SSD+ 3X 4TB SATA It'll be used as OSD+Mon server only. Are these SSDs Intel S3700 too? What amount of RAM? My question is : Is it recommended to setup the 2 SSDs as : One SSD as journal for 2 (up to 3 in the future) OSDs Or One SSD as journal for the 4 (up to 6 in the future) OSDs and the remaining SSD as cache tiering for the previous SSD+4 OSDs pool ? I haven't used cache tiering myself, but others have not reported much benefit from it (if any) at all, at least this is my understanding. So I think it would be better to use both SSDs for journals. It probably won't help performance using 2 instead of only 1, but it will lessen the impact from a SSD failure. Also it seems that the consensus is 3-4 OSDs for each SSD, so it will help when you expand to 6 OSDs. If all are Intel S3700 you're on the safe side unless you have lots of writes. Anyway I suggest you monitor the SMART values. Cheers Eneko -- Zuzendari Teknikoa / Director Técnico Binovo IT Human Project, S.L. Telf. 943575997 943493611 Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa) www.binovo.es ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Best setup for SSD
Thanks for your answers; mine are inline, too. Le 02/06/2015 15:17, Eneko Lacunza a écrit : Are these SSDs Intel S3700 too? What amount of RAM? Yes, All DCS3700, for the four nodes. 16GB of RAM on this node. I haven't used cache tiering myself, but others have not reported much benefit from it (if any) at all, at least this is my understanding. Yes, confirmed by the thread SSD Disk Distribution. So I think it would be better to use both SSDs for journals. It probably won't help performance using 2 instead of only 1, but it will lessen the impact from a SSD failure. Also it seems that the consensus is 3-4 OSDs for each SSD, so it will help when you expand to 6 OSDs. Agree; let's set tiering aside and use journals only. If all are Intel S3700 you're on the safe side unless you have lots of writes. Anyway I suggest you monitor the SMART values. Ok, i'll keep that in mind too. Thanks Cheers Eneko ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Best setup for SSD
Hi, On 02/06/15 15:26, Phil Schwarz wrote: On 02/06/15 14:51, Phil Schwarz wrote: i'm gonna have to setup a 4-nodes Ceph(Proxmox+Ceph in fact) cluster. -1 node is a little HP Microserver N54L with 1X opteron + 2SSD+ 3X 4TB SATA It'll be used as OSD+Mon server only. Are these SSDs Intel S3700 too? What amount of RAM? Yes, All DCS3700, for the four nodes. 16GB of RAM on this node. This should be enough for 3 OSDs I think, I used to have a Dell T20/Intel G3230 with 2x1TB OSDs with only 4 GB running OK. Cheers Eneko -- Zuzendari Teknikoa / Director Técnico Binovo IT Human Project, S.L. Telf. 943575997 943493611 Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa) www.binovo.es ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Recommendations for a driver situation
Why is this the case? At least Alexandre Derumier is working on this: (check an email from him in this list on 12th May) http://odisoweb1.odiso.net/ceph-jessie/ We are in a hurry. I haven't used them lately, but linux kernel in wheezy-backport is 3.16, is this enough? What kernel version do you require for the drivers? Yes 3.16 is enough. So this is looking like the best option right now. Pontus Lindgren ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] What do internal_safe_to_start_threads and leveldb_compression do?
What does this do? - leveldb_compression: false (default: true) - leveldb_block/cache/write_buffer_size (all bigger than default) I take it you're running these commands on a monitor (from I think the Dumpling timeframe, or maybe even Firefly)? These are hitting specific settings in LevelDB which we tune differently for the monitor and OSD, but which were shared config options in older releases. They have their own settings in newer code. -Greg You are correct. I started out with Firefly and gradually upgraded the cluster as new releases came out. I am on Hammer (0.94.1) now. The current settings are different from the default. Does this mean that the settings are still Firefly-like and should be changed to the new default; or does this mean that the defaults are still Firefly-like but the settings are actually Hammer-style ;) and thus right. Thanks, Erik. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)
By any chance are you running with jumbo frame turned on ? Thanks Regards Somnath ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] bursty IO, ceph cache pool can not follow evictions
Hi, we were rsync-streaming with 4 cephfs clients to a ceph cluster with a cache layer on top of an erasure coded pool. This had been going on for some time without real problems. Today we added 2 more streams, and very soon we saw some strange behaviour: - We are getting blocked requests on our cache pool osds - our cache pool is often near/at max ratio - Our data streams have very bursty IO (streaming a few hundred MB for a minute and then nothing) Our OSDs are not overloaded (neither the EC ones nor the cache ones, checked with iostat), but it seems like the cache pool can not evict objects in time, and gets blocked until that is OK, each time again. If I raise the target_max_bytes limit, it starts streaming again until it is full again. The cache parameters we have are these: ceph osd pool set cache hit_set_type bloom ceph osd pool set cache hit_set_count 1 ceph osd pool set cache hit_set_period 3600 ceph osd pool set cache target_max_bytes $((14*75*1024*1024*1024)) ceph osd pool set cache cache_target_dirty_ratio 0.4 ceph osd pool set cache cache_target_full_ratio 0.8 What can be the issue here? I tried to find some information about the 'cache agent', but can only find some old references.. Thank you! Kenneth ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] bursty IO, ceph cache pool can not follow evictions
Hi Kenneth, I suggested an idea which may help with this; it is currently being developed: https://github.com/ceph/ceph/pull/4792 In short there is a high and a low threshold with different flushing priorities. Hopefully this will help with bursty workloads. Nick ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] PG size distribution
Hello, I have some questions about the size of my placement groups and how I can get a more even distribution. We currently have 160 2TB OSDs across 20 chassis. We have 133TB used in our radosgw pool with a replica size of 2. We want to move to 3 replicas but are concerned we may fill up some of our OSDs. Some OSDs have ~1.1TB free while others only have ~600GB free. The radosgw pool has 4096 pgs, looking at the documentation I probably want to increase this up to 8192, but we have decided to hold off on that for now. So, now for the pg usage. I dumped out the PG stats and noticed that there are two groups of PG sizes in my cluster. There are about 1024 PGs that are each around 17-18GB in size. The rest of the PGs are all around 34-36GB in size. Any idea why there are two distinct groups? We only have the one pool with data in it, though there are several different buckets in the radosgw pool. The data in the pool ranges from small images to 4-6mb audio files. Will increasing the number of PGs on this pool provide a more even distribution? Another thing to note is that the initial cluster was built lopsided, with some 4TB OSDs and some 2TB, we have removed all the 4TB disks and are only using 2TBs across the entire cluster. Not sure if this would have had any impact. Thank you for your time and I would appreciate any insight the community can offer. - Daniel ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
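As a sanity check on the PG count, the usual heuristic is (number of OSDs x 100) / replica size, rounded up to a power of two; plugging in the numbers from this post (a sketch only, since the real answer also depends on how many pools share the OSDs):
# 160 OSDs at replica size 3
echo $((160 * 100 / 3))   # ~5333, so the next power of two, 8192, is in the right ballpark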
Re: [ceph-users] bursty IO, ceph cache pool can not follow evictions
Kenneth, My guess is that you’re hitting the cache_target_full_ratio on an individual OSD, which is easy to do since most of us tend to think of the cache_target_full_ratio as an aggregate of the OSDs (which it is not, according to Greg Farnum). This posting may shed more light on the issue, if it is indeed what you are bumping up against. https://www.mail-archive.com/ceph-users%40lists.ceph.com/msg20207.html BTW: how are you determining that your OSDs are ‘not overloaded?’ Are you judging that by iostat utilization, or by capacity consumed? -- Paul ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
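One way to check whether a single cache-tier OSD is the bottleneck (ceph osd df is available from Hammer onwards; the pool name 'cache' is taken from the earlier posts):
# Aggregate usage per pool, including the cache pool
ceph df detail
# Per-OSD utilisation; one nearly-full OSD in the cache tier can force
# flushing/eviction even when the pool as a whole is below target_max_bytes
ceph osd df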
Re: [ceph-users] Read Errors and OSD Flapping
On Sun, May 31, 2015 at 2:09 AM, Nick Fisk wrote: Thanks for the suggestions. I will introduce the disk 1st and see if the smart stats change from pending sectors to reallocated, if they don't then I will do the DD and smart test. It will be a good test as to what to do in this situation as I have a feeling this will most likely happen again. Please post back when you have a result, I'd like to know the outcome. Well the disk has finished rebalancing back into the cluster. The smart stats are not showing any pending sectors anymore, but strangely no reallocated ones either. I can only guess that when the drive tried to write to them again it succeeded without needing a remap??? I will continue to monitor the disk smart stats and see if I hit the same problem again. - Robert LeBlanc GPG Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 - BEGIN PGP SIGNATURE- Version: Mailvelope v0.13.1 Comment: https://www.mailvelope.com wsFcBAEBCAAQBQJVbLDTCRDmVDuy+mK58QAAFiwP/2EubdyL06YSNgSGyOr 4 +hWPTq530xvD/M6HNHb9xajQv8UGRF0uOM/FI/n1ln7ajDRbDGn/WazMgZD N uvCRpEtkw/OSRXiabBmPmKcACtMQbFADPMyDVR2130pmedN/pFHZFASy8X Cg IpnE5+Oj2+Fe8z1fXnwpHdutVE0I/BK+4vQAMuypVUwpv5jZ+Nd1NSOUbe7T q/x3vUQNEVpqSP5YCYYJJZOluAdmuvyAzsP1pMP42G920/F1KVVyyFG/ONnv 0EtPNG7FrpMauT0OM9zhSkTkfF4rYdK1L9MqzsI0hDqYMijPXe+tcHrndM3s l+wU5ZsKpQ+6xy6Rgv6LJdvVrXME5twAgy6y8dBtOSwyJztc/77w+FT4xbDS wg2k9AH09uG3CehvTvkuPQQkyXtCT+4LYpeU5l9aMn1hPFh0iOJdBi7rPbOf 17ERT+c0EPReZ+lSCwYEeVnd9iL8quE9AFEKYzDJnZCL2jDQY4Fr7JC2dyw/ LF1CKk5WU78eQT4aS3AaV0wYG+UzPFeTj8cPeWtqBrQtgzkPjPzeG/7Kpsf3 npWc/HQg7LB8rZAZ3ADRVE+KaJhuUsl1gRfk78bdGbTBDTpyeki7kywY6ODi +OUpUEPhyxkNr0OeD8eAQz2k+6/RJQfBFTeevuLRbMTlESGQnUpNVMk/1A7 7 yCPF =c0Vh -END PGP SIGNATURE- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
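A way to keep an eye on whether those sectors reappear (device name taken from the thread; SMART attribute names vary slightly by vendor):
# Pending and reallocated sector counts for the suspect drive
smartctl -A /dev/sdk | grep -Ei 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable'
# Run an extended self-test and check the result afterwards
smartctl -t long /dev/sdk
smartctl -l selftest /dev/sdk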
Re: [ceph-users] Read Errors and OSD Flapping
On Sat, May 30, 2015 at 2:23 PM, Nick Fisk n...@fisk.me.uk wrote: Hi All, I was noticing poor performance on my cluster and when I went to investigate I noticed OSD 29 was flapping up and down. On investigation it looks like it has 2 pending sectors, kernel log is filled with the following:
end_request: critical medium error, dev sdk, sector 4483365656
end_request: critical medium error, dev sdk, sector 4483365872
I can see in the OSD logs that it looked like when the OSD was crashing it was trying to scrub the PG, probably failing when the kernel passes up the read error.
ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
1: /usr/bin/ceph-osd() [0xacaf4a]
2: (()+0x10340) [0x7fdc43032340]
3: (gsignal()+0x39) [0x7fdc414d1cc9]
4: (abort()+0x148) [0x7fdc414d50d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fdc41ddc6b5]
6: (()+0x5e836) [0x7fdc41dda836]
7: (()+0x5e863) [0x7fdc41dda863]
8: (()+0x5eaa2) [0x7fdc41ddaaa2]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc2908]
10: (FileStore::read(coll_t, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int, bool)+0xc98) [0x9168e8]
11: (ReplicatedBackend::be_deep_scrub(hobject_t const&, unsigned int, ScrubMap::object&, ThreadPool::TPHandle&)+0x2f9) [0xa05bf9]
12: (PGBackend::be_scan_list(ScrubMap&, std::vector<hobject_t, std::allocator<hobject_t> > const&, bool, unsigned int, ThreadPool::TPHandle&)+0x2c8) [0x8dab98]
13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x1fa) [0x7f099a]
14: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x4a2) [0x7f1132]
15: (OSD::RepScrubWQ::_process(MOSDRepScrub*, ThreadPool::TPHandle&)+0xbe) [0x6e583e]
16: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb38ae]
17: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
18: (()+0x8182) [0x7fdc4302a182]
19: (clone()+0x6d) [0x7fdc4159547d]
Few questions: 1. Is this the expected behaviour, or should Ceph try and do something to either keep the OSD down or rewrite the sector to cause a sector remap? So the OSD is committing suicide and we want it to stay dead. But the init system is restarting it. We are actually discussing how that should change right now, but aren't quite sure what the right settings are: http://tracker.ceph.com/issues/11798 Presuming you still have the logs, how long was the cycle time for it to suicide, restart, and suicide again? 2. I am monitoring smart stats, but is there any other way of picking this up or getting Ceph to highlight it? Something like a flapping OSD notification would be nice. 3. I’m assuming at this stage this disk will not be replaceable under warranty, am I best to mark it as out, let it drain and then re-introduce it again, which should overwrite the sector and cause a remap? Or is there a better way? I'm not really sure about these ones. I imagine most users are covering it via nagios monitoring of the processes themselves? -Greg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
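For question 2, a minimal cron- or nagios-style sketch that flags OSDs which are down or out at check time (it only catches the OSD while it is down; true flap detection would need to track boot history over time, and the wording and exit codes are illustrative):
#!/bin/bash
# 'ceph osd stat' prints e.g. "osdmap e978: 16 osds: 16 up, 16 in"
status=$(ceph osd stat)
osds=$(echo "$status" | grep -oE '[0-9]+ osds' | awk '{print $1}')
up=$(echo "$status" | grep -oE '[0-9]+ up' | awk '{print $1}')
num_in=$(echo "$status" | grep -oE '[0-9]+ in' | awk '{print $1}')
if [ "$up" -lt "$osds" ] || [ "$num_in" -lt "$osds" ]; then
  echo "CRITICAL: $up/$osds OSDs up, $num_in/$osds in"
  exit 2
fi
echo "OK: all $osds OSDs up and in"
exit 0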
Re: [ceph-users] PG size distribution
Post the output from your “ceph osd tree”. We were in a similar situation, some of the OSDs were quite full while others had 50% free. This is exactly why we increased the number of PGs, and it helped to some degree. Are all your hosts the same size? Does your CRUSH map select a host in the end? That way if you have few hosts with differing numbers of OSDs the distribution will be poor (IMHO). Anyway, when we started increasing the PG numbers we first generated the PGs themselves (pg_num) in small increments since that put a lot of load on the OSDs and we were seeing slow requests with large increases. So something like this: for i in `seq 4096 64 8192` ; do ceph osd pool set poolname pg_num $i ; done This ate a few gigs from the drives (1-2GB if I remember correctly). Once that was finished we increased the pgp_num in larger and larger increments - at first 64 at a time and then 512 at a time when we were reaching the target (16384 in our case). This does allocate more space temporarily, and it seems to just randomly move data around - one minute an OSD is fine, another and the OSD is nearing full. One of us basically had to watch the process all the time, reweighting the devices that were almost full. With increasing number of PGs it became much simpler, as the overhead was smaller, every bit of work was smaller and all the management operations a lot smoother. YMMV - our data distribution was poor from the start, hosts had differing weights due to differing numbers of OSDs, there were some historical remnants when we tried to load-balance the data by hand, and we ended in a much better state but not perfect - some OSDs still have much more free space than others. We haven’t touched the CRUSH map at all during this process; once we do and set newer tunables the data distribution should be much more even. I’d love to hear the others’ input since we are not sure why exactly this problem is present at all - I’d expect it to fill all the OSDs to the same or close-enough level, but in reality we have OSDs with weight 1.0 which are almost empty and others with weight 0.5 which are nearly full… When adding data it seems to (subjectively) distribute them evenly... Jan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
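A sketch of the follow-on steps in the same spirit (pool name and step sizes are illustrative; reweight-by-utilization moves data, so it is best run while the cluster is quiet):
# Mirror the pg_num loop for pgp_num once the new PGs have been created,
# pausing between steps so backfill can keep up
for i in `seq 4096 128 8192` ; do ceph osd pool set poolname pgp_num $i ; sleep 60 ; done
# Nudge overfull OSDs down in weight; the argument is a threshold in percent
# of average utilisation (120 = act on OSDs more than 20% above the average)
ceph osd reweight-by-utilization 120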
Re: [ceph-users] Read Errors and OSD Flapping
-Original Message- From: Gregory Farnum [mailto:g...@gregs42.com] Sent: 02 June 2015 18:34 To: Nick Fisk Cc: ceph-users Subject: Re: [ceph-users] Read Errors and OSD Flapping On Sat, May 30, 2015 at 2:23 PM, Nick Fisk n...@fisk.me.uk wrote: Hi All, I was noticing poor performance on my cluster and when I went to investigate I noticed OSD 29 was flapping up and down. On investigation it looks like it has 2 pending sectors, kernel log is filled with the following end_request: critical medium error, dev sdk, sector 4483365656 end_request: critical medium error, dev sdk, sector 4483365872 I can see in the OSD logs that it looked like when the OSD was crashing it was trying to scrub the PG, probably failing when the kernel passes up the read error. Few questions: 1. Is this the expected behaviour, or should Ceph try and do something to either keep the OSD down or rewrite the sector to cause a sector remap? So the OSD is committing suicide and we want it to stay dead. But the init system is restarting it. We are actually discussing how that should change right now, but aren't quite sure what the right settings are: http://tracker.ceph.com/issues/11798 Presuming you still have the logs, how long was the cycle time for it to suicide, restart, and suicide again? Just looking through a few examples of it. It looks like it took about 2 seconds from suicide to restart and then about 5 minutes till it died again. I have taken a copy of the log, let me know if it's of any use to you. 2. I am monitoring smart stats, but is there any other way of picking this up or getting Ceph to highlight it? Something like a flapping OSD notification would be nice. 3. I’m assuming at this stage this disk will not be replaceable under warranty, am I best to mark it as out, let it drain and then re-introduce it again, which should overwrite the sector and cause a remap? Or is there a better way? I'm not really sure about these ones. I imagine most users are covering it via nagios monitoring of the processes themselves? -Greg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
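For question 3, the drain-and-reintroduce cycle discussed in this thread is roughly the following (osd.29 is taken from the thread; wait for recovery to finish before putting the OSD back in):
# Take the OSD out so its PGs are backfilled onto other disks
ceph osd out 29
# Watch until all PGs report active+clean again
ceph -s
# Bring it back in; backfill will rewrite its data, which should force the
# drive to remap any still-pending sectors
ceph osd in 29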