Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Joao Eduardo Luis
On 06/02/2015 01:42 AM, cameron.scr...@solnet.co.nz wrote:
 I am trying to deploy a new ceph cluster and my monitors are not
 reaching quorum. SELinux is off, firewalls are off, I can see traffic
 between the nodes on port 6789 but when I use the admin socket to force
 a re-election only the monitor I send the request to shows the new
 election in its logs. My logs are filled entirely of the following two
 lines:
 
 2015-06-02 11:31:56.447975 7f795b17a700  0 log_channel(audit) log [DBG]
 : from='admin socket' entity='admin socket' cmd='mon_status' args=[]:
 dispatch
 2015-06-02 11:31:56.448272 7f795b17a700  0 log_channel(audit) log [DBG]
 : from='admin socket' entity='admin socket' cmd=mon_status args=[]:
 finished

You are running on default debug levels, so you'll hardly get anything
more than that.  I suggest setting 'debug mon = 10' and 'debug ms = 1'
for added verbosity and coming back to us with the logs.
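
For example (a sketch; the mon name below is just an example), either in
ceph.conf on each monitor host:

[mon]
    debug mon = 10
    debug ms = 1

or at runtime through the admin socket, which works even while the mons are
out of quorum:

ceph daemon mon.wcm1 config set debug_mon 10
ceph daemon mon.wcm1 config set debug_ms 1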

There are many reasons for this, but the more common are due to the
monitors not being able to communicate with each other.  Given you see
traffic between the monitors, I'm inclined to assume that the other two
monitors do not have each other on the monmap or, if they do know each
other, either 1) the monitor's auth keys do not match, or 2) the probe
timeout is being triggered before they successfully manage to find
enough monitors to trigger an election -- which may be due to latency.

Logs will tell us more.

  -Joao

 Querying the admin socket with mon_status (the other two are the similar
 but with their hostnames and rank):
 
 {
     "name": "wcm1",
     "rank": 0,
     "state": "probing",
     "election_epoch": 1,
     "quorum": [],
     "outside_quorum": [
         "wcm1"
     ],
     "extra_probe_peers": [],
     "sync_provider": [],
     "monmap": {
         "epoch": 0,
         "fsid": "adb8c500-122e-49fd-9c1e-a99af7832307",
         "modified": "2015-06-02 10:43:41.467811",
         "created": "2015-06-02 10:43:41.467811",
         "mons": [
             {
                 "rank": 0,
                 "name": "wcm1",
                 "addr": "10.1.226.64:6789\/0"
             },
             {
                 "rank": 1,
                 "name": "wcm2",
                 "addr": "10.1.226.65:6789\/0"
             },
             {
                 "rank": 2,
                 "name": "wcm3",
                 "addr": "10.1.226.66:6789\/0"
             }
         ]
     }
 }

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] active+clean+scrubbing+deep

2015-06-02 Thread Никитенко Виталий
Hi!

I have ceph version 0.94.1.

root@ceph-node1:~# ceph -s
cluster 3e0d58cd-d441-4d44-b49b-6cff08c20abf
 health HEALTH_OK
 monmap e2: 3 mons at 
{ceph-mon=10.10.100.3:6789/0,ceph-node1=10.10.100.1:6789/0,ceph-node2=10.10.100.2:6789/0}
election epoch 428, quorum 0,1,2 ceph-node1,ceph-node2,ceph-mon
 osdmap e978: 16 osds: 16 up, 16 in
  pgmap v6735569: 2012 pgs, 8 pools, 2801 GB data, 703 kobjects
5617 GB used, 33399 GB / 39016 GB avail
2011 active+clean
   1 active+clean+scrubbing+deep
  client io 174 kB/s rd, 30641 kB/s wr, 80 op/s
  
root@ceph-node1:~# ceph pg dump  | grep -i deep | cut -f 1
  dumped all in format plain
  pg_stat
  19.b3  
  
In the log file I see
2015-05-14 03:23:51.556876 7fc708a37700  0 log_channel(cluster) log [INF] : 
19.b3 deep-scrub starts
but there is no "19.b3 deep-scrub ok" line.

Then I run "ceph pg deep-scrub 19.b3", but nothing happens and there are no
records about it in the log file.

What can I do to return the PG to the active+clean state?
Does it make sense to restart the OSD, or the entire server hosting the OSD?
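
For reference, a rough sketch of what I could check or run (the OSD id below is
just an example, and the restart command depends on the init system):

# which OSDs hold this PG (up/acting set, primary first)
ceph pg map 19.b3
# detailed per-PG scrub state
ceph pg 19.b3 query | grep -i scrub
# restart the primary OSD, e.g. osd.5, on an Upstart-based box
sudo restart ceph-osd id=5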

Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph on RHEL7.0

2015-06-02 Thread HEWLETT, Paul (Paul)
Hi Ken

Are these packages compatible with Giant or Hammer?

We are currently running Hammer - can we use the RBD kernel module from
RHEL 7.1, and is the ELRepo version of CephFS compatible with Hammer?

Regards
Paul

On 01/06/2015 17:57, Ken Dreyer kdre...@redhat.com wrote:

For the sake of providing more clarity regarding the Ceph kernel module
situation on RHEL 7.0, I've removed all the files at
https://github.com/ceph/ceph-kmod-rpm and updated the README there.

The summary is that if you want to use Ceph's RBD kernel module on RHEL
7, you should use RHEL 7.1 or later. And if you want to use the kernel
CephFS client on RHEL 7, you should use the latest upstream kernel
packages from ELRepo.
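
(For anyone looking for the concrete steps: the usual ELRepo route is roughly
the following - install their release RPM for EL7 first (see elrepo.org for the
exact URL), then pull in the mainline kernel. Treat this as a sketch, package
names may have changed since:)

# after installing the elrepo-release package for EL7
yum --enablerepo=elrepo-kernel install kernel-ml
# select the new kernel in the bootloader and reboot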

Hope that clarifies things from a RHEL 7 kernel perspective.

- Ken


On 05/28/2015 09:16 PM, Luke Kao wrote:
 Hi Bruce,
 The RHEL 7.0 kernel has many issues in its filesystem submodules, and most of
 them are fixed only in RHEL 7.1.
 So you should consider going to RHEL 7.1 directly and upgrading to at least
 kernel 3.10.0-229.1.2.
 
 
 BR,
 Luke
 
 
 *From:* ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of
 Bruce McFarland [bruce.mcfarl...@taec.toshiba.com]
 *Sent:* Friday, May 29, 2015 5:13 AM
 *To:* ceph-users@lists.ceph.com
 *Subject:* [ceph-users] Ceph on RHEL7.0
 
 We're planning on moving from Centos6.5 to RHEL7.0 for Ceph storage and
 monitor nodes. Are there any known issues using RHEL7.0?
 
 Thanks
 
 
 
 
 This electronic message contains information from Mycom which may be
 privileged or confidential. The information is intended to be for the
 use of the individual(s) or entity named above. If you are not the
 intended recipient, be aware that any disclosure, copying, distribution
 or any other use of the contents of this information is prohibited. If
 you have received this electronic message in error, please notify us by
 post or telephone (to the numbers or correspondence address above) or by
 email (at the email address above) immediately.
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SLES Packages

2015-06-02 Thread Lars Marowsky-Bree
On 2015-06-01T15:41:05, Steffen Weißgerber weissgerb...@ksnb.de wrote:

 Hi,
 
 I'm searching for current packages for SLES11 SP3.
 
 Via the SMT update server it seems that only version 0.80.8 is available. Are there
 other package sources available (at least for Giant)?

Hi Steffen,

we have only released the client side enablement for SLES 11 SP3. There
is no Ceph server side code available for this platform (at least not
from SUSE).

Our server-side offering is based on SLES 12 (SUSE Enterprise Storage).
It is currently based on firefly 0.80.9, though as always the next upgrade
is in the works ;-) (probably going directly to 0.80.11).

Only our next product release will be based on Hammer++.

A more community-oriented version, including more recent packages, is
available for openSUSE (via build.opensuse.org).

 What I want to do is map RBD natively via 'rbd map' instead of mounting NFS
 from another host on which I have current packages available.

That should be possible with the SLES 11 SP3 packages that you have
access to. The rbd client code is included there.
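
A minimal sketch of what that looks like (pool, image and user names are just
examples):

modprobe rbd
rbd map rbdpool/myimage --id admin --keyring /etc/ceph/ceph.client.admin.keyring
# the device appears as /dev/rbd0 ('rbd showmapped' lists the mappings)
mkfs.xfs /dev/rbd0
mount /dev/rbd0 /mnt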


Regards,
Lars

-- 
Architect Storage/HA
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Jan Schermer
Actually looks like it stopped, but here’s a more representative sample
(notice how often it logged this!)

 v0 lc 36602135
2015-06-02 17:39:59.865833 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865834 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.865860 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865861 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.865886 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865887 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.865944 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865946 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.865989 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865992 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866025 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866027 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866072 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866074 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866121 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866123 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866164 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866166 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866205 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866207 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866244 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866246 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866285 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866287 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
2015-06-02 17:39:59.866325 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866327 
lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135



 On 02 Jun 2015, at 20:14, Jan Schermer j...@schermer.cz wrote:
 
 Our mons just went into a logging frenzy.
 
 We have 3 mons in the cluster, and they mostly log stuff like this
 
 2015-06-02 18:00:48.749386 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
 active c 36603331..36604063) is_readable now=2015-06-02 18:00:48.749389 
 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
 2015-06-02 18:00:49.025179 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
 active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025187 
 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
 2015-06-02 18:00:49.025640 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
 active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025642 
 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
 2015-06-02 18:00:49.026132 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
 active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.026134 
 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
 2015-06-02 18:00:49.028388 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
 active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.028393 
 lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
 
 
 There are few lines every second, sometimes more, sometimes less (tell me if 
 that’s normal. I’m not sure)
 
 Two of them went completely haywire, one log is 17GB now and rising. It’s 
 still mostly the same content, just more frequent:
 
 2015-06-02 18:09:00.879950 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879956 
 lease_expire=0.00 has v0 lc 36604772
 2015-06-02 18:09:00.879968 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879969 
 lease_expire=0.00 has v0 lc 36604772
 2015-06-02 18:09:00.954835 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.954843 
 lease_expire=0.00 has v0 lc 36604772
 2015-06-02 18:09:00.954860 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 updating c 

Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Somnath Roy
I think with the latest version of the code this is printed only at log level 5; 
earlier it was level 1. Here is the link to a conversation I had with Sage about 
this earlier.

http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881

So, IMO there is nothing to worry about here other than the log spam, which is 
fixed in the latest build; alternatively you can suppress it with debug mon = 0/0.
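
For example (a sketch; the runtime variant needs the mons to be reachable):

# persistently, in ceph.conf on the monitor hosts
[mon]
    debug mon = 0/0

# or at runtime
ceph tell mon.* injectargs '--debug-mon 0/0'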

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan 
Schermer
Sent: Tuesday, June 02, 2015 11:33 AM
To: ceph-users
Subject: Re: [ceph-users] ceph-mon logging like crazy because?

Another follow-up.
The whole madness started with “mon compact” which we run from cron (else 
leveldb eats all space). It’s been running for about 14 days now with no 
incident.

2015-06-02 16:40:01.624804 7f4309d45700  0 mon.node-14@2(peon) e3 
handle_command mon_command({"prefix": "compact"} v 0) v1
2015-06-02 16:40:23.646514 7f430a746700  1 mon.node-14@2(peon).paxos(paxos 
updating c 36596805..36597321) lease_timeout -- calling new election
2015-06-02 16:40:23.646947 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646947 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646953 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646954 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646960 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646961 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646963 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646964 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646968 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646969 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646971 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646972 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646976 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646977 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646979 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646980 
lease_expire=0.00 has v0 lc 3659
7321


The sequence that follows is
probing recovering
electing recovering
peon recovering
peon active (and this is the madness)

It logs much less now, but the issue is still here…

Jan

 On 02 Jun 2015, at 20:17, Jan Schermer j...@schermer.cz wrote:

 Actually looks like it stopped, but here’s a more representative
 sample (notice how often it logged this!)

 v0 lc 36602135
 2015-06-02 17:39:59.865833 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865860 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865886 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865944 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865989 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866025 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.866027 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866072 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.866074 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866121 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.866123 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866164 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.866166 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866205 7f4309d45700  1
 

Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Jan Schermer
Dumpling

ceph-0.67.9-16.g69a99e6

I guess it shouldn’t be logging it at all?


Thanks
Jan


 On 02 Jun 2015, at 20:42, Somnath Roy somnath@sandisk.com wrote:
 
 Which code base are you using ?
 
 -Original Message-
 From: Jan Schermer [mailto:j...@schermer.cz] 
 Sent: Tuesday, June 02, 2015 11:41 AM
 To: Somnath Roy
 Cc: ceph-users
 Subject: Re: [ceph-users] ceph-mon logging like crazy because?
 
 We actually have
 “debug mon = 0”
 
 It was always spammy, but this is too spammy - on one mon the log size is 
 500MB since morning. on other node it’s 17GB and about 16.5GB of that is 
 within one hour - something’s not right there and this is likely just a 
 symptom…
 
 Jan
 
 
 On 02 Jun 2015, at 20:36, Somnath Roy somnath@sandisk.com wrote:
 
 I think with the latest version of code it is printing only for log level 5, 
 earlier it was 1. Here is the link where I had some conversation about this 
 earlier with Sage.
 
 http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881
 
 So, IMO nothing to worry about other than log spam here which is fixed 
 in the latest build or you can fix it with debug mon = 0/0
 
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
 Of Jan Schermer
 Sent: Tuesday, June 02, 2015 11:33 AM
 To: ceph-users
 Subject: Re: [ceph-users] ceph-mon logging like crazy because?
 
 Another follow-up.
 The whole madness started with “mon compact” which we run from cron (else 
 leveldb eats all space). It’s been running for about 14 days now with no 
 incident.
 
 2015-06-02 16:40:01.624804 7f4309d45700  0 mon.node-14@2(peon) e3 
 handle_command mon_command({"prefix": "compact"} v 0) v1
 2015-06-02 16:40:23.646514 7f430a746700  1 
 mon.node-14@2(peon).paxos(paxos updating c 36596805..36597321) 
 lease_timeout -- calling new election
 2015-06-02 16:40:23.646947 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646947 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646953 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646954 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646960 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646961 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646963 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646964 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646968 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646969 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646971 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646972 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646976 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646977 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646979 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646980 lease_expire=0.00 has 
 v0 lc 3659
 7321
 
 
 The sequence that follows is
 probing recovering
 electing recovering
 peon recovering
 peon active (and this is the madness)
 
 It logs much less now, but the issue is still here…
 
 Jan
 
 On 02 Jun 2015, at 20:17, Jan Schermer j...@schermer.cz wrote:
 
 Actually looks like it stopped, but here’s a more representative 
 sample (notice how often it logged this!)
 
 v0 lc 36602135
 2015-06-02 17:39:59.865833 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865860 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865886 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865944 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865989 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866025 

Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Jan Schermer
Another follow-up.
The whole madness started with “mon compact” which we run from cron (else 
leveldb eats all space). It’s been running for about 14 days now with no 
incident.
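
(The cron entry itself is nothing special - roughly the following, one per mon
host; the path and schedule are just examples:)

# /etc/cron.d/ceph-mon-compact - compact the local mon store nightly
0 3 * * * root ceph tell mon.$(hostname -s) compact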

2015-06-02 16:40:01.624804 7f4309d45700  0 mon.node-14@2(peon) e3 
handle_command mon_command({"prefix": "compact"} v 0) v1
2015-06-02 16:40:23.646514 7f430a746700  1 mon.node-14@2(peon).paxos(paxos 
updating c 36596805..36597321) lease_timeout -- calling new election
2015-06-02 16:40:23.646947 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646947 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646953 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646954 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646960 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646961 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646963 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646964 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646968 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646969 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646971 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646972 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646976 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646977 
lease_expire=0.00 has v0 lc 3659
7321
2015-06-02 16:40:23.646979 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646980 
lease_expire=0.00 has v0 lc 3659
7321


The sequence that follows is
probing recovering
electing recovering
peon recovering
peon active (and this is the madness)

It logs much less now, but the issue is still here…

Jan

 On 02 Jun 2015, at 20:17, Jan Schermer j...@schermer.cz wrote:
 
 Actually looks like it stopped, but here’s a more representative sample
 (notice how often it logged this!)
 
 v0 lc 36602135
 2015-06-02 17:39:59.865833 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865834 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865860 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865861 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865886 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865887 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865944 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865946 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865989 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.865992 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866025 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866027 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866072 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866074 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866121 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866123 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866164 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866166 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866205 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866207 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866244 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866246 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866285 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
 active c 36601574..36602135) is_readable now=2015-06-02 17:39:59.866287 
 lease_expire=2015-06-02 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866325 7f4309d45700  1 

Re: [ceph-users] PG size distribution

2015-06-02 Thread Daniel Maraio

Hello,

  Thank you for the feedback Jan, much appreciated! I won't post the 
whole tree as it is rather long, but here is an example of one of our 
hosts. All of the OSDs and hosts are weighted the same, with the 
exception of a host that is missing an OSD due to a broken backplane. We 
are only using hosts for buckets so no rack/DC. We have not manually 
adjusted the crush map at all for this cluster.


 -1 302.26959 root default
-24  14.47998 host osd23
192   1.81000 osd.192  up  1.0  1.0
193   1.81000 osd.193  up  1.0  1.0
194   1.81000 osd.194  up  1.0  1.0
195   1.81000 osd.195  up  1.0  1.0
199   1.81000 osd.199  up  1.0  1.0
200   1.81000 osd.200  up  1.0  1.0
201   1.81000 osd.201  up  1.0  1.0
202   1.81000 osd.202  up  1.0  1.0

  I appreciate your input and will likely follow the same path you 
have, slowly increasing the PGs and adjusting the weights as necessary. 
If anyone else has any further suggestions I'd love to hear them as well!


- Daniel


On 06/02/2015 01:33 PM, Jan Schermer wrote:

Post the output from your “ceph osd tree”.
We were in a similiar situation, some of the OSDs were quite full while other had 
50% free. This is exactly why we increased the number of PGs, and it helped to 
some degree.
Are all your hosts the same size? Does your CRUSH map select a host in the end? 
If you have only a few hosts with differing numbers of OSDs, the distribution 
will be poor (IMHO).

Anyway, when we started increasing the PG numbers we first generated the PGs 
themselves (pg_num) in small increments since that put a lot of load on the 
OSDs and we were seeing slow requests with large increases.
So something like this:
for i in `seq 4096 64 8192` ; do ceph osd pool set poolname pg_num $i ; done
This ate a few gigs from the drives (1-2GB if I remember correctly).

Once that was finished we increased the pgp_num in larger and larger increments 
 - at first 64 at a time and then 512 at a time when we were reaching the 
target (16384 in our case). This does allocate more space temporarily, and it 
seems to just randomly move data around - one minute an OSD is fine, another 
and the OSD is nearing full. One of us basically had to watch the process all 
the time, reweighting the devices that were almost full.
With increasing number of PGs it became much simpler, as the overhead was 
smaller, every bit of work was smaller and all the management operations a lot 
smoother.
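
The pgp_num phase looked roughly like this (a sketch from memory - pool name,
start value and step size are examples; we kept an eye on ceph -s and the OSD
fill levels the whole time):

# bump pgp_num in small steps and let the cluster settle between steps
for i in `seq 4160 64 8192` ; do
    ceph osd pool set poolname pgp_num $i
    # wait until backfill/recovery from this step has finished
    while ceph health | grep -Eq 'backfill|recover' ; do sleep 60 ; done
done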

YMMV - our data distribution was poor from the start, hosts had differing 
weights due to differing number of OSDs, there were some historical remnants 
when we tried to load-balance the data by hand, and we ended in a much better 
state but not perfect - some OSDs still have much more free space than other.
We haven’t touched the CRUSH map at all during this process, once we do and set 
newer tunables then the data distribution should be much more even.

I’d love to hear the others’ input since we are not sure why exactly this 
problem is present at all - I’d expect it to fill all the OSDs to the same or 
close-enough level, but in reality we have OSDs with weight 1.0 which are 
almost empty and others with weight 0.5 which are nearly full… When adding data 
it seems to (subjectively) distribute them evenly...

Jan


On 02 Jun 2015, at 18:52, Daniel Maraio dmar...@choopa.com wrote:

Hello,

  I have some questions about the size of my placement groups and how I can get 
a more even distribution. We currently have 160 2TB OSDs across 20 chassis.  We 
have 133TB used in our radosgw pool with a replica size of 2. We want to move 
to 3 replicas but are concerned we may fill up some of our OSDs. Some OSDs have 
~1.1TB free while others only have ~600GB free. The radosgw pool has 4096 pgs, 
looking at the documentation I probably want to increase this up to 8192, but 
we have decided to hold off on that for now.

  So, now for the pg usage. I dumped out the PG stats and noticed that there 
are two groups of PG sizes in my cluster. There are about 1024 PGs that are 
each around 17-18GB in size. The rest of the PGs are all around 34-36GB in 
size. Any idea why there are two distinct groups? We only have the one pool 
with data in it, though there are several different buckets in the radosgw 
pool. The data in the pool ranges from small images to 4-6mb audio files. Will 
increasing the number of PGs on this pool provide a more even distribution?
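
(For reference, the per-PG sizes above come from something like this - the bytes
column position may differ between Ceph versions:)

# PG id and size in bytes, sorted by size
ceph pg dump 2>/dev/null | awk '$1 ~ /^[0-9]+\./ {print $1, $7}' | sort -nk2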

  Another thing to note is that the initial cluster was built lopsided, with 
some 4TB OSDs and some 2TB, we have removed all the 4TB disks and are only 
using 2TBs across the entire cluster. Not sure if this would have had any 
impact.

  Thank you for your time and I would appreciate any insight the community can 
offer.

- Daniel

[ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Jan Schermer
Our mons just went into a logging frenzy.

We have 3 mons in the cluster, and they mostly log stuff like this

2015-06-02 18:00:48.749386 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
active c 36603331..36604063) is_readable now=2015-06-02 18:00:48.749389 
lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
2015-06-02 18:00:49.025179 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025187 
lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
2015-06-02 18:00:49.025640 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.025642 
lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
2015-06-02 18:00:49.026132 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.026134 
lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063
2015-06-02 18:00:49.028388 7f1c08c0d700  1 mon.node-10@1(peon).paxos(paxos 
active c 36603331..36604063) is_readable now=2015-06-02 18:00:49.028393 
lease_expire=2015-06-02 18:00:53.507837 has v0 lc 36604063


There are a few lines every second, sometimes more, sometimes less (tell me if 
that's normal, I'm not sure).

Two of them went completely haywire, one log is 17GB now and rising. It’s still 
mostly the same content, just more frequent:

2015-06-02 18:09:00.879950 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879956 
lease_expire=0.00 has v0 lc 36604772
2015-06-02 18:09:00.879968 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.879969 
lease_expire=0.00 has v0 lc 36604772
2015-06-02 18:09:00.954835 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.954843 
lease_expire=0.00 has v0 lc 36604772
2015-06-02 18:09:00.954860 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
updating c 36604084..36604772) is_readable now=2015-06-02 18:09:00.954861 
lease_expire=0.00 has v0 lc 36604772
2015-06-02 18:09:01.249648 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249668 
lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773
2015-06-02 18:09:01.249697 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249699 
lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773
2015-06-02 18:09:01.249708 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249709 
lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773
2015-06-02 18:09:01.249736 7f4309d45700  1 mon.node-14@2(peon).paxos(paxos 
active c 36604084..36604773) is_readable now=2015-06-02 18:09:01.249736 
lease_expire=2015-06-02 18:09:06.091738 has v0 lc 36604773


Any idea what it might be? Clocks look synced, no other apparent problem that I 
can see, and the cluster is working.
I'd like to know why this happened before I restart the unhealthy mons, which (I 
hope) will fix this.

Thanks

Jan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Somnath Roy
Which code base are you using ?

-Original Message-
From: Jan Schermer [mailto:j...@schermer.cz] 
Sent: Tuesday, June 02, 2015 11:41 AM
To: Somnath Roy
Cc: ceph-users
Subject: Re: [ceph-users] ceph-mon logging like crazy because?

We actually have
“debug mon = 0”

It was always spammy, but this is too spammy - on one mon the log size is 500MB 
since morning. on other node it’s 17GB and about 16.5GB of that is within one 
hour - something’s not right there and this is likely just a symptom…

Jan


On 02 Jun 2015, at 20:36, Somnath Roy somnath@sandisk.com wrote:
 
 I think with the latest version of code it is printing only for log level 5, 
 earlier it was 1. Here is the link where I had some conversation about this 
 earlier with Sage.
 
 http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881
 
 So, IMO nothing to worry about other than log spam here which is fixed 
 in the latest build or you can fix it with debug mon = 0/0
 
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
 Of Jan Schermer
 Sent: Tuesday, June 02, 2015 11:33 AM
 To: ceph-users
 Subject: Re: [ceph-users] ceph-mon logging like crazy because?
 
 Another follow-up.
 The whole madness started with “mon compact” which we run from cron (else 
 leveldb eats all space). It’s been running for about 14 days now with no 
 incident.
 
 2015-06-02 16:40:01.624804 7f4309d45700  0 mon.node-14@2(peon) e3 
 handle_command mon_command({"prefix": "compact"} v 0) v1
 2015-06-02 16:40:23.646514 7f430a746700  1 
 mon.node-14@2(peon).paxos(paxos updating c 36596805..36597321) 
 lease_timeout -- calling new election
 2015-06-02 16:40:23.646947 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646947 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646953 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646954 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646960 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646961 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646963 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646964 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646968 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646969 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646971 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646972 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646976 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646977 lease_expire=0.00 has 
 v0 lc 3659
 7321
 2015-06-02 16:40:23.646979 7f4309d45700  1 
 mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
 is_readable now=2015-06-02 16:40:23.646980 lease_expire=0.00 has 
 v0 lc 3659
 7321
 
 
 The sequence that follows is
 probing recovering
 electing recovering
 peon recovering
 peon active (and this is the madness)
 
 It logs much less now, but the issue is still here…
 
 Jan
 
 On 02 Jun 2015, at 20:17, Jan Schermer j...@schermer.cz wrote:
 
 Actually looks like it stopped, but here’s a more representative 
 sample (notice how often it logged this!)
 
 v0 lc 36602135
 2015-06-02 17:39:59.865833 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865860 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865886 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865944 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865989 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866025 7f4309d45700  1 
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
 is_readable now=2015-06-02 17:39:59.866027 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 

[ceph-users] Error while installing ceph built from source

2015-06-02 Thread Aakanksha Pudipeddi-SSI
Hello,

I am trying to deploy a ceph cluster by compiling from source, and I get an error 
with these messages:

dpkg: dependency problems prevent configuration of ceph:
 ceph depends on ceph-common (= 9.0.0-943); however:
  Version of ceph-common on system is 9.0.0-1.
 ceph-common (9.0.0-1) breaks ceph (<< 9.0.0-943) and is unpacked but not 
configured.
  Version of ceph to be configured is 9.0.0-1.
.
.
.
.
Errors were encountered while processing:
 ceph
 ceph-dbg
 ceph-mds
 ceph-mds-dbg
 ceph-resource-agents


I followed the steps from the documentation to build Ceph packages from source:
1. ./autogen.sh
2. ./configure
3. make -j6
4. sudo dpkg-buildpackage


Now I am trying to deploy using the same procedure mentioned in ceph-deploy 
(with the exception of the ceph-deploy install ceph step):
1. ceph-deploy new hostname
2. sudo dpkg -i * (in the folder containing the .deb files)

At this step I get the error that is pasted above. I was able to follow the 
same procedure without any issues on 2 other machines but I am not able to 
identify the root cause. Any help from the community is appreciated!

Thanks,
Aakanksha
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-mon logging like crazy because....?

2015-06-02 Thread Jan Schermer
We actually have
“debug mon = 0”

It was always spammy, but this is too spammy - on one mon the log size is 500MB 
since this morning; on the other node it's 17GB, and about 16.5GB of that is from 
within one hour - something's not right there and this is likely just a symptom…

Jan


On 02 Jun 2015, at 20:36, Somnath Roy somnath@sandisk.com wrote:
 
 I think with the latest version of code it is printing only for log level 5, 
 earlier it was 1. Here is the link where I had some conversation about this 
 earlier with Sage.
 
 http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881
 
 So, IMO nothing to worry about other than log spam here which is fixed in the 
 latest build or you can fix it with debug mon = 0/0
 
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan 
 Schermer
 Sent: Tuesday, June 02, 2015 11:33 AM
 To: ceph-users
 Subject: Re: [ceph-users] ceph-mon logging like crazy because?
 
 Another follow-up.
 The whole madness started with “mon compact” which we run from cron (else 
 leveldb eats all space). It’s been running for about 14 days now with no 
 incident.
 
 2015-06-02 16:40:01.624804 7f4309d45700  0 mon.node-14@2(peon) e3 
 handle_command mon_command({"prefix": "compact"} v 0) v1
 2015-06-02 16:40:23.646514 7f430a746700  1 mon.node-14@2(peon).paxos(paxos 
 updating c 36596805..36597321) lease_timeout -- calling new election
 2015-06-02 16:40:23.646947 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
 recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646947 
 lease_expire=0.00 has v0 lc 3659
 7321
 2015-06-02 16:40:23.646953 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
 recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646954 
 lease_expire=0.00 has v0 lc 3659
 7321
 2015-06-02 16:40:23.646960 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
 recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646961 
 lease_expire=0.00 has v0 lc 3659
 7321
 2015-06-02 16:40:23.646963 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
 recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646964 
 lease_expire=0.00 has v0 lc 3659
 7321
 2015-06-02 16:40:23.646968 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
 recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646969 
 lease_expire=0.00 has v0 lc 3659
 7321
 2015-06-02 16:40:23.646971 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
 recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646972 
 lease_expire=0.00 has v0 lc 3659
 7321
 2015-06-02 16:40:23.646976 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
 recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646977 
 lease_expire=0.00 has v0 lc 3659
 7321
 2015-06-02 16:40:23.646979 7f4309d45700  1 mon.node-14@2(probing).paxos(paxos 
 recovering c 36596805..36597321) is_readable now=2015-06-02 16:40:23.646980 
 lease_expire=0.00 has v0 lc 3659
 7321
 
 
 The sequence that follows is
 probing recovering
 electing recovering
 peon recovering
 peon active (and this is the madness)
 
 It logs much less now, but the issue is still here…
 
 Jan
 
 On 02 Jun 2015, at 20:17, Jan Schermer j...@schermer.cz wrote:
 
 Actually looks like it stopped, but here’s a more representative
 sample (notice how often it logged this!)
 
 v0 lc 36602135
 2015-06-02 17:39:59.865833 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865860 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865886 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865944 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.865989 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866025 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.866027 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866072 7f4309d45700  1
 mon.node-14@2(peon).paxos(paxos active c 36601574..36602135)
 is_readable now=2015-06-02 17:39:59.866074 lease_expire=2015-06-02
 17:40:04.221316 has
 v0 lc 36602135
 2015-06-02 17:39:59.866121 7f4309d45700  1
 

Re: [ceph-users] Error while installing ceph built from source

2015-06-02 Thread Somnath Roy
You need to run ceph-deploy purge first. Even after that, if you see that the old 
packages are still there, you need to manually remove them before installation.
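
Something like this (hostname is an example; double-check the package list that
dpkg reports on your system first):

ceph-deploy purge <hostname>
ceph-deploy purgedata <hostname>
# if old packages are still present afterwards, remove them by hand
dpkg -l | grep ceph
sudo apt-get purge ceph ceph-common ceph-mds ceph-fs-common librados2 librbd1 python-ceph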

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Aakanksha Pudipeddi-SSI
Sent: Tuesday, June 02, 2015 12:45 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Error while installing ceph built from source

Hello,

I am trying to deploy a ceph cluster by compile from sources and I get an error 
with these messages:

dpkg: dependency problems prevent configuration of ceph:
 ceph depends on ceph-common (= 9.0.0-943); however:
  Version of ceph-common on system is 9.0.0-1.
 ceph-common (9.0.0-1) breaks ceph (<< 9.0.0-943) and is unpacked but not 
configured.
  Version of ceph to be configured is 9.0.0-1.
.
.
.
.
Errors were encountered while processing:
 ceph
 ceph-dbg
 ceph-mds
 ceph-mds-dbg
 ceph-resource-agents


I followed the steps from the documentation to build Ceph packages from source:
1. ./autogen.sh
2. ./configure
3. make -j6
4. sudo dpkg-buildpackage


Now I am trying to deploy using the same procedure mentioned in ceph-deploy 
(with the exception of the ceph-deploy install ceph step):
1. ceph-deploy new hostname
2. sudo dpkg -i * (in the folder containing the .deb files)

At this step I get the error that is pasted above. I was able to follow the 
same procedure without any issues on 2 other machines but I am not able to 
identify the root cause. Any help from the community is appreciated!

Thanks,
Aakanksha
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



PLEASE NOTE: The information contained in this electronic mail message is 
intended only for the use of the designated recipient(s) named above. If the 
reader of this message is not the intended recipient, you are hereby notified 
that you have received this message in error and that any review, 
dissemination, distribution, or copying of this message is strictly prohibited. 
If you have received this communication in error, please notify the sender by 
telephone or e-mail (as shown above) immediately and destroy any and all copies 
of this message in your possession (whether hard copies or electronically 
stored copies).

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Cameron . Scrace
Thanks for the links - jumbo frames are definitely working, although we had 
to set the MTU to 8192 because one of the components doesn't support an 
MTU higher than that. 

Thanks for the help. Looks like we may just have to deal with jumbo frames 
being off.
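
(The check we used is essentially the one from those links - a do-not-fragment
ping with a payload just under the MTU; the address is one of our mon hosts:)

# 8192 MTU minus 28 bytes of IP + ICMP headers = 8164 byte payload
ping -M do -s 8164 10.1.226.65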

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085 
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz



From:   Somnath Roy somnath@sandisk.com
To: cameron.scr...@solnet.co.nz cameron.scr...@solnet.co.nz
Cc: ceph-users@lists.ceph.com ceph-users@lists.ceph.com, 
ceph-users ceph-users-boun...@lists.ceph.com, Joao Eduardo Luis 
j...@suse.de
Date:   03/06/2015 11:49 a.m.
Subject:RE: [ceph-users] Monitors not reaching quorum. (SELinux 
off, IPtables off, can see tcp traffic)



I doubt it is anything to do with Ceph. I hope you checked that your switch 
supports jumbo frames and that you have set MTU 9000 on all the devices in 
between. It's better to ping your devices (all the devices participating 
in the cluster) the way it is described in the following articles, just 
in case you are not sure.
 
http://www.mylesgray.com/hardware/test-jumbo-frames-working/
http://serverfault.com/questions/234311/testing-whether-jumbo-frames-are-actually-working
 
Hope this helps,
 
Thanks & Regards
Somnath
 
From: cameron.scr...@solnet.co.nz [mailto:cameron.scr...@solnet.co.nz] 
Sent: Tuesday, June 02, 2015 4:32 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com; ceph-users; Joao Eduardo Luis
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic)
 
Setting the MTU to 1500 worked, monitors reach quorum right away. 
Unfortunately we really want Jumbo Frames to be on, any ideas on how to 
get ceph to work with them on? 

Thanks! 

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085 
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz 



From:Somnath Roy somnath@sandisk.com 
To:cameron.scr...@solnet.co.nz cameron.scr...@solnet.co.nz 
Cc:ceph-users@lists.ceph.com ceph-users@lists.ceph.com, 
ceph-users ceph-users-boun...@lists.ceph.com, Joao Eduardo Luis 
j...@suse.de 
Date:03/06/2015 10:34 a.m. 
Subject:RE: [ceph-users] Monitors not reaching quorum. (SELinux 
off, IPtables off, can see tcp traffic) 




We have seen some communication issues with that; try making all the 
servers MTU 1500 and try it out… 
  
From: cameron.scr...@solnet.co.nz [mailto:cameron.scr...@solnet.co.nz] 
Sent: Tuesday, June 02, 2015 3:31 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com; ceph-users; Joao Eduardo Luis
Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic) 
  
We are running with Jumbo Frames turned on. Is that likely to be the 
issue? Do I need to configure something in ceph? 

The mon maps are fine and after setting debug to 10 and debug ms to 1, I 
see probe timeouts in the logs: http://pastebin.com/44M1uJZc 
I just set probe timeout to 10 (up from 2) and it still times out. 

Thanks! 

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085 
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz 



From:Somnath Roy somnath@sandisk.com 
To:Joao Eduardo Luis j...@suse.de, ceph-users@lists.ceph.com 
ceph-users@lists.ceph.com 
Date:03/06/2015 03:49 a.m. 
Subject:Re: [ceph-users] Monitors not reaching quorum. (SELinux 
off, IPtables off, can see tcp traffic) 
Sent by:ceph-users ceph-users-boun...@lists.ceph.com 





By any chance are you running with jumbo frame turned on ?

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Joao Eduardo Luis
Sent: Tuesday, June 02, 2015 12:52 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic)

On 06/02/2015 01:42 AM, cameron.scr...@solnet.co.nz wrote:
 I am trying to deploy a new ceph cluster and my monitors are not
 reaching quorum. SELinux is off, firewalls are off, I can see traffic
 between the nodes on port 6789 but when I use the admin socket to
 force a re-election only the monitor I send the request to shows the
 new election in its logs. My logs are filled entirely of the following
 two
 lines:

 2015-06-02 11:31:56.447975 7f795b17a700  0 log_channel(audit) log
 [DBG]
 : from='admin socket' entity='admin socket' cmd='mon_status' args=[]:
 dispatch
 2015-06-02 11:31:56.448272 7f795b17a700  0 log_channel(audit) log
 [DBG]
 : from='admin socket' entity='admin socket' 

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Cameron . Scrace
Setting the MTU to 1500 worked - the monitors reach quorum right away. 
Unfortunately we really want jumbo frames to be on; any ideas on how to 
get Ceph to work with them on?

Thanks!

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085 
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz



From:   Somnath Roy somnath@sandisk.com
To: cameron.scr...@solnet.co.nz cameron.scr...@solnet.co.nz
Cc: ceph-users@lists.ceph.com ceph-users@lists.ceph.com, 
ceph-users ceph-users-boun...@lists.ceph.com, Joao Eduardo Luis 
j...@suse.de
Date:   03/06/2015 10:34 a.m.
Subject:RE: [ceph-users] Monitors not reaching quorum. (SELinux 
off, IPtables off, can see tcp traffic)



We have seen some communication issues with that; try making all the 
servers MTU 1500 and try it out…
 
From: cameron.scr...@solnet.co.nz [mailto:cameron.scr...@solnet.co.nz] 
Sent: Tuesday, June 02, 2015 3:31 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com; ceph-users; Joao Eduardo Luis
Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic)
 
We are running with Jumbo Frames turned on. Is that likely to be the 
issue? Do I need to configure something in ceph? 

The mon maps are fine and after setting debug to 10 and debug ms to 1, I 
see probe timeouts in the logs: http://pastebin.com/44M1uJZc 
I just set probe timeout to 10 (up from 2) and it still times out. 

Thanks! 

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085 
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz 



From:Somnath Roy somnath@sandisk.com 
To:Joao Eduardo Luis j...@suse.de, ceph-users@lists.ceph.com 
ceph-users@lists.ceph.com 
Date:03/06/2015 03:49 a.m. 
Subject:Re: [ceph-users] Monitors not reaching quorum. (SELinux 
off, IPtables off, can see tcp traffic) 
Sent by:ceph-users ceph-users-boun...@lists.ceph.com 




By any chance are you running with jumbo frame turned on ?

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Joao Eduardo Luis
Sent: Tuesday, June 02, 2015 12:52 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic)

On 06/02/2015 01:42 AM, cameron.scr...@solnet.co.nz wrote:
 I am trying to deploy a new ceph cluster and my monitors are not
 reaching quorum. SELinux is off, firewalls are off, I can see traffic
 between the nodes on port 6789 but when I use the admin socket to
 force a re-election only the monitor I send the request to shows the
 new election in its logs. My logs are filled entirely of the following
 two
 lines:

 2015-06-02 11:31:56.447975 7f795b17a700  0 log_channel(audit) log
 [DBG]
 : from='admin socket' entity='admin socket' cmd='mon_status' args=[]:
 dispatch
 2015-06-02 11:31:56.448272 7f795b17a700  0 log_channel(audit) log
 [DBG]
 : from='admin socket' entity='admin socket' cmd=mon_status args=[]:
 finished

You are running on default debug levels, so you'll hardly get anything 
more than that.  I suggest setting 'debug mon = 10' and 'debug ms = 1'
for added verbosity and come back to us with the logs.

There are many reasons for this, but the more common are due to the 
monitors not being able to communicate with each other.  Given you see 
traffic between the monitors, I'm inclined to assume that the other two 
monitors do not have each other on the monmap or, if they do know each 
other, either 1) the monitor's auth keys do not match, or 2) the probe 
timeout is being triggered before they successfully manage to find enough 
monitors to trigger an election -- which may be due to latency.

Logs will tells us more.

 -Joao

 Querying the admin socket with mon_status (the other two are the
 similar but with their hostnames and rank):

 {
 name: wcm1,
 rank: 0,
 state: probing,
 election_epoch: 1,
 quorum: [],
 outside_quorum: [
 wcm1
 ],
 extra_probe_peers: [],
 sync_provider: [],
 monmap: {
 epoch: 0,
 fsid: adb8c500-122e-49fd-9c1e-a99af7832307,
 modified: 2015-06-02 10:43:41.467811,
 created: 2015-06-02 10:43:41.467811,
 mons: [
 {
 rank: 0,
 name: wcm1,
 addr: 10.1.226.64:6789\/0
 },
 {
 rank: 1,
 name: wcm2,
 addr: 10.1.226.65:6789\/0
 },
 {
 rank: 2,
 name: wcm3,
 addr: 10.1.226.66:6789\/0
 }
 ]
 }
 }

___
ceph-users mailing list

Re: [ceph-users] Synchronous writes - tuning and some thoughts about them?

2015-06-02 Thread Josh Durgin

On 06/01/2015 03:41 AM, Jan Schermer wrote:

Thanks, that’s it exactly.
But I think that’s really too much work for now, that’s why I really would like 
to see a quick-win by using the local RBD cache for now - that would suffice 
for most workloads (not too many people run big databases on CEPH now, those 
who do must be aware of this).

The issue is - and I have not yet seen an answer to that - would it be safe as 
it is now if the flushes were ignored (rbd cache = unsafe) or will it 
completely b0rk the filesystem when not flushed properly?


Generally the latter. Right now flushes are the only thing enforcing
ordering for rbd. As a block device it doesn't guarantee that e.g. the
extent at offset 0 is written before the extent at offset 4096 unless
it sees a flush between the writes.

As suggested earlier in this thread, maintaining order during writeback
would make not sending flushes (via mount -o nobarrier in the guest or
cache=unsafe for qemu) safer from a crash-consistency point of view.
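(To make the qemu side concrete -- a hedged illustration only, the pool/image name is made up -- it is the cache mode on the drive definition that decides whether guest flushes reach librbd:

    -drive file=rbd:rbd/vm1,format=raw,cache=writeback   # guest flushes are honoured
    -drive file=rbd:rbd/vm1,format=raw,cache=unsafe      # guest flushes are dropped entirely

so cache=unsafe is only crash-safe to the extent that writeback preserves ordering, which is exactly the point above.)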

An fs or database on top of rbd would still have to replay their
internal journal, and could lose some writes, but should be able to
end up in a consistent state that way. This would make larger caches
more useful, and would be a simple way to use a large local cache
devices as an rbd cache backend. Live migration should still work in
such a system because qemu will still tell rbd to flush data at that
point.

A distributed local cache like [1] might be better long term, but
much more complicated to implement.

Josh

[1] 
https://www.usenix.org/conference/fast15/technical-sessions/presentation/bhagwat


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Somnath Roy
I doubt it has anything to do with Ceph; hope you checked that your switch supports jumbo frames and that you have set MTU 9000 on all the devices in between. It's better to ping your devices (all the devices participating in the cluster) the way it is described in the following articles, just in case you are not sure.

http://www.mylesgray.com/hardware/test-jumbo-frames-working/
http://serverfault.com/questions/234311/testing-whether-jumbo-frames-are-actually-working
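A quick end-to-end check along the same lines (assuming a 9000-byte interface MTU; 8972 is 9000 minus the 28 bytes of IP/ICMP headers) is:

    ping -M do -s 8972 <peer-ip>

If that fails while a normal ping succeeds, something in the path is not passing jumbo frames.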

Hope this helps,

Thanks & Regards
Somnath

From: cameron.scr...@solnet.co.nz [mailto:cameron.scr...@solnet.co.nz]
Sent: Tuesday, June 02, 2015 4:32 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com; ceph-users; Joao Eduardo Luis
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables 
off, can see tcp traffic)

Setting the MTU to 1500 worked, monitors reach quorum right away. Unfortunately 
we really want Jumbo Frames to be on, any ideas on how to get ceph to work with 
them on?

Thanks!

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085
Email  cameron.scr...@solnet.co.nzmailto:cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz



From:Somnath Roy 
somnath@sandisk.commailto:somnath@sandisk.com
To:cameron.scr...@solnet.co.nzmailto:cameron.scr...@solnet.co.nz 
cameron.scr...@solnet.co.nzmailto:cameron.scr...@solnet.co.nz
Cc:ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com 
ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com, ceph-users 
ceph-users-boun...@lists.ceph.commailto:ceph-users-boun...@lists.ceph.com, 
Joao Eduardo Luis j...@suse.demailto:j...@suse.de
Date:03/06/2015 10:34 a.m.
Subject:RE: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic)




We have seen some communication issue with that, try to make all the server MTU 
1500 and try out…

From: cameron.scr...@solnet.co.nzmailto:cameron.scr...@solnet.co.nz 
[mailto:cameron.scr...@solnet.co.nz]
Sent: Tuesday, June 02, 2015 3:31 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com; ceph-users; 
Joao Eduardo Luis
Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables 
off, can see tcp traffic)

We are running with Jumbo Frames turned on. Is that likely to be the issue? Do 
I need to configure something in ceph?

The mon maps are fine and after setting debug to 10 and debug ms to 1, I see 
probe timeouts in the logs: http://pastebin.com/44M1uJZc
I just set probe timeout to 10 (up from 2) and it still times out.

Thanks!

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085
Email  cameron.scr...@solnet.co.nzmailto:cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz



From:Somnath Roy 
somnath@sandisk.commailto:somnath@sandisk.com
To:Joao Eduardo Luis j...@suse.demailto:j...@suse.de, 
ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com 
ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com
Date:03/06/2015 03:49 a.m.
Subject:Re: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic)
Sent by:ceph-users 
ceph-users-boun...@lists.ceph.commailto:ceph-users-boun...@lists.ceph.com





By any chance are you running with jumbo frame turned on ?

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Joao 
Eduardo Luis
Sent: Tuesday, June 02, 2015 12:52 AM
To: ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables 
off, can see tcp traffic)

On 06/02/2015 01:42 AM, 
cameron.scr...@solnet.co.nzmailto:cameron.scr...@solnet.co.nz wrote:
 I am trying to deploy a new ceph cluster and my monitors are not
 reaching quorum. SELinux is off, firewalls are off, I can see traffic
 between the nodes on port 6789 but when I use the admin socket to
 force a re-election only the monitor I send the request to shows the
 new election in its logs. My logs are filled entirely of the following
 two
 lines:

 2015-06-02 11:31:56.447975 7f795b17a700  0 log_channel(audit) log
 [DBG]
 : from='admin socket' entity='admin socket' cmd='mon_status' args=[]:
 dispatch
 2015-06-02 11:31:56.448272 7f795b17a700  0 log_channel(audit) log
 [DBG]
 : from='admin socket' entity='admin socket' cmd=mon_status args=[]:
 finished

You are running on default debug levels, so you'll hardly get anything more 
than that.  I suggest setting 'debug mon = 10' and 'debug ms = 1'
for added verbosity and come back to us with the logs.

There are many reasons for this, but the more common are due 

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Somnath Roy
We have seen some communication issue with that, try to make all the server MTU 
1500 and try out...

From: cameron.scr...@solnet.co.nz [mailto:cameron.scr...@solnet.co.nz]
Sent: Tuesday, June 02, 2015 3:31 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com; ceph-users; Joao Eduardo Luis
Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables 
off, can see tcp traffic)

We are running with Jumbo Frames turned on. Is that likely to be the issue? Do 
I need to configure something in ceph?

The mon maps are fine and after setting debug to 10 and debug ms to 1, I see 
probe timeouts in the logs: http://pastebin.com/44M1uJZc
I just set probe timeout to 10 (up from 2) and it still times out.

Thanks!

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085
Email  cameron.scr...@solnet.co.nzmailto:cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz



From:Somnath Roy 
somnath@sandisk.commailto:somnath@sandisk.com
To:Joao Eduardo Luis j...@suse.demailto:j...@suse.de, 
ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com 
ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com
Date:03/06/2015 03:49 a.m.
Subject:Re: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic)
Sent by:ceph-users 
ceph-users-boun...@lists.ceph.commailto:ceph-users-boun...@lists.ceph.com




By any chance are you running with jumbo frame turned on ?

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Joao 
Eduardo Luis
Sent: Tuesday, June 02, 2015 12:52 AM
To: ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables 
off, can see tcp traffic)

On 06/02/2015 01:42 AM, 
cameron.scr...@solnet.co.nzmailto:cameron.scr...@solnet.co.nz wrote:
 I am trying to deploy a new ceph cluster and my monitors are not
 reaching quorum. SELinux is off, firewalls are off, I can see traffic
 between the nodes on port 6789 but when I use the admin socket to
 force a re-election only the monitor I send the request to shows the
 new election in its logs. My logs are filled entirely of the following
 two
 lines:

 2015-06-02 11:31:56.447975 7f795b17a700  0 log_channel(audit) log
 [DBG]
 : from='admin socket' entity='admin socket' cmd='mon_status' args=[]:
 dispatch
 2015-06-02 11:31:56.448272 7f795b17a700  0 log_channel(audit) log
 [DBG]
 : from='admin socket' entity='admin socket' cmd=mon_status args=[]:
 finished

You are running on default debug levels, so you'll hardly get anything more 
than that.  I suggest setting 'debug mon = 10' and 'debug ms = 1'
for added verbosity and come back to us with the logs.

There are many reasons for this, but the more common are due to the monitors 
not being able to communicate with each other.  Given you see traffic between 
the monitors, I'm inclined to assume that the other two monitors do not have 
each other on the monmap or, if they do know each other, either 1) the 
monitor's auth keys do not match, or 2) the probe timeout is being triggered 
before they successfully manage to find enough monitors to trigger an election 
-- which may be due to latency.

Logs will tells us more.

 -Joao

 Querying the admin socket with mon_status (the other two are the
 similar but with their hostnames and rank):

 {
 name: wcm1,
 rank: 0,
 state: probing,
 election_epoch: 1,
 quorum: [],
 outside_quorum: [
 wcm1
 ],
 extra_probe_peers: [],
 sync_provider: [],
 monmap: {
 epoch: 0,
 fsid: adb8c500-122e-49fd-9c1e-a99af7832307,
 modified: 2015-06-02 10:43:41.467811,
 created: 2015-06-02 10:43:41.467811,
 mons: [
 {
 rank: 0,
 name: wcm1,
 addr: 10.1.226.64:6789\/0
 },
 {
 rank: 1,
 name: wcm2,
 addr: 10.1.226.65:6789\/0
 },
 {
 rank: 2,
 name: wcm3,
 addr: 10.1.226.66:6789\/0
 }
 ]
 }
 }

___
ceph-users mailing list
ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Nigel Williams
On Wed, Jun 3, 2015 at 8:30 AM,  cameron.scr...@solnet.co.nz wrote:
 We are running with Jumbo Frames turned on. Is that likely to be the issue?

I got caught by this previously:

http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-October/043955.html

The problem is Ceph almost-but-not-quite works, leading you down
lots of fruitless paths.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Cameron . Scrace
We are running with Jumbo Frames turned on. Is that likely to be the 
issue? Do I need to configure something in ceph?

The mon maps are fine and after setting debug to 10 and debug ms to 1, I 
see probe timeouts in the logs: http://pastebin.com/44M1uJZc
I just set probe timeout to 10 (up from 2) and it still times out.

Thanks!

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085 
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz



From:   Somnath Roy somnath@sandisk.com
To: Joao Eduardo Luis j...@suse.de, ceph-users@lists.ceph.com 
ceph-users@lists.ceph.com
Date:   03/06/2015 03:49 a.m.
Subject:Re: [ceph-users] Monitors not reaching quorum. (SELinux 
off, IPtables off, can see tcp traffic)
Sent by:ceph-users ceph-users-boun...@lists.ceph.com



By any chance are you running with jumbo frame turned on ?

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Joao Eduardo Luis
Sent: Tuesday, June 02, 2015 12:52 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic)

On 06/02/2015 01:42 AM, cameron.scr...@solnet.co.nz wrote:
 I am trying to deploy a new ceph cluster and my monitors are not
 reaching quorum. SELinux is off, firewalls are off, I can see traffic
 between the nodes on port 6789 but when I use the admin socket to
 force a re-election only the monitor I send the request to shows the
 new election in its logs. My logs are filled entirely of the following
 two
 lines:

 2015-06-02 11:31:56.447975 7f795b17a700  0 log_channel(audit) log
 [DBG]
 : from='admin socket' entity='admin socket' cmd='mon_status' args=[]:
 dispatch
 2015-06-02 11:31:56.448272 7f795b17a700  0 log_channel(audit) log
 [DBG]
 : from='admin socket' entity='admin socket' cmd=mon_status args=[]:
 finished

You are running on default debug levels, so you'll hardly get anything 
more than that.  I suggest setting 'debug mon = 10' and 'debug ms = 1'
for added verbosity and come back to us with the logs.

There are many reasons for this, but the more common are due to the 
monitors not being able to communicate with each other.  Given you see 
traffic between the monitors, I'm inclined to assume that the other two 
monitors do not have each other on the monmap or, if they do know each 
other, either 1) the monitor's auth keys do not match, or 2) the probe 
timeout is being triggered before they successfully manage to find enough 
monitors to trigger an election -- which may be due to latency.

Logs will tells us more.

  -Joao

 Querying the admin socket with mon_status (the other two are the
 similar but with their hostnames and rank):

 {
 name: wcm1,
 rank: 0,
 state: probing,
 election_epoch: 1,
 quorum: [],
 outside_quorum: [
 wcm1
 ],
 extra_probe_peers: [],
 sync_provider: [],
 monmap: {
 epoch: 0,
 fsid: adb8c500-122e-49fd-9c1e-a99af7832307,
 modified: 2015-06-02 10:43:41.467811,
 created: 2015-06-02 10:43:41.467811,
 mons: [
 {
 rank: 0,
 name: wcm1,
 addr: 10.1.226.64:6789\/0
 },
 {
 rank: 1,
 name: wcm2,
 addr: 10.1.226.65:6789\/0
 },
 {
 rank: 2,
 name: wcm3,
 addr: 10.1.226.66:6789\/0
 }
 ]
 }
 }

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Cameron . Scrace
Seems to be something to do with our switch. If the interface MTU is too 
close to the switch MTU it stops working. Thanks for all your help :)

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085 
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz



From:   Somnath Roy somnath@sandisk.com
To: cameron.scr...@solnet.co.nz cameron.scr...@solnet.co.nz
Cc: ceph-users@lists.ceph.com ceph-users@lists.ceph.com, 
ceph-users ceph-users-boun...@lists.ceph.com, Joao Eduardo Luis 
j...@suse.de
Date:   03/06/2015 11:49 a.m.
Subject:RE: [ceph-users] Monitors not reaching quorum. (SELinux 
off, IPtables off, can see tcp traffic)



I doubt it is anything to do with Ceph, hope you checked your switch is 
supporting Jumbo frames and you have set MTU 9000 to all the devices in 
between. It‘s better to ping your devices (all the devices participating 
in the cluster) like the way it mentioned in the following articles , just 
in case you are not sure.
 
http://www.mylesgray.com/hardware/test-jumbo-frames-working/
http://serverfault.com/questions/234311/testing-whether-jumbo-frames-are-actually-working
 
Hope this helps,
 
Thanks & Regards
Somnath
 
From: cameron.scr...@solnet.co.nz [mailto:cameron.scr...@solnet.co.nz] 
Sent: Tuesday, June 02, 2015 4:32 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com; ceph-users; Joao Eduardo Luis
Subject: RE: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic)
 
Setting the MTU to 1500 worked, monitors reach quorum right away. 
Unfortunately we really want Jumbo Frames to be on, any ideas on how to 
get ceph to work with them on? 

Thanks! 

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085 
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz 



From:Somnath Roy somnath@sandisk.com 
To:cameron.scr...@solnet.co.nz cameron.scr...@solnet.co.nz 
Cc:ceph-users@lists.ceph.com ceph-users@lists.ceph.com, 
ceph-users ceph-users-boun...@lists.ceph.com, Joao Eduardo Luis 
j...@suse.de 
Date:03/06/2015 10:34 a.m. 
Subject:RE: [ceph-users] Monitors not reaching quorum. (SELinux 
off, IPtables off, can see tcp traffic) 




We have seen some communication issue with that, try to make all the 
server MTU 1500 and try out… 
  
From: cameron.scr...@solnet.co.nz [mailto:cameron.scr...@solnet.co.nz] 
Sent: Tuesday, June 02, 2015 3:31 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com; ceph-users; Joao Eduardo Luis
Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic) 
  
We are running with Jumbo Frames turned on. Is that likely to be the 
issue? Do I need to configure something in ceph? 

The mon maps are fine and after setting debug to 10 and debug ms to 1, I 
see probe timeouts in the logs: http://pastebin.com/44M1uJZc 
I just set probe timeout to 10 (up from 2) and it still times out. 

Thanks! 

Cameron Scrace
Infrastructure Engineer

Mobile +64 22 610 4629
Phone  +64 4 462 5085 
Email  cameron.scr...@solnet.co.nz
Solnet Solutions Limited
Level 12, Solnet House
70 The Terrace, Wellington 6011
PO Box 397, Wellington 6140

www.solnet.co.nz 



From:Somnath Roy somnath@sandisk.com 
To:Joao Eduardo Luis j...@suse.de, ceph-users@lists.ceph.com 
ceph-users@lists.ceph.com 
Date:03/06/2015 03:49 a.m. 
Subject:Re: [ceph-users] Monitors not reaching quorum. (SELinux 
off, IPtables off, can see tcp traffic) 
Sent by:ceph-users ceph-users-boun...@lists.ceph.com 





By any chance are you running with jumbo frame turned on ?

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Joao Eduardo Luis
Sent: Tuesday, June 02, 2015 12:52 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, 
IPtables off, can see tcp traffic)

On 06/02/2015 01:42 AM, cameron.scr...@solnet.co.nz wrote:
 I am trying to deploy a new ceph cluster and my monitors are not
 reaching quorum. SELinux is off, firewalls are off, I can see traffic
 between the nodes on port 6789 but when I use the admin socket to
 force a re-election only the monitor I send the request to shows the
 new election in its logs. My logs are filled entirely of the following
 two
 lines:

 2015-06-02 11:31:56.447975 7f795b17a700  0 log_channel(audit) log
 [DBG]
 : from='admin socket' entity='admin socket' cmd='mon_status' args=[]:
 dispatch
 2015-06-02 11:31:56.448272 7f795b17a700  0 log_channel(audit) log
 [DBG]
 : from='admin socket' entity='admin socket' cmd=mon_status args=[]:
 finished

You are running on default debug levels, so you'll hardly get anything 
more than 

Re: [ceph-users] active+clean+scrubbing+deep

2015-06-02 Thread Никитенко Виталий
Thank Irek, it really worked

02.06.2015, 15:58, "Irek Fasikhov" malm...@gmail.com:
 Hi. Restart the OSD. :)

 2015-06-02 11:55 GMT+03:00 Никитенко Виталий v1...@yandex.ru:
  Hi!

  I have ceph version 0.94.1.

  root@ceph-node1:~# ceph -s
      cluster 3e0d58cd-d441-4d44-b49b-6cff08c20abf
       health HEALTH_OK
       monmap e2: 3 mons at {ceph-mon=10.10.100.3:6789/0,ceph-node1=10.10.100.1:6789/0,ceph-node2=10.10.100.2:6789/0}
              election epoch 428, quorum 0,1,2 ceph-node1,ceph-node2,ceph-mon
       osdmap e978: 16 osds: 16 up, 16 in
        pgmap v6735569: 2012 pgs, 8 pools, 2801 GB data, 703 kobjects
              5617 GB used, 33399 GB / 39016 GB avail
                  2011 active+clean
                     1 active+clean+scrubbing+deep
    client io 174 kB/s rd, 30641 kB/s wr, 80 op/s

  root@ceph-node1:~# ceph pg dump | grep -i deep | cut -f 1
    dumped all in format plain
    pg_stat
    19.b3

  In log file i see
  2015-05-14 03:23:51.556876 7fc708a37700  0 log_channel(cluster) log [INF] : 19.b3 deep-scrub starts
  but no "19.b3 deep-scrub ok"

  then i do "ceph pg deep-scrub 19.b3", nothing happens and in logs file no any records about it.

  What can i do to pg return in "active + clean" station?
  is there any sense restart OSD or the entirely server where the OSD?

  Thanks.
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 --
 Best regards, Fasikhov Irek Nurgayazovich
 Mobile: +79229045757
___
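(For anyone hitting the same thing later: to know which OSD to restart for a PG stuck in scrubbing, its acting set can be looked up first -- a small sketch, using the PG id from this thread:

    ceph pg map 19.b3

This prints the up and acting OSDs for that placement group; restarting the primary, the first OSD listed, is usually what clears the stuck deep-scrub.)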
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph RBD and Cephfuse

2015-06-02 Thread gjprabu
Hi Team,

  We are newly using ceph with two OSDs and two clients. Our requirement is that when we write data through one client it should also be visible on the other client. The storage is mounted using rbd because we run git clone with a large amount of small files, and it is fast when using the rbd mount, but the data does not sync between the two clients.

  Another option we used is mounting with ceph-fuse; here I can see the data on both clients, but it is too slow with a large amount of small files (202M) using git clone. We also tried NFS etc. but it is slow.

  Kindly share the solution to achieve our requirements.


Git clone using the RBD mount, but data does not sync between the two clients using the same image partition:
time git clone https://github.com/elastic/elasticsearch.git
Initialized empty Git repository in /home/sas/cide/elasticsearch/.git/
remote: Counting objects: 359724, done.
remote: Compressing objects: 100% (55/55), done.
remote: Total 359724 (delta 59), reused 20 (delta 20), pack-reused 359649
Receiving objects: 100% (359724/359724), 129.04 MiB | 8.04 MiB/s, done.
Resolving deltas: 100% (203986/203986), done.

real0m49.255s
user0m19.371s
sys0m3.762s

Git clone using the cephfuse partition mount; I can see the data on both clients, but it takes 11m:

time git clone https://github.com/elastic/elasticsearch.git

Initialized empty Git repository in /home/sas/cide1/elasticsearch/.git/

remote: Counting objects: 359724, done.

remote: Compressing objects: 100% (55/55), done.

remote: Total 359724 (delta 59), reused 20 (delta 20), pack-reused 359649

Receiving objects: 100% (359724/359724), 129.04 MiB | 473 KiB/s, done.

Resolving deltas: 100% (203986/203986), done.



real11m16.371s

user0m35.235s

sys1m59.389s



Regards
Prabu





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph RBD and Cephfuse

2015-06-02 Thread Lars Marowsky-Bree
On 2015-06-02T15:40:54, gjprabu gjpr...@zohocorp.com wrote:

 Hi Team,
 
   We are newly using ceph with two OSD and two clients, our requirement 
 is when we write date through clients it should see in another client also,  
 storage is mounted using rbd because we running git clone with large amount 
 of small file and it is fast when use rbd mount, but data not sync in both 
 the clients. 

What file system are you using on top of RBD for this purpose? To
achieve this goal, you'd need to use a cluster-aware file system (with
all the complexity that entails) like OCFS2 or GFS2.

You cannot mount something like XFS/btrfs/ext4 multiple times; that
will, in fact, corrupt your data and likely crash the client's
kernels.


Regards,
Lars

-- 
Architect Storage/HA
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Recommendations for a driver situation

2015-06-02 Thread Pontus Lindgren
Hello,

We have recently acquired new servers for a new ceph cluster and we want to run 
Debian on those servers. Unfortunately drivers needed for the raid controller 
are only available in newer kernels than what Debian Wheezy provides. 

We need to run the dumpling release of Ceph.

Since the Ceph repo does not have packages for Debian Jessie I see 3 
alternatives for us:
1. Wait for the Ceph repo to add packages for Debian Jessie.
Number 1 is not really an option for us. But, is there an approximate ETA on 
this?

2. Run Debian Wheezy with backported drivers.

3. Build the Ceph dumpling packages for Debian Jessie.
Number 3, is this possible? Cloning the master branch from git gives you the install_debs.sh script, which can be used to build Ceph 9.0 packages (we need dumpling). And in the Dumpling branch there is no Debian package building script.

Which one of these would you recommend?  
 
Also, will Dumpling be released for Debian Jessie?
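(On option 3: untested on Jessie, but the dumpling tree still ships a debian/ directory, so a plain source build of the packages may work along these lines, assuming the build dependencies from debian/control can be satisfied:

    git clone --recursive --branch dumpling https://github.com/ceph/ceph.git
    cd ceph
    dpkg-buildpackage -us -uc
)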

Pontus Lindgren

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Installation Issues

2015-06-02 Thread Alexander Dacre
Hi,

I'm having some difficulty installing the Hammer release on CentOS 6.6 
following the instructions here: 
http://docs.ceph.com/docs/master/start/quick-ceph-deploy/.

The initial problem was with the install.py and uninstall.py scripts 
referencing radosgw instead of ceph-radosgw in the packages lists. Swapping 
these out enabled the installation of the radosgw packages on the cluster nodes.

However, the execution of ceph-deploy rgw create [node] fails with a no such 
file or directory error. Any suggestions? I've copied the log file below.

Thanks,

[ceph_deploy.conf][DEBUG ] found configuration file at: 
/home/ceph_admin/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.23): /usr/bin/ceph-deploy rgw create 
ceph-node01
[ceph_deploy.rgw][DEBUG ] Deploying rgw, cluster ceph hosts 
ceph-node01:rgw.ceph-node01
[ceph-node01][DEBUG ] connection detected need for sudo
[ceph-node01][DEBUG ] connected to host: ceph-node01
[ceph-node01][DEBUG ] detect platform information from remote host
[ceph-node01][DEBUG ] detect machine type
[ceph_deploy.rgw][INFO  ] Distro info: CentOS 6.6 Final
[ceph_deploy.rgw][DEBUG ] remote host will use sysvinit
[ceph_deploy.rgw][DEBUG ] deploying rgw bootstrap to ceph-node01
[ceph-node01][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph-node01][DEBUG ] create path if it doesn't exist
[ceph_deploy.rgw][ERROR ] OSError: [Errno 2] No such file or directory: 
'/var/lib/ceph/radosgw/ceph-rgw.ceph-node01'
[ceph_deploy][ERROR ] GenericError: Failed to create 1 RGWs
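(One possible workaround -- untested, and based only on the path in the error above -- would be to create the missing directory on the node by hand and re-run the step:

    mkdir -p /var/lib/ceph/radosgw/ceph-rgw.ceph-node01
    ceph-deploy rgw create ceph-node01
)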

Alex Dacre
Systems Engineer
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph RBD and Cephfuse

2015-06-02 Thread gjprabu
Hi Lars,

   We installed CentOS on the client machines with kernel version 3.10, which has the rbd kernel modules. We have now installed ocfs2-tools and formatted the device, but the mount throws an error. Please check below.


mount -t ocfs2 /dev/rbd/rbd/newinteg /home/test/cide
mount.ocfs2: Unable to access cluster service while trying initialize cluster

mkfs.ocfs2  /dev/rbd/rbd/newinteg 
mkfs.ocfs2 1.6.4
Cluster stack: classic o2cb
Label: 
Features: sparse backup-super unwritten inline-data strict-journal-super xattr
Block size: 4096 (12 bits)
Cluster size: 4096 (12 bits)
Volume size: 7340032 (1792 clusters) (1792 blocks)
Cluster groups: 556 (tail covers 17920 clusters, rest cover 32256 clusters)
Extent allocator size: 12582912 (3 groups)
Journal size: 268435456
Node slots: 8
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 4 block(s)
Formatting Journals: 
done
Growing extent allocator: done
Formatting slot map: done
Formatting quota files: done
Writing lost+found: done
mkfs.ocfs2 successful

 

Regards
Prabu





 On Tue, 02 Jun 2015 16:18:53 +0530 Lars Marowsky-Bree <l...@suse.com>
wrote:

On 2015-06-02T15:40:54, gjprabu <gjpr...@zohocorp.com> wrote:

> Hi Team,
>
> We are newly using ceph with two OSD and two clients, our requirement is
when we write date through clients it should see in another client also,
storage is mounted using rbd because we running git clone with large amount of
small file and it is fast when use rbd mount, but data not sync in both the
clients.
 
What file system are you using on top of RBD for this purpose? To 
achieve this goal, you'd need to use a cluster-aware file system (with 
all the complexity that entails) like OCFS2 or GFS2. 
 
You cannot mount something like XFS/btrfs/ext4 multiple times; that 
will, in fact, corrupt your data and likely crash the client's 
kernels. 
 
 
Regards, 
 Lars 
 
-- 
Architect Storage/HA 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nürnberg) 
Experience is the name everyone gives to their mistakes. -- Oscar Wilde 
 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Best setup for SSD

2015-06-02 Thread Eneko Lacunza

Hi,

On 02/06/15 16:18, Mark Nelson wrote:

On 06/02/2015 09:02 AM, Phil Schwarz wrote:

Le 02/06/2015 15:33, Eneko Lacunza a écrit :

Hi,

On 02/06/15 15:26, Phil Schwarz wrote:

On 02/06/15 14:51, Phil Schwarz wrote:
i'm gonna have to setup a 4-nodes Ceph(Proxmox+Ceph in fact) 
cluster.


-1 node is a little HP Microserver N54L with 1X opteron + 2SSD+ 
3X 4TB

SATA
It'll be used as OSD+Mon server only.

Are these SSDs Intel S3700 too? What amount of RAM?

Yes, All DCS3700, for the four nodes.
16GB of RAM on this node.

This should be enough for 3 OSDs I think, I used to have a Dell
T20/Intel G3230 with 2x1TB OSDs with only 4 GB running OK.

Cheers
Eneko


Yes, indeed.
My main problem is doing something non adviced...
Running VMs on Ceph nodes...
No choice, but it seems that i'll have to do that.
Hope  i won't peg the CPU too quickly..


I'm doing it in 3 different Proxmox clusters. They're not very busy 
clusters, but works very well.
You might want to consider using cgroups or some other mechanism to 
segment what runs on what cores.  While not ideal, dedicating 2-3 of 
the cores to ceph and leaving the other(s) for VMs might be a 
reasonable way to go.



I think this may be must if you setup a dedicated SSD pool.
A single DC S3700 should suffice for journals for 4 OSDs.  I wouldn't 
recommend using the other one for a cache tier unless you have a very 
highly skewed hot/cold workload.  Perhaps instead make a dedicated SSD 
pool that could be used for high IOPS workloads. In fact you might 
consider skipping SSD journals and just making a dedicated SSD pool 
with all of the SSDs depending on how much write workload your main 
pool sees and if you could make good use of a dedicated SSD pool.
Be warned that running SSD and HD based OSDs in the same server is not 
recommended. If you need the storage capacity, I'd stick to the journals 
on SSDs plan.


Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Best setup for SSD

2015-06-02 Thread Phil Schwarz
Le 02/06/2015 15:33, Eneko Lacunza a écrit :
 Hi,
 
 On 02/06/15 15:26, Phil Schwarz wrote:
 On 02/06/15 14:51, Phil Schwarz wrote:
 i'm gonna have to setup a 4-nodes Ceph(Proxmox+Ceph in fact) cluster.

 -1 node is a little HP Microserver N54L with 1X opteron + 2SSD+ 3X 4TB
 SATA
 It'll be used as OSD+Mon server only.
 Are these SSDs Intel S3700 too? What amount of RAM?
 Yes, All DCS3700, for the four nodes.
 16GB of RAM on this node.
 This should be enough for 3 OSDs I think, I used to have a Dell
 T20/Intel G3230 with 2x1TB OSDs with only 4 GB running OK.
 
 Cheers
 Eneko
 
Yes, indeed.
My main problem is doing something non adviced...
Running VMs on Ceph nodes...
No choice, but it seems that i'll have to do that.
Hope  i won't peg the CPU too quickly..
Best regards

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] active+clean+scrubbing+deep

2015-06-02 Thread Luis Periquito
that's a normal process running...

for more information
http://ceph.com/docs/master/rados/configuration/osd-config-ref/#scrubbing

On Tue, Jun 2, 2015 at 9:55 AM, Никитенко Виталий v1...@yandex.ru wrote:

 Hi!

 I have ceph version 0.94.1.

 root@ceph-node1:~# ceph -s
 cluster 3e0d58cd-d441-4d44-b49b-6cff08c20abf
  health HEALTH_OK
  monmap e2: 3 mons at {ceph-mon=
 10.10.100.3:6789/0,ceph-node1=10.10.100.1:6789/0,ceph-node2=10.10.100.2:6789/0
 }
 election epoch 428, quorum 0,1,2 ceph-node1,ceph-node2,ceph-mon
  osdmap e978: 16 osds: 16 up, 16 in
   pgmap v6735569: 2012 pgs, 8 pools, 2801 GB data, 703 kobjects
 5617 GB used, 33399 GB / 39016 GB avail
 2011 active+clean
1 active+clean+scrubbing+deep
   client io 174 kB/s rd, 30641 kB/s wr, 80 op/s

 root@ceph-node1:~# ceph pg dump  | grep -i deep | cut -f 1
   dumped all in format plain
   pg_stat
   19.b3

 In log file i see
 2015-05-14 03:23:51.556876 7fc708a37700  0 log_channel(cluster) log [INF]
 : 19.b3 deep-scrub starts
 but no 19.b3 deep-scrub ok

 then i do ceph pg deep-scrub 19.b3, nothing happens and in logs file no
 any records about it.

 What can i do to pg return in active + clean station?
 is there any sense restart OSD or the entirely server where the OSD?

 Thanks.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What do internal_safe_to_start_threads and leveldb_compression do?

2015-06-02 Thread Gregory Farnum
On Tue, Jun 2, 2015 at 6:47 AM, Erik Logtenberg e...@logtenberg.eu wrote:
 What does this do?

 - leveldb_compression: false (default: true)
 - leveldb_block/cache/write_buffer_size (all bigger than default)

 I take it you're running these commands on a monitor (from I think the
 Dumpling timeframe, or maybe even Firefly)? These are hitting specific
 settings in LevelDB which we tune differently for the monitor and OSD,
 but which were shared config options in older releases. They have
 their own settings in newer code.
 -Greg


 You are correct. I started out with Firefly and gradually upgraded the
 cluster as new releases came out. I am on Hammer (0.94.1) now.

 The current settings are different from the default. Does this mean
 that the settings are still Firefly-like and should be changed to the
 new default; or does this mean that the defaults are still Firefly-like
 but the settings are actually Hammer-style ;) and thus right.

Hmm, I think you must be setting them in your config file for them to
be different now, but I don't really remember...Joao? :)
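(If it helps narrow it down, the effective values on the running monitor can be dumped over the admin socket -- the mon id below is a placeholder:

    ceph daemon mon.<id> config show | grep leveldb

That shows what the daemon is actually using, regardless of where the value came from.)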
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Best setup for SSD

2015-06-02 Thread Mark Nelson

On 06/02/2015 09:02 AM, Phil Schwarz wrote:

Le 02/06/2015 15:33, Eneko Lacunza a écrit :

Hi,

On 02/06/15 15:26, Phil Schwarz wrote:

On 02/06/15 14:51, Phil Schwarz wrote:

i'm gonna have to setup a 4-nodes Ceph(Proxmox+Ceph in fact) cluster.

-1 node is a little HP Microserver N54L with 1X opteron + 2SSD+ 3X 4TB
SATA
It'll be used as OSD+Mon server only.

Are these SSDs Intel S3700 too? What amount of RAM?

Yes, All DCS3700, for the four nodes.
16GB of RAM on this node.

This should be enough for 3 OSDs I think, I used to have a Dell
T20/Intel G3230 with 2x1TB OSDs with only 4 GB running OK.

Cheers
Eneko


Yes, indeed.
My main problem is doing something non adviced...
Running VMs on Ceph nodes...
No choice, but it seems that i'll have to do that.
Hope  i won't peg the CPU too quickly..


You might want to consider using cgroups or some other mechanism to 
segment what runs on what cores.  While not ideal, dedicating 2-3 of the 
cores to ceph and leaving the other(s) for VMs might be a reasonable way 
to go.
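As a rough, untested sketch of the cgroup approach (cgroup v1 cpuset; the core numbers and pid are placeholders):

    mkdir /sys/fs/cgroup/cpuset/ceph
    echo 0-2 > /sys/fs/cgroup/cpuset/ceph/cpuset.cpus   # cores reserved for ceph
    echo 0 > /sys/fs/cgroup/cpuset/ceph/cpuset.mems
    echo <osd-pid> > /sys/fs/cgroup/cpuset/ceph/tasks

Simply starting the daemons under "taskset -c 0-2", and pinning the VMs to the remaining cores via libvirt's cpuset attribute, would achieve much the same thing.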


A single DC S3700 should suffice for journals for 4 OSDs.  I wouldn't 
recommend using the other one for a cache tier unless you have a very 
highly skewed hot/cold workload.  Perhaps instead make a dedicated SSD 
pool that could be used for high IOPS workloads.  In fact you might 
consider skipping SSD journals and just making a dedicated SSD pool with 
all of the SSDs depending on how much write workload your main pool sees 
and if you could make good use of a dedicated SSD pool.
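If the dedicated SSD pool route is taken, the usual approach (a sketch only -- the bucket and host names here are made up) is to give the SSD OSDs their own CRUSH root and point a rule and pool at it:

    ceph osd crush add-bucket ssd root
    ceph osd crush move node1-ssd root=ssd              # host bucket holding the SSD OSDs
    ceph osd crush rule create-simple ssd_rule ssd host
    ceph osd pool create ssd-pool 128 128 replicated ssd_rule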


Things to think about!


Best regards

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Best setup for SSD

2015-06-02 Thread Phil Schwarz
Hi,
i'm gonna have to setup a 4-nodes Ceph(Proxmox+Ceph in fact) cluster.

-1 node is a little HP Microserver N54L with 1X opteron + 2SSD+ 3X 4TB SATA
It'll be used as OSD+Mon server only.

- 3 nodes are setup upon Dell 730+ 1xXeon 2603, 48 GB RAM, 1x 1TB SAS
for OS , 4x 4TB SATA for OSD and 2x DCS3700 200GB intel SSD

I can't change the hardware, especially the poor cpu...

Everything will be connected through Intel X520+Netgear XS708E, as 10GBE
storage network.

This cluster will support VM (mostly KVM) upon the 3 R730 nodes.
I'm already aware of the CPU pegging all the time...But can't change it
for the moment.
The VM will be Filesharing servers, poor usage services (DNS,DHCP,AD or
OpenLDAP).
One Proxy cache (Squid) will be used upon a 100Mb Optical fiber with
500+ clients.


My question is :
Is it recommended to setup  the 2 SSDS as :
One SSD as journal for 2 (up to 3in the future) OSDs
Or
One SSD as journal for the 4 (up to 6 in the future) OSDs and the
remaining SSD as cache tiering for the previous SSD+4 OSDs pool ?

The SSDs should be rock solid enough in both bandwidth and lifetime, given the
low amount of data that will be written to them (a few hundred GB per day as a
rule of thumb).

Thanks
Best regards.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph RBD and Cephfuse

2015-06-02 Thread Lars Marowsky-Bree
On 2015-06-02T17:23:58, gjprabu gjpr...@zohocorp.com wrote:

 Hi Lars,
 
We installed centos in client machines with kernel version is 3.10 
 which is rbd supported modules. Now installed ocsfs2-tools and formated but 
 mount through error. Please check below.

You need to configure the ocfs2 cluster properly as well. You can use
either o2cb (which I'm not familiar with anymore), or the
pacemaker-integrated version:
https://www.suse.com/documentation/sle_ha/book_sleha/data/sec_ha_ocfs2_create_service.html
(should pretty much apply to CentOS as well).

From this point on, rbd is really just a shared block device, and you
may have better success if you use the us...@clusterlabs.org mailing
list if you wish to pursue this route.
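If you do stay with the classic o2cb stack, that "Unable to access cluster service" error usually just means the o2cb cluster has not been defined and brought online yet. A minimal sketch of /etc/ocfs2/cluster.conf (names, IPs and node count are examples only; the node names must match each client's hostname, the same file goes on every node, and "service o2cb configure" then "service o2cb online" are run before mounting):

cluster:
        node_count = 2
        name = ocfs2

node:
        ip_port = 7777
        ip_address = 10.0.0.1
        number = 0
        name = client1
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 10.0.0.2
        number = 1
        name = client2
        cluster = ocfs2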


Regards,
Lars

-- 
Architect Storage/HA
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recommendations for a driver situation

2015-06-02 Thread Eneko Lacunza

Hi,

On 02/06/15 14:18, Pontus Lindgren wrote:

We have recently acquired new servers for a new ceph cluster and we want to run 
Debian on those servers. Unfortunately drivers needed for the raid controller 
are only available in newer kernels than what Debian Wheezy provides.

We need to run the dumpling release of Ceph.

Since the Ceph repo does not have packages for Debian Jessie I see 3 
alternatives for us:
1. Wait for the Ceph repo to add packages for Debian Jessie.
Number 1 is not really an option for us. But, is there an approximate ETA on 
this?
Why is this the case? At least Alexandre Derumier is working on this: 
(check an email from him in this list on 12th May)


http://odisoweb1.odiso.net/ceph-jessie/


2. Run Debian Wheezy with backported drivers.


I haven't used them lately, but linux kernel in wheezy-backport is 3.16, 
is this enough?


What kernel version do you require for the drivers?

Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Best setup for SSD

2015-06-02 Thread Eneko Lacunza

Hi,

On 02/06/15 14:51, Phil Schwarz wrote:

i'm gonna have to setup a 4-nodes Ceph(Proxmox+Ceph in fact) cluster.

-1 node is a little HP Microserver N54L with 1X opteron + 2SSD+ 3X 4TB SATA
It'll be used as OSD+Mon server only.

Are these SSDs Intel S3700 too? What amount of RAM?

- 3 nodes are setup upon Dell 730+ 1xXeon 2603, 48 GB RAM, 1x 1TB SAS
for OS , 4x 4TB SATA for OSD and 2x DCS3700 200GB intel SSD

I can't change the hardware, especially the poor cpu...

Everything will be connected through Intel X520+Netgear XS708E, as 10GBE
storage network.

This cluster will support VM (mostly KVM) upon the 3 R730 nodes.
I'm already aware of the CPU pegging all the time...But can't change it
for the moment.
The VM will be Filesharing servers, poor usage services (DNS,DHCP,AD or
OpenLDAP).
One Proxy cache (Squid) will be used upon a 100Mb Optical fiber with
500+ clients.


My question is :
Is it recommended to setup  the 2 SSDS as :
One SSD as journal for 2 (up to 3in the future) OSDs
Or
One SSD as journal for the 4 (up to 6 in the future) OSDs and the
remaining SSD as cache tiering for the previous SSD+4 OSDs pool ?
I haven't used cache tiering myself, but others have not reported much 
benefit from it (if any) at all, at least this is my understanding.


So I think it would be better to use both SSDs for journals. It probably 
won't help performance using 2 instead of only 1, but it will lessen the 
impact from a SSD failure. Also it seems that the consensus is 3-4 OSD 
for each SSD, so it will help when you expand to 6 OSD.
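For the journals-on-SSD layout, the deployment side is just a matter of pointing 
each OSD's journal at a partition on the SSD; a sketch with made-up device names:

    ceph-deploy osd create node1:/dev/sdb:/dev/sdf1   # data disk : journal partition
    # or, directly on the node:
    ceph-disk prepare /dev/sdb /dev/sdf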

SSD should be rock solid enough to support both bandwidth and living
time before being destroyed by the low amount of data that will be
written on it (Few hundreds of GB per day as rule of thumb..)
If all are Intel S3700 you're on the safe side unless you have lots on 
writes. Anyway I suggest you monitor the SMART values.


Cheers
Eneko


--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Best setup for SSD

2015-06-02 Thread Phil Schwarz
Thanks for your answers; mine are inline, too.

Le 02/06/2015 15:17, Eneko Lacunza a écrit :
 Hi,
 
 On 02/06/15 14:51, Phil Schwarz wrote:
 i'm gonna have to setup a 4-nodes Ceph(Proxmox+Ceph in fact) cluster.

 -1 node is a little HP Microserver N54L with 1X opteron + 2SSD+ 3X 4TB
 SATA
 It'll be used as OSD+Mon server only.
 Are these SSDs Intel S3700 too? What amount of RAM?
Yes, All DCS3700, for the four nodes.
16GB of RAM on this node.
 - 3 nodes are setup upon Dell 730+ 1xXeon 2603, 48 GB RAM, 1x 1TB SAS
 for OS , 4x 4TB SATA for OSD and 2x DCS3700 200GB intel SSD

 I can't change the hardware, especially the poor cpu...

 Everything will be connected through Intel X520+Netgear XS708E, as 10GBE
 storage network.

 This cluster will support VM (mostly KVM) upon the 3 R730 nodes.
 I'm already aware of the CPU pegging all the time...But can't change it
 for the moment.
 The VM will be Filesharing servers, poor usage services (DNS,DHCP,AD or
 OpenLDAP).
 One Proxy cache (Squid) will be used upon a 100Mb Optical fiber with
 500+ clients.


 My question is :
 Is it recommended to setup  the 2 SSDS as :
 One SSD as journal for 2 (up to 3in the future) OSDs
 Or
 One SSD as journal for the 4 (up to 6 in the future) OSDs and the
 remaining SSD as cache tiering for the previous SSD+4 OSDs pool ?
 I haven't used cache tiering myself, but others have not reported much
 benefit from it (if any) at all, at least this is my understanding.
 
Yes, confirmed by the thread SSD DIsk Distribution.
 So I think it would be better to use both SSDs for journals. It probably
 won't help performance using 2 instead of only 1, but it will lessen the
 impact from a SSD failure. Also it seems that the consensus is 3-4 OSD
 for each SSD, so it will help when you expand to 6 OSD.
Agree; let's go apart from tiering and use journals only.

 SSD should be rock solid enough to support both bandwidth and living
 time before being destroyed by the low amount of data that will be
 written on it (Few hundreds of GB per day as rule of thumb..)
 If all are Intel S3700 you're on the safe side unless you have lots on
 writes. Anyway I suggest you monitor the SMART values.
Ok, i'll keep that in mind too.

Thanks
 
 Cheers
 Eneko
 
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Best setup for SSD

2015-06-02 Thread Eneko Lacunza

Hi,

On 02/06/15 15:26, Phil Schwarz wrote:

On 02/06/15 14:51, Phil Schwarz wrote:

i'm gonna have to setup a 4-nodes Ceph(Proxmox+Ceph in fact) cluster.

-1 node is a little HP Microserver N54L with 1X opteron + 2SSD+ 3X 4TB
SATA
It'll be used as OSD+Mon server only.

Are these SSDs Intel S3700 too? What amount of RAM?

Yes, All DCS3700, for the four nodes.
16GB of RAM on this node.
This should be enough for 3 OSDs I think, I used to have a Dell 
T20/Intel G3230 with 2x1TB OSDs with only 4 GB running OK.


Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
  943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recommendations for a driver situation

2015-06-02 Thread Pontus Lindgren
 Why is this the case? At least Alexandre Derumier is working on this: (check 
 an email from him in this list on 12th May)
 
 http://odisoweb1.odiso.net/ceph-jessie/

We are in a hurry.

 I haven't used them lately, but linux kernel in wheezy-backport is 3.16, is 
 this enough?
 
 What kernel version do you require for the drivers?
Yes 3.16 is enough. So this is looking like the best option right now. 

Pontus Lindgren 

 On 02 Jun 2015, at 15:08, Eneko Lacunza elacu...@binovo.es wrote:
 
 Hi,
 
 On 02/06/15 14:18, Pontus Lindgren wrote:
 We have recently acquired new servers for a new ceph cluster and we want to 
 run Debian on those servers. Unfortunately drivers needed for the raid 
 controller are only available in newer kernels than what Debian Wheezy 
 provides.
 
 We need to run the dumpling release of Ceph.
 
 Since the Ceph repo does not have packages for Debian Jessie I see 3 
 alternatives for us:
 1. Wait for the Ceph repo to add packages for Debian Jessie.
 Number 1 is not really an option for us. But, is there an approximate ETA on 
 this?
 Why is this the case? At least Alexandre Derumier is working on this: (check 
 an email from him in this list on 12th May)
 
 http://odisoweb1.odiso.net/ceph-jessie/
 
 2. Run Debian Wheezy with backported drivers.
 
 I haven't used them lately, but linux kernel in wheezy-backport is 3.16, is 
 this enough?
 
 What kernel version do you require for the drivers?
 
 Cheers
 Eneko
 
 -- 
 Zuzendari Teknikoa / Director Técnico
 Binovo IT Human Project, S.L.
 Telf. 943575997
  943493611
 Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
 www.binovo.es
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What do internal_safe_to_start_threads and leveldb_compression do?

2015-06-02 Thread Erik Logtenberg
 What does this do?

 - leveldb_compression: false (default: true)
 - leveldb_block/cache/write_buffer_size (all bigger than default)
 
 I take it you're running these commands on a monitor (from I think the
 Dumpling timeframe, or maybe even Firefly)? These are hitting specific
 settings in LevelDB which we tune differently for the monitor and OSD,
 but which were shared config options in older releases. They have
 their own settings in newer code.
 -Greg
 

You are correct. I started out with Firefly and gradually upgraded the
cluster as new releases came out. I am on Hammer (0.94.1) now.

The current settings are different from the default. Does this mean
that the settings are still Firefly-like and should be changed to the
new default; or does this mean that the defaults are still Firefly-like
but the settings are actually Hammer-style ;) and thus right.

Thanks,

Erik.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-02 Thread Somnath Roy
By any chance are you running with jumbo frame turned on ?

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Joao 
Eduardo Luis
Sent: Tuesday, June 02, 2015 12:52 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables 
off, can see tcp traffic)

On 06/02/2015 01:42 AM, cameron.scr...@solnet.co.nz wrote:
 I am trying to deploy a new ceph cluster and my monitors are not
 reaching quorum. SELinux is off, firewalls are off, I can see traffic
 between the nodes on port 6789 but when I use the admin socket to
 force a re-election only the monitor I send the request to shows the
 new election in its logs. My logs are filled entirely of the following
 two
 lines:

 2015-06-02 11:31:56.447975 7f795b17a700  0 log_channel(audit) log
 [DBG]
 : from='admin socket' entity='admin socket' cmd='mon_status' args=[]:
 dispatch
 2015-06-02 11:31:56.448272 7f795b17a700  0 log_channel(audit) log
 [DBG]
 : from='admin socket' entity='admin socket' cmd=mon_status args=[]:
 finished

You are running on default debug levels, so you'll hardly get anything more 
than that.  I suggest setting 'debug mon = 10' and 'debug ms = 1'
for added verbosity and come back to us with the logs.

There are many reasons for this, but the more common are due to the monitors 
not being able to communicate with each other.  Given you see traffic between 
the monitors, I'm inclined to assume that the other two monitors do not have 
each other on the monmap or, if they do know each other, either 1) the 
monitor's auth keys do not match, or 2) the probe timeout is being triggered 
before they successfully manage to find enough monitors to trigger an election 
-- which may be due to latency.

Logs will tells us more.

  -Joao

 Querying the admin socket with mon_status (the other two are the
 similar but with their hostnames and rank):

 {
 name: wcm1,
 rank: 0,
 state: probing,
 election_epoch: 1,
 quorum: [],
 outside_quorum: [
 wcm1
 ],
 extra_probe_peers: [],
 sync_provider: [],
 monmap: {
 epoch: 0,
 fsid: adb8c500-122e-49fd-9c1e-a99af7832307,
 modified: 2015-06-02 10:43:41.467811,
 created: 2015-06-02 10:43:41.467811,
 mons: [
 {
 rank: 0,
 name: wcm1,
 addr: 10.1.226.64:6789\/0
 },
 {
 rank: 1,
 name: wcm2,
 addr: 10.1.226.65:6789\/0
 },
 {
 rank: 2,
 name: wcm3,
 addr: 10.1.226.66:6789\/0
 }
 ]
 }
 }

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] bursty IO, ceph cache pool can not follow evictions

2015-06-02 Thread Kenneth Waegeman

Hi,

we were rsync-streaming with 4 CephFS clients to a Ceph cluster with a
cache layer on top of an erasure-coded pool.

This had been going on for some time without any real problems.

Today we added 2 more streams, and very soon we saw some strange behaviour:
- We are getting blocked requests on our cache pool OSDs
- our cache pool is often near or at its max ratio
- Our data streams have very bursty IO (streaming a few hundred MB for a
minute and then nothing)


Our OSDs are not overloaded (neither the EC nor the cache OSDs, checked
with iostat), yet it seems the cache pool cannot evict objects in time and
gets blocked until it catches up, over and over again.
If I raise the target_max_bytes limit, it starts streaming again until the
pool is full again.


cache parameters we have are these:
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache target_max_bytes $((14*75*1024*1024*1024))
ceph osd pool set cache cache_target_dirty_ratio 0.4
ceph osd pool set cache cache_target_full_ratio 0.8


What could the issue be here? I tried to find some information about the
'cache agent', but could only find some old references.
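
One way to narrow it down might be to watch the tiering agent counters on a
couple of cache OSDs while a stall is happening; a rough sketch, where the
OSD id is an assumption and the exact counter names can differ per release:

watch -n 5 "ceph daemon osd.12 perf dump | grep -E 'agent_|tier_'"

If the agent flush/evict counters stop moving while the pool sits at the
full ratio, that points at the agent rather than at the disks.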


Thank you!

Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bursty IO, ceph cache pool can not follow evictions

2015-06-02 Thread Nick Fisk
Hi Kenneth,

I suggested an idea which may help with this; it is currently being
developed.

https://github.com/ceph/ceph/pull/4792

In short, there are high and low thresholds with different flushing
priorities. Hopefully this will help with bursty workloads.
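
Once that lands, the rough idea is that the existing ratio becomes the
"start flushing in the background" point and a new, higher one is where
flushing gets aggressive; something along these lines (the second option
name is as proposed in the PR, so treat it as tentative):

ceph osd pool set cache cache_target_dirty_ratio 0.4
ceph osd pool set cache cache_target_dirty_high_ratio 0.6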

Nick

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Kenneth Waegeman
 Sent: 02 June 2015 17:54
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] bursty IO, ceph cache pool can not follow evictions
 
 Hi,
 
 we were rsync-streaming with 4 cephfs client to a ceph cluster with a
cache
 layer upon an erasure coded pool.
 This was going on for some time, and didn't have real problems.
 
 Today we added 2 more streams, and very soon we saw some strange
 behaviour:
 - We are getting blocked requests on our cache pool osds
 - our cache pool is often near/ at max ratio
 - Our data streams have very bursty IO, (streaming a minute a few hunderds
 MB and then nothing)
 
 Our OSDs are not overloaded (nor the ECs nor cache, checked with iostat),
 though it seems like the cache pool can not evict objects in time, and get
 blocked until that is ok, each time again.
 If I rise the target_max_bytes limit, it starts streaming again until it
is full
 again.
 
 cache parameters we have are these:
 ceph osd pool set cache hit_set_type bloom ceph osd pool set cache
 hit_set_count 1 ceph osd pool set cache hit_set_period 3600 ceph osd pool
 set cache target_max_bytes $((14*75*1024*1024*1024)) ceph osd pool set
 cache cache_target_dirty_ratio 0.4 ceph osd pool set cache
 cache_target_full_ratio 0.8
 
 
 What can be the issue here ? I tried to find some information about the
 'cache agent' , but can only find some old references..
 
 Thank you!
 
 Kenneth
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] PG size distribution

2015-06-02 Thread Daniel Maraio

Hello,

  I have some questions about the size of my placement groups and how I 
can get a more even distribution. We currently have 160 2TB OSDs across 
20 chassis.  We have 133TB used in our radosgw pool with a replica size 
of 2. We want to move to 3 replicas but are concerned we may fill up 
some of our OSDs. Some OSDs have ~1.1TB free while others only have 
~600GB free. The radosgw pool has 4096 PGs; looking at the documentation,
I probably want to increase this to 8192, but we have decided to hold off
on that for now.


  So, now for the pg usage. I dumped out the PG stats and noticed that 
there are two groups of PG sizes in my cluster. There are about 1024 PGs 
that are each around 17-18GB in size. The rest of the PGs are all around 
34-36GB in size. Any idea why there are two distinct groups? We only 
have the one pool with data in it, though there are several different 
buckets in the radosgw pool. The data in the pool ranges from small 
images to 4-6MB audio files. Will increasing the number of PGs on this
pool provide a more even distribution?
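
For reference, a crude way to bucket the PG sizes and confirm the two groups
is something like the following; the byte column position in 'ceph pg dump'
is an assumption and can shift between releases, so treat it as a sketch:

ceph pg dump 2>/dev/null | awk '$7 > 1000000 {printf "%d GB\n", $7/2^30}' | sort -n | uniq -c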


  Another thing to note is that the initial cluster was built lopsided,
with some 4TB OSDs and some 2TB ones; we have since removed all the 4TB
disks and are only using 2TB drives across the entire cluster. Not sure if
this would have had any impact.


  Thank you for your time and I would appreciate any insight the 
community can offer.


- Daniel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bursty IO, ceph cache pool can not follow evictions

2015-06-02 Thread Paul Evans
Kenneth,
  My guess is that you’re hitting the cache_target_full_ratio on an individual 
OSD, which is easy to do since most of us tend to think of the 
cache_target_full_ratio as an aggregate of the OSDs (which it is not according 
to Greg Farnum).   This posting may shed more light on the issue, if it is 
indeed what you are bumping up against.  
https://www.mail-archive.com/ceph-users%40lists.ceph.com/msg20207.html

  BTW: how are you determining that your OSDs are ‘not overloaded?’  Are you 
judging that by iostat utilization, or by capacity consumed?
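
For the capacity side of that question: it is per-OSD fullness in the cache
tier that bites here, not the pool aggregate, so it may be worth looking at
both views. A rough sketch (ceph osd df exists from Hammer onwards; the
iostat interval/count are arbitrary):

ceph osd df
iostat -x 5 3

If some cache OSDs sit well above the others in %USE while iostat shows the
devices mostly idle, that would point at the agent/eviction path rather than
at the disks.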
--
Paul


On Jun 2, 2015, at 9:53 AM, Kenneth Waegeman 
kenneth.waege...@ugent.be wrote:

Hi,

we were rsync-streaming with 4 cephfs client to a ceph cluster with a cache 
layer upon an erasure coded pool.
This was going on for some time, and didn't have real problems.

Today we added 2 more streams, and very soon we saw some strange behaviour:
- We are getting blocked requests on our cache pool osds
- our cache pool is often near/ at max ratio
- Our data streams have very bursty IO, (streaming a minute a few hunderds MB 
and then nothing)

Our OSDs are not overloaded (nor the ECs nor cache, checked with iostat), 
though it seems like the cache pool can not evict objects in time, and get 
blocked until that is ok, each time again.
If I rise the target_max_bytes limit, it starts streaming again until it is 
full again.

cache parameters we have are these:
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache target_max_bytes $((14*75*1024*1024*1024))
ceph osd pool set cache cache_target_dirty_ratio 0.4
ceph osd pool set cache cache_target_full_ratio 0.8


What can be the issue here ? I tried to find some information about the 'cache 
agent' , but can only find some old references..

Thank you!

Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Read Errors and OSD Flapping

2015-06-02 Thread Nick Fisk
 
 On Sun, May 31, 2015 at 2:09 AM, Nick Fisk  wrote:
 
  Thanks for the suggestions. I will introduce the disk 1st and see if the 
  smart
 stats change from pending sectors to reallocated, if they don't then I will do
 the DD and smart test. It will be a good test as to what to do in this 
 situation
 as I have a feeling this will most likely happen again.
 
 Please post back when you have a result, I'd like to know the outcome.

Well, the disk has finished rebalancing back into the cluster. The SMART
stats are no longer showing any pending sectors, but strangely no
reallocated ones either. I can only guess that when the drive tried to write
to them again it succeeded without needing a remap???

I will continue to monitor the disk smart stats and see if I hit the same 
problem again.
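
For the record, if it happens again on an OSD that has already been drained,
the manual way to force the drive to make the remap decision is to rewrite
the LBAs from the kernel log and re-check SMART afterwards. A sketch only -
this is destructive, so the disk must hold no data you care about; the
device name and sectors are the ones from the earlier messages:

smartctl -A /dev/sdk | grep -Ei 'pending|reallocated'
dd if=/dev/zero of=/dev/sdk bs=512 seek=4483365656 count=1 oflag=direct
dd if=/dev/zero of=/dev/sdk bs=512 seek=4483365872 count=1 oflag=direct
smartctl -t long /dev/sdk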

 
 - 
 Robert LeBlanc
 GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Read Errors and OSD Flapping

2015-06-02 Thread Gregory Farnum
On Sat, May 30, 2015 at 2:23 PM, Nick Fisk n...@fisk.me.uk wrote:

 Hi All,



 I was noticing poor performance on my cluster and when I went to investigate 
 I noticed OSD 29 was flapping up and down. On investigation it looks like it 
 has 2 pending sectors, kernel log is filled with the following



 end_request: critical medium error, dev sdk, sector 4483365656

 end_request: critical medium error, dev sdk, sector 4483365872



 I can see in the OSD logs that it looked like when the OSD was crashing it 
 was trying to scrub the PG, probably failing when the kernel passes up the 
 read error.



 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)

 1: /usr/bin/ceph-osd() [0xacaf4a]

 2: (()+0x10340) [0x7fdc43032340]

 3: (gsignal()+0x39) [0x7fdc414d1cc9]

 4: (abort()+0x148) [0x7fdc414d50d8]

 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fdc41ddc6b5]

 6: (()+0x5e836) [0x7fdc41dda836]

 7: (()+0x5e863) [0x7fdc41dda863]

 8: (()+0x5eaa2) [0x7fdc41ddaaa2]

 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
 const*)+0x278) [0xbc2908]

 10: (FileStore::read(coll_t, ghobject_t const, unsigned long, unsigned long, 
 ceph::buffer::list, unsigned int, bool)+0xc98) [0x9168e

 8]

 11: (ReplicatedBackend::be_deep_scrub(hobject_t const, unsigned int, 
 ScrubMap::object, ThreadPool::TPHandle)+0x2f9) [0xa05bf9]

 12: (PGBackend::be_scan_list(ScrubMap, std::vectorhobject_t, 
 std::allocatorhobject_t  const, bool, unsigned int, ThreadPool::TPH

 andle)+0x2c8) [0x8dab98]

 13: (PG::build_scrub_map_chunk(ScrubMap, hobject_t, hobject_t, bool, 
 unsigned int, ThreadPool::TPHandle)+0x1fa) [0x7f099a]

 14: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle)+0x4a2) [0x7f1132]

 15: (OSD::RepScrubWQ::_process(MOSDRepScrub*, ThreadPool::TPHandle)+0xbe) 
 [0x6e583e]

 16: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb38ae]

 17: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]

 18: (()+0x8182) [0x7fdc4302a182]

 19: (clone()+0x6d) [0x7fdc4159547d]



 Few questions:

 1.   Is this the expected behaviour, or should Ceph try and do something 
 to either keep the OSD down or rewrite the sector to cause a sector remap?

So the OSD is committing suicide and we want it to stay dead. But the
init system is restarting it. We are actually discussing how that
should change right now, but aren't quite sure what the right settings
are: http://tracker.ceph.com/issues/11798
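
In the meantime the usual workaround is to stop the respawning daemon by
hand and let the cluster recover around it; a sketch, assuming an
Ubuntu/upstart install (the init commands differ per distro):

sudo stop ceph-osd id=29
ceph osd out 29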

Presuming you still have the logs, how long was the cycle time for it
to suicide, restart, and suicide again?


 2.   I am monitoring smart stats, but is there any other way of picking 
 this up or getting Ceph to highlight it? Something like a flapping OSD 
 notification would be nice.

 3.   I’m assuming at this stage this disk will not be replaceable under 
 warranty, am I best to mark it as out, let it drain and then re-introduce it 
 again, which should overwrite the sector and cause a remap? Or is there a 
 better way?

I'm not really sure about these ones. I imagine most users are
covering it via nagios monitoring of the processes themselves?
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG size distribution

2015-06-02 Thread Jan Schermer
Post the output from your “ceph osd tree”.
We were in a similar situation: some of the OSDs were quite full while others
had 50% free. This is exactly why we increased the number of PGs, and it
helped to some degree.
Are all your hosts the same size? Does your CRUSH map select a host in the end?
If you only have a few hosts with differing numbers of OSDs, the distribution
will be poor (IMHO).

Anyway, when we started increasing the PG count we first created the PGs
themselves (pg_num) in small increments, since that put a lot of load on the
OSDs and we were seeing slow requests with larger increases.
So something like this:
for i in `seq 4096 64 8192` ; do ceph osd pool set poolname pg_num $i ; done
This ate a few gigs from the drives (1-2GB if I remember correctly).

Once that was finished we increased pgp_num in larger and larger increments
 - at first 64 at a time, and then 512 at a time as we approached the
target (16384 in our case). This does allocate more space temporarily, and it
seems to just randomly move data around - one minute an OSD is fine, the next
it is nearing full. One of us basically had to watch the process all the
time, reweighting the devices that were almost full.
As the number of PGs grew it became much simpler: the overhead was smaller,
every bit of work was smaller, and all the management operations ran a lot
smoother.
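
A sketch of that second phase (pool name, step size and the "wait until
settled" check are placeholders - in practice we watched the cluster and
reweighted by hand):

for i in `seq 4160 64 8192` ; do
    ceph osd pool set poolname pgp_num $i
    # crude wait for data movement to settle before the next step
    while ceph health | grep -qE 'backfill|recovery' ; do sleep 60 ; done
done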

YMMV - our data distribution was poor from the start, hosts had differing
weights due to differing numbers of OSDs, there were some historical remnants
from when we tried to load-balance the data by hand, and we ended up in a much
better state but not a perfect one - some OSDs still have much more free space
than others.
We haven’t touched the CRUSH map at all during this process; once we do and set
newer tunables, the data distribution should be much more even.

I’d love to hear the others’ input since we are not sure why exactly this 
problem is present at all - I’d expect it to fill all the OSDs to the same or 
close-enough level, but in reality we have OSDs with weight 1.0 which are 
almost empty and others with weight 0.5 which are nearly full… When adding data 
it seems to (subjectively) distribute them evenly...

Jan

 On 02 Jun 2015, at 18:52, Daniel Maraio dmar...@choopa.com wrote:
 
 Hello,
 
  I have some questions about the size of my placement groups and how I can 
 get a more even distribution. We currently have 160 2TB OSDs across 20 
 chassis.  We have 133TB used in our radosgw pool with a replica size of 2. We 
 want to move to 3 replicas but are concerned we may fill up some of our OSDs. 
 Some OSDs have ~1.1TB free while others only have ~600GB free. The radosgw 
 pool has 4096 pgs, looking at the documentation I probably want to increase 
 this up to 8192, but we have decided to hold off on that for now.
 
  So, now for the pg usage. I dumped out the PG stats and noticed that there 
 are two groups of PG sizes in my cluster. There are about 1024 PGs that are 
 each around 17-18GB in size. The rest of the PGs are all around 34-36GB in 
 size. Any idea why there are two distinct groups? We only have the one pool 
 with data in it, though there are several different buckets in the radosgw 
 pool. The data in the pool ranges from small images to 4-6mb audio files. 
 Will increasing the number of PGs on this pool provide a more even 
 distribution?
 
  Another thing to note is that the initial cluster was built lopsided, with 
 some 4TB OSDs and some 2TB, we have removed all the 4TB disks and are only 
 using 2TBs across the entire cluster. Not sure if this would have had any 
 impact.
 
  Thank you for your time and I would appreciate any insight the community can 
 offer.
 
 - Daniel
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Read Errors and OSD Flapping

2015-06-02 Thread Nick Fisk




 -Original Message-
 From: Gregory Farnum [mailto:g...@gregs42.com]
 Sent: 02 June 2015 18:34
 To: Nick Fisk
 Cc: ceph-users
 Subject: Re: [ceph-users] Read Errors and OSD Flapping
 
 On Sat, May 30, 2015 at 2:23 PM, Nick Fisk n...@fisk.me.uk wrote:
 
  Hi All,
 
 
 
  I was noticing poor performance on my cluster and when I went to
  investigate I noticed OSD 29 was flapping up and down. On
  investigation it looks like it has 2 pending sectors, kernel log is
  filled with the following
 
 
 
  end_request: critical medium error, dev sdk, sector 4483365656
 
  end_request: critical medium error, dev sdk, sector 4483365872
 
 
 
  I can see in the OSD logs that it looked like when the OSD was crashing it
 was trying to scrub the PG, probably failing when the kernel passes up the
 read error.
 
 
 
  ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 
  1: /usr/bin/ceph-osd() [0xacaf4a]
 
  2: (()+0x10340) [0x7fdc43032340]
 
  3: (gsignal()+0x39) [0x7fdc414d1cc9]
 
  4: (abort()+0x148) [0x7fdc414d50d8]
 
  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fdc41ddc6b5]
 
  6: (()+0x5e836) [0x7fdc41dda836]
 
  7: (()+0x5e863) [0x7fdc41dda863]
 
  8: (()+0x5eaa2) [0x7fdc41ddaaa2]
 
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
  const*)+0x278) [0xbc2908]
 
  10: (FileStore::read(coll_t, ghobject_t const, unsigned long,
  unsigned long, ceph::buffer::list, unsigned int, bool)+0xc98)
  [0x9168e
 
  8]
 
  11: (ReplicatedBackend::be_deep_scrub(hobject_t const, unsigned int,
  ScrubMap::object, ThreadPool::TPHandle)+0x2f9) [0xa05bf9]
 
  12: (PGBackend::be_scan_list(ScrubMap, std::vectorhobject_t,
  std::allocatorhobject_t  const, bool, unsigned int,
  ThreadPool::TPH
 
  andle)+0x2c8) [0x8dab98]
 
  13: (PG::build_scrub_map_chunk(ScrubMap, hobject_t, hobject_t, bool,
  unsigned int, ThreadPool::TPHandle)+0x1fa) [0x7f099a]
 
  14: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle)+0x4a2)
  [0x7f1132]
 
  15: (OSD::RepScrubWQ::_process(MOSDRepScrub*,
  ThreadPool::TPHandle)+0xbe) [0x6e583e]
 
  16: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb38ae]
 
  17: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
 
  18: (()+0x8182) [0x7fdc4302a182]
 
  19: (clone()+0x6d) [0x7fdc4159547d]
 
 
 
  Few questions:
 
  1.   Is this the expected behaviour, or should Ceph try and do something
 to either keep the OSD down or rewrite the sector to cause a sector remap?
 
 So the OSD is committing suicide and we want it to stay dead. But the init
 system is restarting it. We are actually discussing how that should change
 right now, but aren't quite sure what the right settings
 are: http://tracker.ceph.com/issues/11798
 
 Presuming you still have the logs, how long was the cycle time for it to
 suicide, restart, and suicide again?

Just looking through a few examples of it. It looks like it took about 2 
seconds from suicide to restart and then about 5 minutes till it died again.

I have taken a copy of the log, let me know if it's of any use to you.

 
 
  2.   I am monitoring smart stats, but is there any other way of picking 
  this
 up or getting Ceph to highlight it? Something like a flapping OSD notification
 would be nice.
 
  3.   I’m assuming at this stage this disk will not be replaceable under
 warranty, am I best to mark it as out, let it drain and then re-introduce it
 again, which should overwrite the sector and cause a remap? Or is there a
 better way?
 
 I'm not really sure about these ones. I imagine most users are covering it via
 nagios monitoring of the processes themselves?


 -Greg




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com