Re: [ceph-users] ceph command hangs

2018-01-18 Thread Nathan Dehnel
gentooserver ~ # ceph -s --debug-ms=1
2018-01-18 21:09:55.981886 7f9581f33700  1  Processor -- start
2018-01-18 21:09:55.981919 7f9581f33700  1 -- - start start
2018-01-18 21:09:55.982006 7f9581f33700  1 -- - -->
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 -- auth(proto 0 30 bytes
epoch 0) v1 -- 0x7f957c129110 con 0
2018-01-18 21:09:55.982701 7f957b7fe700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 learned_addr learned my
addr [2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692
2018-01-18 21:09:55.983088 7f9579ffb700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 <== mon.0
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 1  mon_map magic: 0 v1
 208+0+0 (3367529148 0 0) 0x7f956c001130 con 0x7f957c124780
2018-01-18 21:09:55.983137 7f9579ffb700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 <== mon.0
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 2  auth_reply(proto 1 0
(0) Success) v1  24+0+0 (3149240224 0 0) 0x7f956c001490 con
0x7f957c124780
2018-01-18 21:09:55.983169 7f9579ffb700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 -->
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 --
mon_subscribe({monmap=0+}) v2 -- 0x7f957c12a700 con 0
2018-01-18 21:09:55.983204 7f9581f33700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 -->
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 --
mon_subscribe({mgrmap=0+}) v2 -- 0x7f957c129210 con 0
2018-01-18 21:09:55.983260 7f9581f33700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 -->
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 -- mon_subscribe({osdmap=0})
v2 -- 0x7f957c129210 con 0
2018-01-18 21:09:55.983409 7f9579ffb700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 <== mon.0
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 3  mon_map magic: 0 v1
 208+0+0 (3367529148 0 0) 0x7f956c001050 con 0x7f957c124780
2018-01-18 21:09:55.983459 7f9579ffb700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 <== mon.0
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 4  mgrmap(e 1) v1 
103+0+0 (706778617 0 0) 0x7f956c001470 con 0x7f957c124780
2018-01-18 21:09:55.983562 7f9579ffb700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 <== mon.0
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 5  osd_map(1..1 src has
1..1) v3  638+0+0 (1741579187 0 0) 0x7f956c0011b0 con 0x7f957c124780
2018-01-18 21:09:55.985228 7f9581f33700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 -->
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 -- mon_command({"prefix":
"get_command_descriptions"} v 0) v1 -- 0x7f957c12a0c0 con 0
2018-01-18 21:10:35.982605 7f9578ff9700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 >>
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 conn(0x7f957c124780 :-1
s=STATE_OPEN pgs=3 cs=1 l=1).mark_down
2018-01-18 21:10:35.982706 7f9578ff9700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 -->
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 -- auth(proto 0 30 bytes
epoch 1) v1 -- 0x7f9560005af0 con 0
2018-01-18 21:10:35.983813 7f9579ffb700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 <== mon.0
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 1  auth_reply(proto 1 0
(0) Success) v1  24+0+0 (3149240224 0 0) 0x7f956c000d50 con
0x7f95600012a0
2018-01-18 21:10:35.983856 7f9579ffb700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 -->
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 --
mon_subscribe({mgrmap=0+,monmap=2+}) v2 -- 0x7f9560007190 con 0
2018-01-18 21:10:35.983875 7f9579ffb700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 -->
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 -- mon_command({"prefix":
"get_command_descriptions"} v 0) v1 -- 0x7f955c0033c0 con 0
2018-01-18 21:10:35.984094 7f9579ffb700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 <== mon.0
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 2  mgrmap(e 1) v1 
103+0+0 (706778617 0 0) 0x7f956c000fd0 con 0x7f95600012a0
2018-01-18 21:11:11.983389 7f9578ff9700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 >>
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 conn(0x7f95600012a0 :-1
s=STATE_OPEN pgs=4 cs=1 l=1).mark_down
2018-01-18 21:11:11.983464 7f9578ff9700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 -->
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 -- auth(proto 0 30 bytes
epoch 1) v1 -- 0x7f95600062b0 con 0
2018-01-18 21:11:11.984386 7f9579ffb700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 <== mon.0
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 1  auth_reply(proto 1 0
(0) Success) v1  24+0+0 (3149240224 0 0) 0x7f956c000d50 con
0x7f9560007290
2018-01-18 21:11:11.984427 7f9579ffb700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 -->
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 --
mon_subscribe({mgrmap=0+,monmap=2+}) v2 -- 0x7f95600068f0 con 0
2018-01-18 21:11:11.984458 7f9579ffb700  1 --
[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:0/285566692 -->

Re: [ceph-users] Hadoop on Ceph error

2018-01-18 Thread Bishoy Mikhael
here is my core-site.xml file

<configuration>
   <property>
      <name>fs.default.name</name>
      <value>ceph://host01:6789/</value>
   </property>
   <property>
      <name>fs.defaultFS</name>
      <value>ceph://host01:6789</value>
   </property>
   <property>
      <name>io.file.buffer.size</name>
      <value>131072</value>
   </property>
   <property>
      <name>hadoop.tmp.dir</name>
      <value>/mnt/hadoop/hadoop_tmp</value>
   </property>
   <property>
      <name>ceph.conf.file</name>
      <value>/etc/ceph/ceph.conf</value>
   </property>
   <property>
      <name>fs.AbstractFileSystem.ceph.impl</name>
      <value>org.apache.hadoop.fs.ceph.CephFs</value>
   </property>
   <property>
      <name>fs.ceph.impl</name>
      <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
   </property>
   <property>
      <name>ceph.auth.keyring</name>
      <value>/etc/ceph/ceph.client.admin.keyring</value>
   </property>
   <property>
      <name>ceph.object.size</name>
      <value>67108864</value>
   </property>
   <property>
      <name>ceph.data.pools</name>
      <value>hadoop_data</value>
   </property>
   <property>
      <name>ceph.localize.reads</name>
      <value>true</value>
   </property>
</configuration>

here are the permissions for the client keyring

# ls -lh /etc/ceph/ceph.client.admin.keyring

-rw---. 1 root root 137 Jan 12 00:14 /etc/ceph/ceph.client.admin.keyring

here is the permission for ceph.conf

# ls -lh /etc/ceph/ceph.conf

-rw-r--r--. 1 root root 3.0K Jan 12 00:14 /etc/ceph/ceph.conf

How do I set a debug option in ceph.conf.options via the Hadoop xml config file?
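
(Presumably something along these lines -- a sketch only, based on the
cephfs-hadoop plugin's ceph.conf.options property; please double-check the
exact option names and the separator for multiple options against the
cephfs-hadoop docs:

   <property>
      <name>ceph.conf.options</name>
      <value>debug_client=20</value>
   </property>
)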

On Thu, Jan 18, 2018 at 7:54 PM, Jean-Charles Lopez 
wrote:

> Hi,
>
> What’s your Hadoop xml config file like?
>
> Have you checked the permissions of the ceph.conf and keyring file?
>
> If all of that looks good, maybe consider setting a debug option via
> ceph.conf.options in the Hadoop xml config file
>
> JC
>
>
> On Jan 18, 2018, at 16:55, Bishoy Mikhael  wrote:
>
> Hi All,
>
> I have a tiny Ceph 12.2.2 cluster set up with three nodes, 17 OSDs, and 3
> MONs, MDSs, and MGRs (spanned across the three nodes).
> Hadoop 2.7.3 is configured on only one of the three nodes as follows:
> - Hadoop binaries were extracted to /opt/hadoop/bin/
> - Hadoop config files are at /opt/hadoop/etc/hadoop/
> - Hadoop-cephfs.jar was downloaded from http://download.ceph.com/tarballs/
> to /opt/hadoop/lib/; its last update was on 12-Mar-2013
> - The following symbolic links have been done:
> # ln -s /usr/lib64/libcephfs_jni.so.1.0.0 /usr/lib64/libcephfs_jni.so
> # cp /usr/lib64/libcephfs_jni.so.1.0.0 /opt/hadoop/lib/native/
> # ln -s /opt/hadoop/lib/native/libcephfs_jni.so.1.0.0
> /opt/hadoop/lib/native/libcephfs_jni.so.1
> # ln -s /opt/hadoop/lib/native/libcephfs_jni.so.1.0.0
> /opt/hadoop/lib/native/libcephfs_jni.so
> # ln -s /usr/share/java/libcephfs.jar /opt/hadoop/lib/
> The following modification to Hadoop-config.sh has been done:
>
> */opt/hadoop/libexec/hadoop-config.sh*
> # CLASSPATH initially contains $HADOOP_CONF_DIR
> CLASSPATH="${HADOOP_CONF_DIR}:/opt/hadoop/lib/libcephfs.jar:
> /opt/hadoop/lib/hadoop-cephfs.jar"
>
> So writes and reads to/from Ceph using the HDFS CLI work fine, but when I use
> the hadoop Java library I get the following error:
>
> ERROR HdfsTraveller:58 - com.ceph.fs.CephNotMountedException: not mounted
>
> fileSystem.globStatus(path)
> FileSystem.globStatus in hdfs api
> ceph returns null pointer
>
> Any idea what's going on? Is it a configuration problem? Is it a Ceph
> problem? Did anybody see this error before?
>
>
> Regards,
> Bishoy
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Hadoop on Ceph error

2018-01-18 Thread Jean-Charles Lopez
Hi,

What’s your Hadoop xml config file like?

Have you checked the permissions of the ceph.conf and keyring file?

If all of that looks good, maybe consider setting a debug option via ceph.conf.options
in the Hadoop xml config file

JC


> On Jan 18, 2018, at 16:55, Bishoy Mikhael  wrote:
> 
> Hi All,
> 
> I have a tiny Ceph 12.2.2 cluster set up with three nodes, 17 OSDs, and 3
> MONs, MDSs, and MGRs (spanned across the three nodes).
> Hadoop 2.7.3 is configured on only one of the three nodes as follows:
> - Hadoop binaries were extracted to /opt/hadoop/bin/
> - Hadoop config files are at /opt/hadoop/etc/hadoop/
> - Hadoop-cephfs.jar was downloaded from http://download.ceph.com/tarballs/
> to /opt/hadoop/lib/; its last update was on 12-Mar-2013
> - The following symbolic links have been done:
> # ln -s /usr/lib64/libcephfs_jni.so.1.0.0 /usr/lib64/libcephfs_jni.so
> # cp /usr/lib64/libcephfs_jni.so.1.0.0 /opt/hadoop/lib/native/
> # ln -s /opt/hadoop/lib/native/libcephfs_jni.so.1.0.0 
> /opt/hadoop/lib/native/libcephfs_jni.so.1
> # ln -s /opt/hadoop/lib/native/libcephfs_jni.so.1.0.0 
> /opt/hadoop/lib/native/libcephfs_jni.so
> # ln -s /usr/share/java/libcephfs.jar /opt/hadoop/lib/
> The following modification to Hadoop-config.sh has been done:
> /opt/hadoop/libexec/hadoop-config.sh
> # CLASSPATH initially contains $HADOOP_CONF_DIR
> CLASSPATH="${HADOOP_CONF_DIR}:/opt/hadoop/lib/libcephfs.jar:/opt/hadoop/lib/hadoop-cephfs.jar"
> 
> So writes and reads to/from Ceph using the HDFS CLI work fine, but when I use 
> the hadoop Java library I get the following error:
> 
> ERROR HdfsTraveller:58 - com.ceph.fs.CephNotMountedException: not mounted
> 
> fileSystem.globStatus(path)
> FileSystem.globStatus in hdfs api
> ceph returns null pointer
> 
> Any idea what's going on? Is it a configuration problem? Is it a Ceph 
> problem? Did anybody see this error before?
> 
> 
> Regards,
> Bishoy
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Hadoop on Ceph error

2018-01-18 Thread Bishoy Mikhael
Hi All,

I have a tiny Ceph 12.2.2 cluster set up with three nodes, 17 OSDs, and 3
MONs, MDSs, and MGRs (spanned across the three nodes).
Hadoop 2.7.3 is configured on only one of the three nodes as follows:
- Hadoop binaries were extracted to /opt/hadoop/bin/
- Hadoop config files are at /opt/hadoop/etc/hadoop/
- Hadoop-cephfs.jar was downloaded from http://download.ceph.com/tarballs/
to /opt/hadoop/lib/; its last update was on 12-Mar-2013
- The following symbolic links have been done:
# ln -s /usr/lib64/libcephfs_jni.so.1.0.0 /usr/lib64/libcephfs_jni.so

# cp /usr/lib64/libcephfs_jni.so.1.0.0 /opt/hadoop/lib/native/

# ln -s /opt/hadoop/lib/native/libcephfs_jni.so.1.0.0
/opt/hadoop/lib/native/libcephfs_jni.so.1

# ln -s /opt/hadoop/lib/native/libcephfs_jni.so.1.0.0
/opt/hadoop/lib/native/libcephfs_jni.so

# ln -s /usr/share/java/libcephfs.jar /opt/hadoop/lib/
The following modification to Hadoop-config.sh has been done:

*/opt/hadoop/libexec/hadoop-config.sh*

# CLASSPATH initially contains $HADOOP_CONF_DIR

CLASSPATH="${HADOOP_CONF_DIR}:/opt/hadoop/lib/libcephfs.jar:/opt/hadoop/lib/hadoop-cephfs.jar"

So writes and reads to/from Ceph using the HDFS CLI work fine, but when I use
the hadoop Java library I get the following error:

ERROR HdfsTraveller:58 - com.ceph.fs.CephNotMountedException: not mounted

fileSystem.globStatus(path)
FileSystem.globStatus in hdfs api
ceph returns null pointer

Any idea what's going on? Is it a configuration problem? Is it a Ceph
problem? Did anybody see this error before?


Regards,
Bishoy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph df shows 100% used

2018-01-18 Thread Webert de Souza Lima
With the help of robbat2 and llua on the IRC channel I was able to solve this
situation by taking down the hosts that had only 2 OSDs.
After crush reweighting OSDs 8 and 23 from host mia1-master-fe02 to 0, ceph
df showed the expected storage capacity usage (about 70%).
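
(For reference, that reweighting amounts to something like the following -- a
sketch only, using the OSD ids mentioned above:

  ceph osd crush reweight osd.8 0
  ceph osd crush reweight osd.23 0

which stops CRUSH from placing data on those OSDs without marking them out.)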


With this in mind, those guys told me that it is due to the cluster
being uneven and unable to balance properly. It makes sense and it worked.
But for me it is still very unexpected behaviour for ceph to say that the
pools are 100% full and Available Space is 0.

There were 3 hosts and repl. size = 2; if the host with only 2 OSDs were
full (it wasn't), ceph could still use space from OSDs on the other hosts.

Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph df shows 100% used

2018-01-18 Thread Webert de Souza Lima
Hi David, thanks for replying.


On Thu, Jan 18, 2018 at 5:03 PM David Turner  wrote:

> You can have overall space available in your cluster because not all of
> your disks are in the same crush root.  You have multiple roots
> corresponding to multiple crush rulesets.  All pools using crush ruleset 0
> are full because all of the osds in that crush rule are full.
>


So I did check this. The usage of the OSDs that belonged to that root
(default) was about 60%.
All the pools using crush ruleset 0 were being shown as 100% full, yet there
was only 1 near-full OSD in that crush rule. That's what is so weird about it.

On Thu, Jan 18, 2018 at 8:05 PM, David Turner  wrote:

> `ceph osd df` is a good command for you to see what's going on.  Compare
> the osd numbers with `ceph osd tree`.
>

I am sorry, I forgot to send this output; here it is. I have added 2 OSDs to
that crush root, borrowed from the host mia1-master-ds05, to see if the
available space would get higher, but it didn't.
So adding new OSDs to it didn't have any effect.

ceph osd df tree

ID  WEIGHT   REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS TYPE NAME
 -9 13.5            - 14621G  2341G 12279G 16.02 0.31   0 root databases
 -8  6.5            -  7182G   835G  6346G 11.64 0.22   0     host mia1-master-ds05
 20  3.0      1.0     3463G   380G  3082G 10.99 0.21 260         osd.20
 17  3.5      1.0     3719G   455G  3263G 12.24 0.24 286         osd.17
-10  7.0            -  7438G  1505G  5932G 20.24 0.39   0     host mia1-master-fe01
 21  3.5      1.0     3719G   714G  3004G 19.22 0.37 269         osd.21
 22  3.5      1.0     3719G   791G  2928G 21.27 0.41 295         osd.22
 -3  2.39996        -  2830G  1647G  1182G 58.22 1.12   0 root databases-ssd
 -5  1.19998        -  1415G   823G   591G 58.22 1.12   0     host mia1-master-ds02-ssd
 24  0.3      1.0      471G   278G   193G 58.96 1.14 173         osd.24
 25  0.3      1.0      471G   276G   194G 58.68 1.13 172         osd.25
 26  0.3      1.0      471G   269G   202G 57.03 1.10 167         osd.26
 -6  1.19998        -  1415G   823G   591G 58.22 1.12   0     host mia1-master-ds03-ssd
 27  0.3      1.0      471G   244G   227G 51.87 1.00 152         osd.27
 28  0.3      1.0      471G   281G   190G 59.63 1.15 175         osd.28
 29  0.3      1.0      471G   297G   173G 63.17 1.22 185         osd.29
 -1 71.69997        - 76072G 44464G 31607G 58.45 1.13   0 root default
 -2 26.59998        - 29575G 17334G 12240G 58.61 1.13   0     host mia1-master-ds01
  0  3.2      1.0     3602G  1907G  1695G 52.94 1.02  90         osd.0
  1  3.2      1.0     3630G  2721G   908G 74.97 1.45 112         osd.1
  2  3.2      1.0     3723G  2373G  1349G 63.75 1.23  98         osd.2
  3  3.2      1.0     3723G  1781G  1941G 47.85 0.92 105         osd.3
  4  3.2      1.0     3723G  1880G  1843G 50.49 0.97  95         osd.4
  5  3.2      1.0     3723G  2465G  1257G 66.22 1.28 111         osd.5
  6  3.7      1.0     3723G  1722G  2001G 46.25 0.89 109         osd.6
  7  3.7      1.0     3723G  2481G  1241G 66.65 1.29 126         osd.7
 -4  8.5            -  9311G  8540G   770G 91.72 1.77   0     host mia1-master-fe02
  8  5.5      0.7     5587G  5419G   167G 97.00 1.87 189         osd.8
 23  3.0      1.0     3724G  3120G   603G 83.79 1.62 128         osd.23
 -7 29.5            - 29747G 17821G 11926G 59.91 1.16   0     host mia1-master-ds04
  9  3.7      1.0     3718G  2493G  1224G 67.07 1.29 114         osd.9
 10  3.7      1.0     3718G  2454G  1264G 66.00 1.27  90         osd.10
 11  3.7      1.0     3718G  2202G  1516G 59.22 1.14 116         osd.11
 12  3.7      1.0     3718G  2290G  1427G 61.61 1.19 113         osd.12
 13  3.7      1.0     3718G  2015G  1703G 54.19 1.05 112         osd.13
 14  3.7      1.0     3718G  1264G  2454G 34.00 0.66 101         osd.14
 15  3.7      1.0     3718G  2195G  1522G 59.05 1.14 104         osd.15
 16  3.7      1.0     3718G  2905G   813G 78.13 1.51 130         osd.16
-11  7.0            -  7438G   768G  6669G 10.33 0.20   0     host mia1-master-ds05-borrowed-osds
 18  3.5      1.0     3719G   393G  3325G 10.59 0.20 262         osd.18
 19  3.5      1.0     3719G   374G  3344G 10.07 0.19 256         osd.19
TOTAL 93524G 48454G 45069G 51.81
MIN/MAX VAR: 0.19/1.87  STDDEV: 22.02



Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*

On Thu, Jan 18, 2018 at 8:05 PM, David Turner  wrote:

> `ceph osd df` is a good command for you to see what's going on.  Compare
> the osd numbers with `ceph osd tree`.
>
>
>>
>> On Thu, Jan 18, 2018 at 3:34 PM Webert de Souza Lima <
>> webert.b...@gmail.com> wrote:
>>
>>> Sorry I forgot, this is a ceph jewel 10.2.10
>>>
>>>
>>> Regards,
>>>
>>> Webert Lima
>>> DevOps Engineer at MAV Tecnologia
>>> *Belo Horizonte - Brasil*
>>> *IRC NICK - WebertRLZ*
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com

Re: [ceph-users] ceph df shows 100% used

2018-01-18 Thread David Turner
Your hosts are also not balanced in your default root.  Your failure domain
is host, but one of your hosts has 8.5TB of storage in it compared to
26.6TB and 29.6TB.  You only have size=2 (along with min_size=1 which is
bad for a lot of reasons) so it should still be able to place data mostly
between ds01 and ds04 and ignore fe02 since it doesn't have much space at
all.  Anyway, `ceph osd df` will be good output to see what the
distribution between osds looks like.
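
(As an aside, moving off min_size=1 is just a per-pool setting -- a sketch
only, the pool name is a placeholder:

  ceph osd pool set <pool> min_size 2
)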

 -1 64.69997 root default
 -2 26.59998 host mia1-master-ds01
  0  3.2 osd.0  up  1.0  1.0
  1  3.2 osd.1  up  1.0  1.0
  2  3.2 osd.2  up  1.0  1.0
  3  3.2 osd.3  up  1.0  1.0
  4  3.2 osd.4  up  1.0  1.0
  5  3.2 osd.5  up  1.0  1.0
  6  3.7 osd.6  up  1.0  1.0
  7  3.7 osd.7  up  1.0  1.0
 -4  8.5 host mia1-master-fe02
  8  5.5 osd.8  up  1.0  1.0
 23  3.0 osd.23 up  1.0  1.0
 -7 29.5 host mia1-master-ds04
  9  3.7 osd.9  up  1.0  1.0
 10  3.7 osd.10 up  1.0  1.0
 11  3.7 osd.11 up  1.0  1.0
 12  3.7 osd.12 up  1.0  1.0
 13  3.7 osd.13 up  1.0  1.0
 14  3.7 osd.14 up  1.0  1.0
 15  3.7 osd.15 up  1.0  1.0
 16  3.7 osd.16 up  1.0  1.0



On Thu, Jan 18, 2018 at 5:05 PM David Turner  wrote:

> `ceph osd df` is a good command for you to see what's going on.  Compare
> the osd numbers with `ceph osd tree`.
>
> On Thu, Jan 18, 2018 at 5:03 PM David Turner 
> wrote:
>
>> You can have overall space available in your cluster because not all of
>> your disks are in the same crush root.  You have multiple roots
>> corresponding to multiple crush rulesets.  All pools using crush ruleset 0
>> are full because all of the osds in that crush rule are full.
>>
>> On Thu, Jan 18, 2018 at 3:34 PM Webert de Souza Lima <
>> webert.b...@gmail.com> wrote:
>>
>>> Sorry I forgot, this is a ceph jewel 10.2.10
>>>
>>>
>>> Regards,
>>>
>>> Webert Lima
>>> DevOps Engineer at MAV Tecnologia
>>> *Belo Horizonte - Brasil*
>>> *IRC NICK - WebertRLZ*
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph df shows 100% used

2018-01-18 Thread David Turner
`ceph osd df` is a good command for you to see what's going on.  Compare
the osd numbers with `ceph osd tree`.

On Thu, Jan 18, 2018 at 5:03 PM David Turner  wrote:

> You can have overall space available in your cluster because not all of
> your disks are in the same crush root.  You have multiple roots
> corresponding to multiple crush rulesets.  All pools using crush ruleset 0
> are full because all of the osds in that crush rule are full.
>
> On Thu, Jan 18, 2018 at 3:34 PM Webert de Souza Lima <
> webert.b...@gmail.com> wrote:
>
>> Sorry I forgot, this is a ceph jewel 10.2.10
>>
>>
>> Regards,
>>
>> Webert Lima
>> DevOps Engineer at MAV Tecnologia
>> *Belo Horizonte - Brasil*
>> *IRC NICK - WebertRLZ*
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph df shows 100% used

2018-01-18 Thread David Turner
You can have overall space available in your cluster because not all of
your disks are in the same crush root.  You have multiple roots
corresponding to multiple crush rulesets.  All pools using crush ruleset 0
are full because all of the osds in that crush rule are full.
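
For example, to see which rule and root each pool maps to (a sketch for jewel,
substitute your own pool names):

  ceph osd pool get <pool> crush_ruleset   # which ruleset the pool uses
  ceph osd crush rule dump                 # which crush root each rule draws from
  ceph osd df tree                         # per-OSD usage grouped by crush tree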

On Thu, Jan 18, 2018 at 3:34 PM Webert de Souza Lima 
wrote:

> Sorry I forgot, this is a ceph jewel 10.2.10
>
>
> Regards,
>
> Webert Lima
> DevOps Engineer at MAV Tecnologia
> *Belo Horizonte - Brasil*
> *IRC NICK - WebertRLZ*
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph luminous - cannot assign requested address

2018-01-18 Thread Stefan Kooman
Quoting Steven Vacaroaia (ste...@gmail.com):
> Hi,
> 
> I have noticed the below error message when creating a new OSD using
> ceph-volume
> deleting the OSD and recreating it does not work - same error message
> 
> However, creating a brand new OSD works
> 
> Note
> No firewall/iptables is enabled, and nothing shows up on those ports using
> netstat -ant
> 
> Any ideas ?
> 
> osd01.tor.medavail.net ceph-osd[3059]: starting osd.2 at - osd_data
> /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
> Jan 18 09:47:25 osd01.tor.medavail.net ceph-osd[3059]: 2018-01-18
> 09:47:25.219938 7fa385ef3d00 -1  Processor -- bind unable to bind to
> 10.10.30.183:7300/0 on any port in range 6800-7300: (99) Cannot assign
> requested address
> Jan 18 09:47:25 osd01.tor.medavail.net ceph-osd[3059]: 2018-01-18
> 09:47:25.219969 7fa385ef3d00 -1  Processor -- bind was unable to bind.
> Trying again in 5 seconds
> Jan 18 09:47:30 osd01.tor.medavail.net ceph-osd[3059]: 2018-01-18
> 09:47:30.227718 7fa385ef3d00 -1  Processor -- bind unable to bind to
> 10.10.30.183:7300/0 on any port in range 6800-7300: (99) Cannot assign
> requested address

osd01.tor.medavail.net <- does the IP address of this host in DNS or the
/etc/hosts file correspond to the IP actually bound on the host?
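
A quick way to check (a sketch, using the address from the log above):

  getent hosts osd01.tor.medavail.net     # what the hostname resolves to
  ip addr show | grep 10.10.30.183        # is that address configured on this host?

and compare the result with any public_addr / cluster_network settings in ceph.conf.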

Gr. Stefan


-- 
| BIT BV  http://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Two datacenter resilient design with a quorum site

2018-01-18 Thread Gregory Farnum
On Thu, Jan 18, 2018 at 5:57 AM, Alex Gorbachev  
wrote:
> On Tue, Jan 16, 2018 at 2:17 PM, Gregory Farnum  wrote:
>> On Tue, Jan 16, 2018 at 6:07 AM Alex Gorbachev 
>> wrote:
>>>
>>> I found a few WAN RBD cluster design discussions, but not a local one,
>>> so was wondering if anyone has experience with a resilience-oriented
>>> short distance (<10 km, redundant fiber connections) cluster in two
>>> datacenters with a third site for quorum purposes only?
>>>
>>> I can see two types of scenarios:
>>>
>>> 1. Two (or even number) of OSD nodes at each site, 4x replication
>>> (size 4, min_size 2).  Three MONs, one at each site to handle split
>>> brain.
>>>
>>> Question: How does the cluster handle the loss of communication
>>> between the OSD sites A and B, while both can communicate with the
>>> quorum site C?  It seems, one of the sites should suspend, as OSDs
>>> will not be able to communicate between sites.
>>
>>
>> Sadly this won't work — the OSDs on each side will report their peers on the
>> other side down, but both will be able to connect to a live monitor.
>> (Assuming the quorum site holds the leader monitor, anyway — if one of the
>> main sites holds what should be the leader, you'll get into a monitor
>> election storm instead.) You'll need your own netsplit monitoring to shut
>> down one site if that kind of connection cut is a possibility.
>
> What about running a split-brain-aware tool, such as Pacemaker, and
> running a copy of the same VM as a mon at each site?  In case of a
> split-brain network separation, Pacemaker would (aware via the third site)
> stop the mon on site A and bring up the mon on site B (or whatever the
> rules are set to).  I read earlier that a mon with the same IP, name
> and keyring would just look to the ceph cluster as a very old mon, but
> still able to vote for quorum.

It probably is, but don't do that: just use your network monitoring to
shut down the site you've decided is less important. No need to try
and replace its monitor on the primary site or anything like that. (It
would leave you with a mess when trying to restore the secondary
site!)
If you're worried about handling additional monitor failures, you
can do two per site (plus the quorum tiebreaker).
-Greg

>
> Vincent Godin also described an HSRP based method, which would
> accomplish this goal via network routing.  That seems like a good
> approach too, I just need to check on HSRP availability.
>
>>
>>>
>>>
>>> 2. 3x replication for performance or cost (size 3, min_size 2 - or
>>> even min_size 1 and strict monitoring).  Two replicas and two MONs at
>>> one site and one replica and one MON at the other site.
>>>
>>> Question: in case of a permanent failure of the main site (with two
>>> replicas), how to manually force the other site (with one replica and
>>> MON) to provide storage?  I would think a CRUSH map change and
>>> modifying ceph.conf to include just one MON, then build two more MONs
>>> locally and add?
>>
>>
>> Yep, pretty much that. You won't need to change ceph.conf to just one mon so
>> much as to include the current set of mons and update the monmap. I believe
>> that process is in the disaster recovery section of the docs.
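
(For the record, the monmap surgery referred to here is roughly the following --
a sketch only, monitor names and paths are placeholders; see the disaster
recovery section of the docs for the authoritative procedure:

  ceph-mon -i <mon-id> --extract-monmap /tmp/monmap   # with the mon stopped
  monmaptool /tmp/monmap --print
  monmaptool /tmp/monmap --rm <dead-mon>               # drop unreachable mons
  ceph-mon -i <mon-id> --inject-monmap /tmp/monmap
)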
>
> Thank you.
>
> Alex
>
>> -Greg
>>
>>>
>>>
>>> --
>>> Alex Gorbachev
>>> Storcium
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] also having a slow monitor join quorum

2018-01-18 Thread Marc Roos
 
It took around 30 minutes for the monitor to join, after which I could execute ceph -s.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] also having a slow monitor join quorum

2018-01-18 Thread Marc Roos
 

I have seen messages pass by here about it taking a while when a monitor 
tries to join. I had the monitor disk run out of space. The monitor was 
killed and I am now restarting it. I can't do a ceph -s and have to wait for 
this monitor to join as well. 



2018-01-18 21:34:05.787749 7f5187a40700  0 -- 192.168.10.111:0/12930 >> 
192.168.10.112:6810/2033 conn(0x558120ab1800 :-1 
s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 
l=0).handle_connect_reply connect got BADAUTHORIZER
2018-01-18 21:34:20.788612 7f5187a40700  0 -- 192.168.10.111:0/12930 >> 
192.168.10.112:6810/2033 conn(0x558120ab1800 :-1 
s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 
l=0).handle_connect_reply connect got BADAUTHORIZER
2018-01-18 21:34:20.788739 7f5187a40700  0 -- 192.168.10.111:0/12930 >> 
192.168.10.112:6810/2033 conn(0x558120ab1800 :-1 
s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 
l=0).handle_connect_reply connect got BADAUTHORIZER
2018-01-18 21:34:35.789475 7f5187a40700  0 -- 192.168.10.111:0/12930 >> 
192.168.10.112:6810/2033 conn(0x558120ab1800 :-1 
s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 
l=0).handle_connect_reply connect got BADAUTHORIZER
2018-01-18 21:34:35.789608 7f5187a40700  0 -- 192.168.10.111:0/12930 >> 
192.168.10.112:6810/2033 conn(0x558120ab1800 :-1 
s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 
l=0).handle_connect_reply connect got BADAUTHORIZER
2018-01-18 21:34:40.333203 7f518d24b700  0 
mon.a@0(synchronizing).data_health(0) update_stats avail 47% total 5990 
MB, used 3124 MB, avail 2865 MB


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph df shows 100% used

2018-01-18 Thread Webert de Souza Lima
Sorry I forgot, this is a ceph jewel 10.2.10


Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph df shows 100% used

2018-01-18 Thread Webert de Souza Lima
Also, there is no quota set for the pools

Here is "ceph osd pool get xxx all": http://termbin.com/ix0n


Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph df shows 100% used

2018-01-18 Thread Webert de Souza Lima
Hello,

I'm running a nearly out-of-service radosgw (very slow to write new objects)
and I suspect it's because ceph df is showing 100% usage in some pools,
though I don't know where that information comes from.

Pools:
#~ ceph osd pool ls detail  -> http://termbin.com/lsd0

Crush Rules (important is rule 0)
~# ceph osd crush rule dump ->  http://termbin.com/wkpo

OSD Tree:
~# ceph osd tree -> http://termbin.com/87vt

Ceph DF, which shows 100% Usage:
~# ceph df detail -> http://termbin.com/15mz

Ceph Status, which shows 45600 GB / 93524 GB avail:
~# ceph -s -> http://termbin.com/wycq


Any thoughts?

Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] After Luminous upgrade: ceph-fuse clients failing to respond to cache pressure

2018-01-18 Thread Patrick Donnelly
Hi Andras,

On Thu, Jan 18, 2018 at 3:38 AM, Andras Pataki
 wrote:
> Hi John,
>
> Some other symptoms of the problem:  when the MDS has been running for a few
> days, it starts looking really busy.  At this time, listing directories
> becomes really slow.  An "ls -l" on a directory with about 250 entries takes
> about 2.5 seconds.  All the metadata is on OSDs with NVMe backing stores.
> Interestingly enough the memory usage seems pretty low (compared to the
> allowed cache limit).
>
>
> PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
> COMMAND
> 1604408 ceph  20   0 3710304 2.387g  18360 S 100.0  0.9 757:06.92
> /usr/bin/ceph-mds -f --cluster ceph --id cephmon00 --setuser ceph --setgroup
> ceph
>
> Once I bounce it (fail it over), the CPU usage goes down to the 10-25%
> range.  The same ls -l after the bounce takes about 0.5 seconds.  I
> remounted the filesystem before each test to ensure there isn't anything
> cached.
>
> PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
> COMMAND
>   00 ceph  20   0 6537052 5.864g  18500 S  17.6  2.3   9:23.55
> /usr/bin/ceph-mds -f --cluster ceph --id cephmon02 --setuser ceph --setgroup
> ceph
>
> Also, I have a crawler that crawls the file system periodically.  Normally
> the full crawl runs for about 24 hours, but with the slowing down MDS, now
> it has been running for more than 2 days and isn't close to finishing.
>
> The MDS related settings we are running with are:
>
> mds_cache_memory_limit = 17179869184
> mds_cache_reservation = 0.10

Debug logs from the MDS at that time would be helpful with `debug mds
= 20` and `debug ms = 1`. Feel free to create a tracker ticket and use
ceph-post-file [1] to share logs.

[1] http://docs.ceph.com/docs/hammer/man/8/ceph-post-file/
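
For example (a sketch -- substitute the name of the active MDS from ceph -s
and the actual log path):

  ceph tell mds.cephmon00 injectargs '--debug_mds 20 --debug_ms 1'
  # reproduce the slow "ls -l", then turn the logging back down:
  ceph tell mds.cephmon00 injectargs '--debug_mds 1 --debug_ms 0'
  ceph-post-file /var/log/ceph/ceph-mds.cephmon00.log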

-- 
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph luminous - cannot assign requested address

2018-01-18 Thread Steven Vacaroaia
Hi,

I have noticed the below error message when creating a new OSD using
ceph-volume
deleting the OSD and recreating it does not work - same error message

However, creating a brand new OSD works

Note
No firewall/iptables is enabled, and nothing shows up on those ports using
netstat -ant

Any ideas ?

osd01.tor.medavail.net ceph-osd[3059]: starting osd.2 at - osd_data
/var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
Jan 18 09:47:25 osd01.tor.medavail.net ceph-osd[3059]: 2018-01-18
09:47:25.219938 7fa385ef3d00 -1  Processor -- bind unable to bind to
10.10.30.183:7300/0 on any port in range 6800-7300: (99) Cannot assign
requested address
Jan 18 09:47:25 osd01.tor.medavail.net ceph-osd[3059]: 2018-01-18
09:47:25.219969 7fa385ef3d00 -1  Processor -- bind was unable to bind.
Trying again in 5 seconds
Jan 18 09:47:30 osd01.tor.medavail.net ceph-osd[3059]: 2018-01-18
09:47:30.227718 7fa385ef3d00 -1  Processor -- bind unable to bind to
10.10.30.183:7300/0 on any port in range 6800-7300: (99) Cannot assign
requested address


Thanks
Steven
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Two datacenter resilient design with a quorum site

2018-01-18 Thread Alex Gorbachev
On Tue, Jan 16, 2018 at 2:17 PM, Gregory Farnum  wrote:
> On Tue, Jan 16, 2018 at 6:07 AM Alex Gorbachev 
> wrote:
>>
>> I found a few WAN RBD cluster design discussions, but not a local one,
>> so was wondering if anyone has experience with a resilience-oriented
>> short distance (<10 km, redundant fiber connections) cluster in two
>> datacenters with a third site for quorum purposes only?
>>
>> I can see two types of scenarios:
>>
>> 1. Two (or even number) of OSD nodes at each site, 4x replication
>> (size 4, min_size 2).  Three MONs, one at each site to handle split
>> brain.
>>
>> Question: How does the cluster handle the loss of communication
>> between the OSD sites A and B, while both can communicate with the
>> quorum site C?  It seems, one of the sites should suspend, as OSDs
>> will not be able to communicate between sites.
>
>
> Sadly this won't work — the OSDs on each side will report their peers on the
> other side down, but both will be able to connect to a live monitor.
> (Assuming the quorum site holds the leader monitor, anyway — if one of the
> main sites holds what should be the leader, you'll get into a monitor
> election storm instead.) You'll need your own netsplit monitoring to shut
> down one site if that kind of connection cut is a possibility.

What about running a split-brain-aware tool, such as Pacemaker, and
running a copy of the same VM as a mon at each site?  In case of a
split-brain network separation, Pacemaker would (aware via the third site)
stop the mon on site A and bring up the mon on site B (or whatever the
rules are set to).  I read earlier that a mon with the same IP, name
and keyring would just look to the ceph cluster as a very old mon, but
still able to vote for quorum.

Vincent Godin also described an HSRP based method, which would
accomplish this goal via network routing.  That seems like a good
approach too, I just need to check on HSRP availability.

>
>>
>>
>> 2. 3x replication for performance or cost (size 3, min_size 2 - or
>> even min_size 1 and strict monitoring).  Two replicas and two MONs at
>> one site and one replica and one MON at the other site.
>>
>> Question: in case of a permanent failure of the main site (with two
>> replicas), how to manually force the other site (with one replica and
>> MON) to provide storage?  I would think a CRUSH map change and
>> modifying ceph.conf to include just one MON, then build two more MONs
>> locally and add?
>
>
> Yep, pretty much that. You won't need to change ceph.conf to just one mon so
> much as to include the current set of mons and update the monmap. I believe
> that process is in the disaster recovery section of the docs.

Thank you.

Alex

> -Greg
>
>>
>>
>> --
>> Alex Gorbachev
>> Storcium
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to use create an new radosgw user using RESTful API?

2018-01-18 Thread Valery Tschopp

You have to check the admin Ops API documentation:

http://docs.ceph.com/docs/master/radosgw/adminops/
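
For example (a rough sketch only -- parameter names are as in the adminops
docs, and the request must be signed S3-style with the credentials of a user
holding admin capabilities):

  # grant an existing user the admin caps once, via the CLI:
  radosgw-admin caps add --uid=admin --caps="users=*"

  # then creating a user is a signed request against the admin API, e.g.:
  PUT /admin/user?format=json&uid=newuser&display-name=New%20User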

Cheers,
Valery

On 18/01/18 12:32 , 13605702...@163.com wrote:

hi:
     is there a way to create a radosgw user using the RESTful API?
     i'm using Jewel.

thanks


13605702...@163.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
SWITCH
Valéry Tschopp, Software Engineer
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
email: valery.tsch...@switch.ch phone: +41 44 268 1544

30 years of pioneering the Swiss Internet.
Celebrate with us at https://swit.ch/30years

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] After Luminous upgrade: ceph-fuse clients failing to respond to cache pressure

2018-01-18 Thread Andras Pataki

Hi John,

Some other symptoms of the problem:  when the MDS has been running for a 
few days, it starts looking really busy.  At this time, listing 
directories becomes really slow.  An "ls -l" on a directory with about 
250 entries takes about 2.5 seconds.  All the metadata is on OSDs with 
NVMe backing stores.  Interestingly enough the memory usage seems pretty 
low (compared to the allowed cache limit).



    PID USER  PR  NI    VIRT    RES    SHR S  %CPU %MEM TIME+ COMMAND
1604408 ceph  20   0 3710304 2.387g  18360 S 100.0  0.9 757:06.92 
/usr/bin/ceph-mds -f --cluster ceph --id cephmon00 --setuser ceph 
--setgroup ceph


Once I bounce it (fail it over), the CPU usage goes down to the 10-25% 
range.  The same ls -l after the bounce takes about 0.5 seconds.  I 
remounted the filesystem before each test to ensure there isn't anything 
cached.


    PID USER  PR  NI    VIRT    RES    SHR S  %CPU %MEM TIME+ COMMAND
00 ceph  20   0 6537052 5.864g  18500 S  17.6 2.3   9:23.55 
/usr/bin/ceph-mds -f --cluster ceph --id cephmon02 --setuser ceph 
--setgroup ceph


Also, I have a crawler that crawls the file system periodically. 
Normally the full crawl runs for about 24 hours, but with the slowing 
down MDS, now it has been running for more than 2 days and isn't close 
to finishing.


The MDS related settings we are running with are:

   mds_cache_memory_limit = 17179869184
   mds_cache_reservation = 0.10


Andras


On 01/17/2018 01:11 PM, John Spray wrote:

On Wed, Jan 17, 2018 at 3:36 PM, Andras Pataki
 wrote:

Hi John,

All our hosts are CentOS 7 hosts, the majority are 7.4 with kernel
3.10.0-693.5.2.el7.x86_64, with fuse 2.9.2-8.el7.  We have some hosts that
have slight variations in kernel versions, the oldest one are a handful of
CentOS 7.3 hosts with kernel 3.10.0-514.21.1.el7.x86_64 and fuse
2.9.2-7.el7.  I know Redhat has been backporting lots of stuff so perhaps
these kernels fall into the category you are describing?

Quite possibly -- this issue was originally noticed on RHEL, so maybe
the relevant bits made it back to CentOS recently.

However, it looks like the fixes for that issue[1,2] are already in
12.2.2, so maybe this is something completely unrelated :-/

The ceph-fuse executable does create an admin command socket in
/var/run/ceph (named something ceph-client...) that you can drive with
"ceph daemon  dump_cache", but the output is extremely verbose
and low level and may not be informative.

John

1. http://tracker.ceph.com/issues/21423
2. http://tracker.ceph.com/issues/22269


When the cache pressure problem happens, is there a way to know exactly
which hosts are involved, and what items are in their caches easily?

Andras



On 01/17/2018 06:09 AM, John Spray wrote:

On Tue, Jan 16, 2018 at 8:50 PM, Andras Pataki
 wrote:

Dear Cephers,

We've upgraded the back end of our cluster from Jewel (10.2.10) to
Luminous
(12.2.2).  The upgrade went smoothly for the most part, except we seem to
be
hitting an issue with cephfs.  After about a day or two of use, the MDS
start complaining about clients failing to respond to cache pressure:

What's the OS, kernel version and fuse version on the hosts where the
clients are running?

There have been some issues with ceph-fuse losing the ability to
properly invalidate cached items when certain updated OS packages were
installed.

Specifically, ceph-fuse checks the kernel version against 3.18.0 to
decide which invalidation method to use, and if your OS has backported
new behaviour to a low-version-numbered kernel, that can confuse it.

John


[root@cephmon00 ~]# ceph -s
cluster:
  id: d7b33135-0940-4e48-8aa6-1d2026597c2f
  health: HEALTH_WARN
  1 MDSs have many clients failing to respond to cache
pressure
  noout flag(s) set
  1 osds down

services:
  mon: 3 daemons, quorum cephmon00,cephmon01,cephmon02
  mgr: cephmon00(active), standbys: cephmon01, cephmon02
  mds: cephfs-1/1/1 up  {0=cephmon00=up:active}, 2 up:standby
  osd: 2208 osds: 2207 up, 2208 in
   flags noout

data:
  pools:   6 pools, 42496 pgs
  objects: 919M objects, 3062 TB
  usage:   9203 TB used, 4618 TB / 13822 TB avail
  pgs: 42470 active+clean
   22    active+clean+scrubbing+deep
   4 active+clean+scrubbing

io:
  client:   56122 kB/s rd, 18397 kB/s wr, 84 op/s rd, 101 op/s wr

[root@cephmon00 ~]# ceph health detail
HEALTH_WARN 1 MDSs have many clients failing to respond to cache
pressure;
noout flag(s) set; 1 osds down
MDS_CLIENT_RECALL_MANY 1 MDSs have many clients failing to respond to
cache
pressure
  mdscephmon00(mds.0): Many clients (103) failing to respond to cache
pressureclient_count: 103
OSDMAP_FLAGS noout flag(s) set
OSD_DOWN 1 osds down
  osd.1296 (root=root-disk,pod=pod0-disk,host=cephosd008-disk) is down


We are using exclusively the 12.2.2 fuse 

[ceph-users] how to use create an new radosgw user using RESTful API?

2018-01-18 Thread 13605702...@163.com
hi:
is there a way to create a radosgw user using the RESTful API? 
i'm using Jewel.

thanks



13605702...@163.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace failed disk in Luminous v12.2.2

2018-01-18 Thread Dietmar Rieder
Hi,

I finally found a working way to replace the failed OSD. Everything looks
fine again.

Thanks again for your comments and suggestions.

Dietmar

On 01/12/2018 04:08 PM, Dietmar Rieder wrote:
> Hi,
> 
> can someone, comment/confirm my planned OSD replacement procedure?
> 
> It would be very helpful for me.
> 
> Dietmar
> 
> Am 11. Januar 2018 17:47:50 MEZ schrieb Dietmar Rieder
> :
> 
> Hi Alfredo,
> 
> thanks for your coments, see my answers inline.
> 
> On 01/11/2018 01:47 PM, Alfredo Deza wrote:
> 
> On Thu, Jan 11, 2018 at 4:30 AM, Dietmar Rieder
>  wrote:
> 
> Hello,
> 
> we have failed OSD disk in our Luminous v12.2.2 cluster that
> needs to
> get replaced.
> 
> The cluster was initially deployed using ceph-deploy on Luminous
> v12.2.0. The OSDs were created using
> 
> ceph-deploy osd create --bluestore cephosd-${osd}:/dev/sd${disk}
> --block-wal /dev/nvme0n1 --block-db /dev/nvme0n1
> 
> Note we separated the bluestore data, wal and db.
> 
> We updated to Luminous v12.2.1 and further to Luminous v12.2.2.
> 
> With the last update we also let ceph-volume take over the
> OSDs using
> "ceph-volume simple scan /var/lib/ceph/osd/$osd" and
> "ceph-volume
> simple activate ${osd} ${id}". All of this went smoothly.
> 
> 
> That is good to hear!
> 
> 
> Now I wonder what the correct way is to replace a failed OSD
> block disk?
> 
> The docs for luminous [1] say:
> 
> REPLACING AN OSD
> 
> 1. Destroy the OSD first:
> 
> ceph osd destroy {id} --yes-i-really-mean-it
> 
> 2. Zap a disk for the new OSD, if the disk was used before
> for other
> purposes. It’s not necessary for a new disk:
> 
> ceph-disk zap /dev/sdX
> 
> 
> 3. Prepare the disk for replacement by using the previously
> destroyed
> OSD id:
> 
> ceph-disk prepare --bluestore /dev/sdX --osd-id {id}
> --osd-uuid `uuidgen`
> 
> 
> 4. And activate the OSD:
> 
> ceph-disk activate /dev/sdX1
> 
> 
> Initially this seems to be straightforward, but
> 
> 1. I'm not sure if there is something to do with the still
> existing
> bluefs db and wal partitions on the nvme device for the
> failed OSD. Do
> they have to be zapped ? If yes, what is the best way? There
> is nothing
> mentioned in the docs.
> 
> 
> What is your concern here if the activation seems to work?
> 
> 
> I guess on the nvme partitions for bluefs db and bluefs wal there is
> still data related to the failed OSD block device. I was thinking that
> this data might "interfere" with the new replacement OSD block device,
> which is empty.
> 
> So you are saying that this is no concern, right?
> Are they automatically reused and assigned to the replacement OSD block
> device, or do I have to specify them when running ceph-disk prepare?
> If I need to specify the wal and db partition, how is this done?
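
(For what it's worth: as far as I know ceph-disk prepare accepts explicit
db/wal arguments, so a replacement that reuses the NVMe partitions could look
roughly like this -- a sketch only, the partition numbers are made up:

  ceph-disk prepare --bluestore /dev/sdX --block.db /dev/nvme0n1p3 \
      --block.wal /dev/nvme0n1p4 --osd-id {id}
)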
> 
> I'm asking this since from the logs of the initial cluster deployment I
> got the following warning:
> 
> [cephosd-02][WARNING] prepare_device: OSD will not be hot-swappable if
> block.db is not the same device as the osd data
> [...]
> [cephosd-02][WARNING] prepare_device: OSD will not be hot-swappable if
> block.wal is not the same device as the osd data
> 
> 
> 
> 2. Since we already let "ceph-volume simple" take over our
> OSDs I'm not
> sure if we should now use ceph-volume or again ceph-disk
> (followed by
> "ceph-vloume simple" takeover) to prepare and activate the OSD?
> 
> 
> The `simple` sub-command is meant to help with the activation of
> OSDs
> at boot time, supporting ceph-disk (or manual) created OSDs.
> 
> 
> OK, got this...
> 
> 
> There is no requirement to use `ceph-volume lvm` which is
> intended for
> new OSDs using LVM as devices.
> 
> 
> Fine...
> 
> 
> 3. If we should use ceph-volume, then by looking at the luminous
> ceph-volume docs [2] I find for both,
> 
> ceph-volume lvm prepare
> ceph-volume lvm activate
> 
> that the bluestore option is either NOT implemented or NOT
> supported
> 
> activate: [–bluestore] filestore (IS THIS A TYPO???)
> objectstore (not
> yet implemented)
> prepare: [–bluestore] Use the bluestore objectstore (not
> currently
> supported)
> 
> 
> These