Re: [ceph-users] Ceph cache pool full

2017-10-05 Thread Christian Wuerdig
The default file size limit for CephFS is 1TB, see also here:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-May/018208.html
(also includes a pointer on how to increase it)
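For reference, the limit is a per-filesystem setting that can be raised online;
a minimal sketch (the file system name is a placeholder -- check "ceph fs ls"):

# show the current limit
ceph fs get <fsname> | grep max_file_size
# raise it to 2 TiB, for example
ceph fs set <fsname> max_file_size 2199023255552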

On Fri, Oct 6, 2017 at 12:45 PM, Shawfeng Dong  wrote:
> Dear all,
>
> We just set up a Ceph cluster, running the latest stable release Ceph
> v12.2.0 (Luminous):
> # ceph --version
> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
>
> The goal is to serve Ceph filesystem, for which we created 3 pools:
> # ceph osd lspools
> 1 cephfs_data,2 cephfs_metadata,3 cephfs_cache,
> where
> * cephfs_data is the data pool (36 OSDs on HDDs), which is erasure-coded;
> * cephfs_metadata is the metadata pool
> * cephfs_cache is the cache tier (3 OSDs on NVMes) for cephfs_data. The
> cache-mode is writeback.
>
> Everything had worked fine, until today when we tried to copy a 1.3TB file
> to the CephFS.  We got the "No space left on device" error!
>
> 'ceph -s' says some OSDs are full:
> # ceph -s
>   cluster:
> id: e18516bf-39cb-4670-9f13-88ccb7d19769
> health: HEALTH_ERR
> full flag(s) set
> 1 full osd(s)
> 1 pools have many more objects per pg than average
>
>   services:
> mon: 3 daemons, quorum pulpo-admin,pulpo-mon01,pulpo-mds01
> mgr: pulpo-mds01(active), standbys: pulpo-admin, pulpo-mon01
> mds: pulpos-1/1/1 up  {0=pulpo-mds01=up:active}
> osd: 39 osds: 39 up, 39 in
>  flags full
>
>   data:
> pools:   3 pools, 2176 pgs
> objects: 347k objects, 1381 GB
> usage:   2847 GB used, 262 TB / 265 TB avail
> pgs: 2176 active+clean
>
>   io:
> client:   19301 kB/s rd, 2935 op/s rd, 0 op/s wr
>
> And indeed the cache pool is full:
> # rados df
> POOL_NAME       USED  OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS   RD    WR_OPS  WR
> cephfs_cache    1381G  355385      0 710770                  0       0        0 10004954 1522G 1398063 1611G
> cephfs_data         0       0      0      0                  0       0        0        0     0       0     0
> cephfs_metadata 8515k      24      0     72                  0       0        0        3    30  723953 10541k
>
> total_objects 355409
> total_used   2847G
> total_avail  262T
> total_space  265T
>
> However, the data pool is completely empty! So it seems that data has only
> been written to the cache pool, but not written back to the data pool.
>
> I am really at a loss whether this is due to a setup error on my part, or a
> Luminous bug. Could anyone shed some light on this? Please let me know if
> you need any further info.
>
> Best,
> Shaw
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph cache pool full

2017-10-05 Thread Christian Balzer

Hello,

On Fri, 06 Oct 2017 03:30:41 + David Turner wrote:

> You're missing most of the important bits: what the osds in your
> cluster look like, your tree, and your cache pool settings.
> 
> ceph df
> ceph osd df
> ceph osd tree
> ceph osd pool get cephfs_cache all
>
Especially the last one.

My money is on not having set target_max_objects and target_max_bytes to
sensible values along with the ratios.
In short, not having read the (albeit spotty) documentation.
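For reference, those thresholds are ordinary settings on the cache pool; a
minimal sketch with illustrative numbers (the byte/object targets below are
assumptions -- size them to the real NVMe capacity and replica count):

ceph osd pool set cephfs_cache target_max_bytes 400000000000
ceph osd pool set cephfs_cache target_max_objects 1000000
ceph osd pool set cephfs_cache cache_target_dirty_ratio 0.4
ceph osd pool set cephfs_cache cache_target_dirty_high_ratio 0.6
ceph osd pool set cephfs_cache cache_target_full_ratio 0.8

Without target_max_bytes/target_max_objects the ratios have nothing to act on,
so the tier simply fills until the OSDs hit the full flag.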
 
> You have your writeback cache on 3 nvme drives. It looks like you have
> 1.6TB available between them for the cache. I don't know the behavior of a
> writeback cache tier on cephfs for large files, but I would guess that it
> can only hold full files and not flush partial files. 

I VERY much doubt that, if so it would be a massive flaw.
One assumes that cache operations work on the RADOS object level, no
matter what.

> That would mean your
> cache needs to have enough space for any file being written to the cluster.
> In this case a 1.3TB file with 3x replication would require 3.9TB (more
> than double what you have available) of available space in your writeback
> cache.
> 
> There are very few use cases that benefit from a cache tier. The docs for
> Luminous warn as much. 
You keep repeating that like a broken record.

And while certainly not false, I for one wouldn't be able to use (justify
using) Ceph w/o cache tiers in our main use case.

In this case I assume they were following an old cheat sheet or such,
suggesting the previously required cache tier with EC pools.

Christian

>What is your goal in implementing this cache? If the
> answer is to utilize extra space on the nvmes, then just remove it and say
> thank you. The better use of nvmes in that case is as part of the
> bluestore stack, giving your osds larger DB partitions. Keeping your
> metadata pool on nvmes is still a good idea.
> 
> On Thu, Oct 5, 2017, 7:45 PM Shawfeng Dong  wrote:
> 
> > Dear all,
> >
> > We just set up a Ceph cluster, running the latest stable release Ceph
> > v12.2.0 (Luminous):
> > # ceph --version
> > ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous
> > (rc)
> >
> > The goal is to serve Ceph filesystem, for which we created 3 pools:
> > # ceph osd lspools
> > 1 cephfs_data,2 cephfs_metadata,3 cephfs_cache,
> > where
> > * cephfs_data is the data pool (36 OSDs on HDDs), which is erasure-coded;
> > * cephfs_metadata is the metadata pool
> > * cephfs_cache is the cache tier (3 OSDs on NVMes) for cephfs_data. The
> > cache-mode is writeback.
> >
> > Everything had worked fine, until today when we tried to copy a 1.3TB file
> > to the CephFS.  We got the "No space left on device" error!
> >
> > 'ceph -s' says some OSDs are full:
> > # ceph -s
> >   cluster:
> > id: e18516bf-39cb-4670-9f13-88ccb7d19769
> > health: HEALTH_ERR
> > full flag(s) set
> > 1 full osd(s)
> > 1 pools have many more objects per pg than average
> >
> >   services:
> > mon: 3 daemons, quorum pulpo-admin,pulpo-mon01,pulpo-mds01
> > mgr: pulpo-mds01(active), standbys: pulpo-admin, pulpo-mon01
> > mds: pulpos-1/1/1 up  {0=pulpo-mds01=up:active}
> > osd: 39 osds: 39 up, 39 in
> >  flags full
> >
> >   data:
> > pools:   3 pools, 2176 pgs
> > objects: 347k objects, 1381 GB
> > usage:   2847 GB used, 262 TB / 265 TB avail
> > pgs: 2176 active+clean
> >
> >   io:
> > client:   19301 kB/s rd, 2935 op/s rd, 0 op/s wr
> >
> > And indeed the cache pool is full:
> > # rados df
> > POOL_NAME       USED  OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS   RD    WR_OPS  WR
> > cephfs_cache    1381G  355385      0 710770                  0       0        0 10004954 1522G 1398063 1611G
> > cephfs_data         0       0      0      0                  0       0        0        0     0       0     0
> > cephfs_metadata 8515k      24      0     72                  0       0        0        3    30  723953 10541k
> >
> > total_objects 355409
> > total_used   2847G
> > total_avail  262T
> > total_space  265T
> >
> > However, the data pool is completely empty! So it seems that data has only
> > been written to the cache pool, but not written back to the data pool.
> >
> > I am really at a loss whether this is due to a setup error on my part, or
> > a Luminous bug. Could anyone shed some light on this? Please let me know if
> > you need any further info.
> >
> > Best,
> > Shaw
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >  


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications
___
ceph-users mailing list
ceph-users@lists.ceph.com

Re: [ceph-users] Ceph cache pool full

2017-10-05 Thread David Turner
You're missing most of the important bits: what the osds in your
cluster look like, your tree, and your cache pool settings.

ceph df
ceph osd df
ceph osd tree
ceph osd pool get cephfs_cache all

You have your writeback cache on 3 nvme drives. It looks like you have
1.6TB available between them for the cache. I don't know the behavior of a
writeback cache tier on cephfs for large files, but I would guess that it
can only hold full files and not flush partial files. That would mean your
cache needs to have enough space for any file being written to the cluster.
In this case a 1.3TB file with 3x replication would require 3.9TB (more
than double what you have available) of available space in your writeback
cache.

There are very few use cases that benefit from a cache tier. The docs for
Luminous warn as much. What is your goal in implementing this cache? If the
answer is to utilize extra space on the nvmes, then just remove it and say
thank you. The better use of nvmes in that case is as part of the
bluestore stack, giving your osds larger DB partitions. Keeping your
metadata pool on nvmes is still a good idea.
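On that last point, a possible sketch for pinning the metadata pool to the
NVMe OSDs with Luminous device classes (this assumes the NVMe OSDs carry a
device class named "nvme"; if autodetection classified them as "ssd", set the
class first with "ceph osd crush set-device-class"):

ceph osd crush rule create-replicated nvme-rule default host nvme
ceph osd pool set cephfs_metadata crush_rule nvme-rule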

On Thu, Oct 5, 2017, 7:45 PM Shawfeng Dong  wrote:

> Dear all,
>
> We just set up a Ceph cluster, running the latest stable release Ceph
> v12.2.0 (Luminous):
> # ceph --version
> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous
> (rc)
>
> The goal is to serve Ceph filesystem, for which we created 3 pools:
> # ceph osd lspools
> 1 cephfs_data,2 cephfs_metadata,3 cephfs_cache,
> where
> * cephfs_data is the data pool (36 OSDs on HDDs), which is erasure-coded;
> * cephfs_metadata is the metadata pool
> * cephfs_cache is the cache tier (3 OSDs on NVMes) for cephfs_data. The
> cache-mode is writeback.
>
> Everything had worked fine, until today when we tried to copy a 1.3TB file
> to the CephFS.  We got the "No space left on device" error!
>
> 'ceph -s' says some OSDs are full:
> # ceph -s
>   cluster:
> id: e18516bf-39cb-4670-9f13-88ccb7d19769
> health: HEALTH_ERR
> full flag(s) set
> 1 full osd(s)
> 1 pools have many more objects per pg than average
>
>   services:
> mon: 3 daemons, quorum pulpo-admin,pulpo-mon01,pulpo-mds01
> mgr: pulpo-mds01(active), standbys: pulpo-admin, pulpo-mon01
> mds: pulpos-1/1/1 up  {0=pulpo-mds01=up:active}
> osd: 39 osds: 39 up, 39 in
>  flags full
>
>   data:
> pools:   3 pools, 2176 pgs
> objects: 347k objects, 1381 GB
> usage:   2847 GB used, 262 TB / 265 TB avail
> pgs: 2176 active+clean
>
>   io:
> client:   19301 kB/s rd, 2935 op/s rd, 0 op/s wr
>
> And indeed the cache pool is full:
> # rados df
> > POOL_NAME       USED  OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS   RD    WR_OPS  WR
> > cephfs_cache    1381G  355385      0 710770                  0       0        0 10004954 1522G 1398063 1611G
> > cephfs_data         0       0      0      0                  0       0        0        0     0       0     0
> > cephfs_metadata 8515k      24      0     72                  0       0        0        3    30  723953 10541k
>
> > total_objects 355409
> total_used   2847G
> total_avail  262T
> total_space  265T
>
> However, the data pool is completely empty! So it seems that data has only
> been written to the cache pool, but not written back to the data pool.
>
> I am really at a loss whether this is due to a setup error on my part, or
> a Luminous bug. Could anyone shed some light on this? Please let me know if
> you need any further info.
>
> Best,
> Shaw
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 1 osd Segmentation fault in test cluster

2017-10-05 Thread Sage Weil
On Sat, 30 Sep 2017, Marc Roos wrote:
> Is this useful for someone?

Yes!

>  1: (()+0xa29511) [0x7f762e5b2511]
>  2: (()+0xf370) [0x7f762afa5370]
>  3: (BlueStore::TwoQCache::_trim(unsigned long, unsigned long)+0x2df) 
> [0x7f762e481a2f]
>  4: (BlueStore::Cache::trim(unsigned long, float, float, float)+0x1d1) 
> [0x7f762e4543e1]
>  5: (BlueStore::MempoolThread::entry()+0x14d) [0x7f762e45a71d]

See http://tracker.ceph.com/issues/21259

The latest luminous branch (which you can get from 
https://shaman.ceph.com/builds/ceph/luminous/) has some additional 
debugging on OSD shutdown that should help me figure out what is causing 
this.  If this is something you can reproduce on your cluster, please 
install the latest luminous and set 'osd debug shutdown = true' in the 
[osd] section of your config, and then ceph-post-file the log after a 
crash.
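For reference, a minimal sketch of those two steps (the log path assumes the
default location and the OSD id is a placeholder):

# ceph.conf on the affected host
[osd]
osd debug shutdown = true

# after the next crash, upload the log for the developers
ceph-post-file /var/log/ceph/ceph-osd.<id>.log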

Thanks!
sage


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD Mirror between two separate clusters named ceph

2017-10-05 Thread Alex Gorbachev
>
> On Thu, Oct 5, 2017 at 7:45 PM, Alex Gorbachev  
> wrote:
>> I am testing rbd mirroring, and have two existing clusters named ceph
>> in their ceph.conf.  Each cluster has a separate fsid.  On one
>> cluster, I renamed ceph.conf into remote-mirror.conf and
>> ceph.client.admin.keyring to remote-mirror.client.admin.keyring, but
>> it looks like this is not sufficient:
>>
>> root@lab2-mon3:/etc/ceph# rbd --cluster remote-mirror mirror pool peer
>> add spin2 client.admin@remote-mirror
>> rbd: error adding mirror peer
>> 2017-10-05 19:40:52.003289 7f290935c100 -1 librbd: Cannot add self as
>> remote peer
>>
>> Short of creating a whole new cluster, are there any options to make
>> such configuration work?

On Thu, Oct 5, 2017 at 8:13 PM, Jason Dillaman  wrote:
> The "cluster" name is really just the name of the configuration file.
> The only issue with your command-line is that you should connect to
> the "local" cluster to add a peer as a remote cluster:
>
> rbd --cluster ceph mirror pool peer add spin2 client.admin@remote-mirror

Thank you Jason, works perfectly now.  I used this link to get a bit
of context on local vs. remote
https://cloud.garr.it/support/kb/ceph/ceph-enabling-rbd-mirror/

Summary: It's OK to have both local and remote clusters named ceph,
just need to copy and rename the .conf and keyring files.
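For anyone following along, the working sequence looks roughly like this (the
"spin2" pool and "remote-mirror" names follow this thread; paths are the usual
defaults and may differ):

# on the local cluster, bring over the remote cluster's config and keyring
scp remote:/etc/ceph/ceph.conf /etc/ceph/remote-mirror.conf
scp remote:/etc/ceph/ceph.client.admin.keyring /etc/ceph/remote-mirror.client.admin.keyring

# add the peer while connected to the *local* cluster
rbd --cluster ceph mirror pool peer add spin2 client.admin@remote-mirror

# verify the peer registration
rbd --cluster ceph mirror pool info spin2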

Best regards,
Alex
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD Mirror between two separate clusters named ceph

2017-10-05 Thread Jason Dillaman
The "cluster" name is really just the name of the configuration file.
The only issue with your command-line is that you should connect to
the "local" cluster to add a peer as a remote cluster:

rbd --cluster ceph mirror pool peer add spin2 client.admin@remote-mirror

On Thu, Oct 5, 2017 at 7:45 PM, Alex Gorbachev  wrote:
> I am testing rbd mirroring, and have two existing clusters named ceph
> in their ceph.conf.  Each cluster has a separate fsid.  On one
> cluster, I renamed ceph.conf into remote-mirror.conf and
> ceph.client.admin.keyring to remote-mirror.client.admin.keyring, but
> it looks like this is not sufficient:
>
> root@lab2-mon3:/etc/ceph# rbd --cluster remote-mirror mirror pool peer
> add spin2 client.admin@remote-mirror
> rbd: error adding mirror peer
> 2017-10-05 19:40:52.003289 7f290935c100 -1 librbd: Cannot add self as
> remote peer
>
> Short of creating a whole new cluster, are there any options to make
> such configuration work?
>
> Thank you,
> --
> Alex Gorbachev
> Storcium
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph cache pool full

2017-10-05 Thread Shawfeng Dong
Dear all,

We just set up a Ceph cluster, running the latest stable release Ceph
v12.2.0 (Luminous):
# ceph --version
ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)

The goal is to serve Ceph filesystem, for which we created 3 pools:
# ceph osd lspools
1 cephfs_data,2 cephfs_metadata,3 cephfs_cache,
where
* cephfs_data is the data pool (36 OSDs on HDDs), which is erasure-coded;
* cephfs_metadata is the metadata pool
* cephfs_cache is the cache tier (3 OSDs on NVMes) for cephfs_data. The
cache-mode is writeback.

Everything had worked fine, until today when we tried to copy a 1.3TB file
to the CephFS.  We got the "No space left on device" error!

'ceph -s' says some OSDs are full:
# ceph -s
  cluster:
id: e18516bf-39cb-4670-9f13-88ccb7d19769
health: HEALTH_ERR
full flag(s) set
1 full osd(s)
1 pools have many more objects per pg than average

  services:
mon: 3 daemons, quorum pulpo-admin,pulpo-mon01,pulpo-mds01
mgr: pulpo-mds01(active), standbys: pulpo-admin, pulpo-mon01
mds: pulpos-1/1/1 up  {0=pulpo-mds01=up:active}
osd: 39 osds: 39 up, 39 in
 flags full

  data:
pools:   3 pools, 2176 pgs
objects: 347k objects, 1381 GB
usage:   2847 GB used, 262 TB / 265 TB avail
pgs: 2176 active+clean

  io:
client:   19301 kB/s rd, 2935 op/s rd, 0 op/s wr

And indeed the cache pool is full:
# rados df
POOL_NAME       USED  OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS   RD    WR_OPS  WR
cephfs_cache    1381G  355385      0 710770                  0       0        0 10004954 1522G 1398063 1611G
cephfs_data         0       0      0      0                  0       0        0        0     0       0     0
cephfs_metadata 8515k      24      0     72                  0       0        0        3    30  723953 10541k

total_objects 355409
total_used   2847G
total_avail  262T
total_space  265T

However, the data pool is completely empty! So it seems that data has only
been written to the cache pool, but not written back to the data pool.

I am really at a loss whether this is due to a setup error on my part, or a
Luminous bug. Could anyone shed some light on this? Please let me know if
you need any further info.

Best,
Shaw
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD Mirror between two separate clusters named ceph

2017-10-05 Thread Alex Gorbachev
I am testing rbd mirroring, and have two existing clusters named ceph
in their ceph.conf.  Each cluster has a separate fsid.  On one
cluster, I renamed ceph.conf into remote-mirror.conf and
ceph.client.admin.keyring to remote-mirror.client.admin.keyring, but
it looks like this is not sufficient:

root@lab2-mon3:/etc/ceph# rbd --cluster remote-mirror mirror pool peer
add spin2 client.admin@remote-mirror
rbd: error adding mirror peer
2017-10-05 19:40:52.003289 7f290935c100 -1 librbd: Cannot add self as
remote peer

Short of creating a whole new cluster, are there any options to make
such configuration work?

Thank you,
--
Alex Gorbachev
Storcium
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph stuck creating pool

2017-10-05 Thread Guilherme Lima
Hi,



Yes, I can confirm it was a networking problem, not a Ceph problem.

After changing the network on the cluster-network virtual NICs, everything
started working OK.



Thanks very much for the help.

Guilherme

*From:* Guilherme Lima [mailto:guilherme.l...@farfetch.com]
*Sent:* Tuesday, October 3, 2017 18:25
*To:* 'David Turner' ; 'Webert de Souza Lima' <
webert.b...@gmail.com>
*Cc:* 'ceph-users' 
*Subject:* RE: [ceph-users] Ceph stuck creating pool



Hi David;



Yes, I can ping the host from the cluster network.

This is a test lab built in Hyper-V.

I think you are right; there is probably a problem with the cluster network.

I will check and let you know the results.



Thanks very much



Guilherme Lima

Systems Administrator



Main: +351 220 430 530

Fax: +351 253 424 739

Skype: guilherme.lima.farfetch.com



Farfetch

Rua da Lionesa, nr. 446

Edificio G12

4465-671 Leça do Balio

Porto – Portugal




400 Boutiques. 1 Address



http://farfetch.com

Twitter: https://twitter.com/farfetch

Facebook: https://www.facebook.com/Farfetch

Instagram: https://instagram.com/farfetch



This email and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are addressed.
If you are not the named addressee/intended recipient then please delete it
and notify the sender immediately.



*From:* David Turner [mailto:drakonst...@gmail.com ]
*Sent:* Tuesday, October 3, 2017 17:53
*To:* Guilherme Lima ; Webert de Souza Lima <
webert.b...@gmail.com>
*Cc:* ceph-users 
*Subject:* Re: [ceph-users] Ceph stuck creating pool



My guess is a networking problem.  Do you have vlans, cluster network vs
public network in the ceph.conf, etc configured?  Can you ping between all
of your storage nodes on all of their IPs?



All of your OSDs communicate with the mons on the public network, but they
communicate with each other for peering on the cluster network.  My guess
is that your public network is working fine, but that your cluster network
might be having an issue causing the new PGs to never be able to peer.
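For reference, the split is defined in ceph.conf; a minimal sketch (the
subnets are placeholders for this lab):

[global]
public_network  = 192.168.10.0/24
cluster_network = 192.168.20.0/24

# from every OSD node, check reachability of the other nodes' cluster IPs
ping -c 3 <other-node-cluster-ip>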



On Tue, Oct 3, 2017 at 11:12 AM Guilherme Lima 
wrote:

Here it is,

size: 3
min_size: 2
crush_rule: replicated_rule

[
    {
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]





Thanks

Guilherme





*From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of
*Webert de Souza Lima
*Sent:* Tuesday, October 3, 2017 15:47
*To:* ceph-users 
*Subject:* Re: [ceph-users] Ceph stuck creating pool



This looks like something wrong with the crush rule.



What's the size, min_size and crush_rule of this pool?

 ceph osd pool get POOLNAME size
 ceph osd pool get POOLNAME min_size
 ceph osd pool get POOLNAME crush_ruleset

How is the crush rule?

 ceph osd crush rule dump




Regards,



Webert Lima

DevOps Engineer at MAV Tecnologia

*Belo Horizonte - Brasil*



On Tue, Oct 3, 2017 at 11:22 AM, Guilherme Lima 
wrote:

Hi,



I have installed a virtual Ceph cluster lab. I am using Ceph Luminous v12.2.1.

It consists of 3 mon + 3 osd nodes.

Each node has 3 x 250GB OSDs.



My osd tree:

ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       2.19589 root default
-3       0.73196     host osd1
 0   hdd 0.24399         osd.0      up  1.0  1.0
 6   hdd 0.24399         osd.6      up  1.0  1.0
 9   hdd 0.24399         osd.9      up  1.0  1.0
-5       0.73196     host osd2
 1   hdd 0.24399         osd.1      up  1.0  1.0
 7   hdd 0.24399         osd.7      up  1.0  1.0
10   hdd 0.24399         osd.10     up  1.0  1.0
-7       0.73196     host osd3
 2   hdd 0.24399         osd.2      up  1.0  1.0
 8   hdd 0.24399         osd.8      up  1.0  1.0
11   hdd 0.24399         osd.11     up  1.0  1.0



After creating a new pool, it is stuck on creating+peering and
creating+activating.

  cluster:
    id: d20fdc12-f8bf-45c1-a276-c36dfcc788bc
    health: HEALTH_WARN
            Reduced data availability: 256 pgs inactive, 143 pgs peering
            Degraded data redundancy: 256 pgs unclean

  services:
    mon: 3 daemons, quorum mon2,mon3,mon1
    mgr: mon1(active), standbys: mon2, mon3
    osd: 9 osds: 9 up, 9 in




Re: [ceph-users] Ceph manager documentation missing from network config reference

2017-10-05 Thread John Spray
On Thu, Oct 5, 2017 at 9:30 PM, Stefan Kooman  wrote:
> Hi,
>
> While implementing (stricter) firewall rules I noticed weird behaviour.
> For the monitors only port 6789 was allowed. We currently co-locate the
> manager daemon with our monitors. Apparently (at least) port 6800 is
> also essential. In the Network Configuration Reference [1] there is no
> mention of the iptables rules needed for the manager.
> The figure depicting request / response within / between the client /
> nodes in the network does not yet describe interaction with manager.

This was an oversight in the docs (oops), I've just merged the PR that
updated the firewall page on the master branch here
(https://github.com/ceph/ceph/pull/17974).

> Do you need to open up port 6800(:7300?) completely, or is it enough to
> only allow traffic between manager(s) <-> monitor(s)?

The former: you need to open it up in general, because the OSDs and
other daemons will also need to report to the manager.
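For reference, a minimal firewall sketch for a combined mon/mgr host (this
assumes the default ms_bind_port_min/max range of 6800-7300 and firewalld;
translate to plain iptables as needed):

firewall-cmd --zone=public --add-port=6789/tcp --permanent       # mon
firewall-cmd --zone=public --add-port=6800-7300/tcp --permanent  # mgr/osd/mds
firewall-cmd --reload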

>
> Gr. Stefan
>
> P.s. How can one contribute to the documentation?

The docs are in the ceph git repo under doc/ -- you can clone the git
repository and work on them the same way as code, or for very simple
changes you can also use the github web UI to edit a file.  The
downside to the github UI is that once you've opened a PR you can't then
update it, so I would only use it for tiny changes.

There is some more information here:
https://github.com/ceph/ceph/blob/master/doc/start/documenting-ceph.rst

Cheers,
John

>
> [1]: 
> http://docs.ceph.com/docs/luminous/rados/configuration/network-config-ref/
>
>
> --
> | BIT BV  http://www.bit.nl/  Kamer van Koophandel 09090351
> | GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph manager documentation missing from network config reference

2017-10-05 Thread Stefan Kooman
Hi,

While implementing (stricter) firewall rules I noticed weird behaviour.
For the monitors only port 6789 was allowed. We currently co-locate the
manager daemon with our monitors. Apparently (at least) port 6800 is
also essential. In the Network Configuration Reference [1] there is no
mention of the iptables rules needed for the manager.
The figure depicting request / response within / between the client /
nodes in the network does not yet describe interaction with manager.
Do you need to open up port 6800(:7300?) completely, or is it enough to
only allow traffic between manager(s) <-> monitor(s)?

Gr. Stefan

P.s. How can one contribute to the documentation?

[1]: http://docs.ceph.com/docs/luminous/rados/configuration/network-config-ref/


-- 
| BIT BV  http://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bad crc/signature errors

2017-10-05 Thread Ilya Dryomov
On Thu, Oct 5, 2017 at 6:05 PM, Olivier Bonvalet  wrote:
> On Thursday, 5 October 2017 at 17:03 +0200, Ilya Dryomov wrote:
>> When did you start seeing these errors?  Can you correlate that to
>> a ceph or kernel upgrade?  If not, and if you don't see other issues,
>> I'd write it off as faulty hardware.
>
> Well... I have one hypervisor (Xen 4.6 and kernel Linux 4.1.13), which

Is that 4.1.13 or 4.13.1?

> have the problem for a long time, at least since 1 month (I haven't
> older logs).
>
> But, on others hypervisors (Xen 4.8 with Linux 4.9.x), I haven't the
> problem.
> And it's when I upgraded thoses hypervisors to Linux 4.13.x, that "bad
> crc" errors appeared.
>
> Note : if I upgraded kernels on Xen 4.8 hypervisors, it's because some
> DISCARD commands over RBD were blocking ("fstrim" works, but not
> "lvremove" with discard enabled). After upgrading to Linux 4.13.3,
> DISCARD works again on Xen 4.8.

Which kernel did you upgrade from to 4.13.3 exactly?

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph mirrors

2017-10-05 Thread Stefan Kooman
Hi,

Sorry for the empty mail, that shouldn't have happened. I would like to
address the following. Currently the repository list for debian
packages contains _only_ the latest package version. In case of an
(urgent) need to downgrade you cannot easily select an older version.
You then need to resort to downloading packages manually. I want to suggest
that we keep the older packages in the repo list. They are on the
mirrors anyway (../debian/pool/main/{c,r}/ceph/).

We have set up a Dutch Ceph mirror: http://ceph.download.bit.nl
(currently the same server as the Dutch Ubuntu mirror). This mirror also listens
to "http://nl.ceph.com" and will do so for "https://nl.ceph.com" when
(if) it gets CNAMEd (... with a Let's Encrypt cert).

Gr. Stefan

-- 
| BIT BV  http://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] _committed_osd_maps shutdown OSD via async signal, bug or feature?

2017-10-05 Thread Stefan Kooman
Quoting Gregory Farnum (gfar...@redhat.com):

> That's a feature, but invoking it may indicate the presence of another
> issue. The OSD shuts down if
> 1) it has been deleted from the cluster, or
> 2) it has been incorrectly marked down a bunch of times by the cluster, and
> gives up, or
> 3) it has been incorrectly marked down by the cluster, and encounters an
> error when it rebinds to new network ports
> 
> In your case, with the port flapping, OSDs are presumably getting marked
> down by their peers (since they can't communicate), and eventually give up
> on trying to stay alive. You can prevent/reduce that by setting
> the osd_max_markdown_count config to a very large number, if you really
> want to.

It's definitely the peers marking down the OSDs
(mon_osd_reporter_subtree_level = datacenter, mon_osd_min_down_reporters
= 2 <- 3 DC setup). You have to do pretty weird stuff to achieve this,
so we'll leave osd_max_markdown_count default. Good to know it's a
feature (in case such a rare condition might arise).

Thanks,

Stefan

-- 
| BIT BV  http://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] _committed_osd_maps shutdown OSD via async signal, bug or feature?

2017-10-05 Thread Gregory Farnum
On Thu, Oct 5, 2017 at 6:48 AM Stefan Kooman  wrote:

> Hi,
>
> During testing (mimicking BGP / port flaps) on our cluster we are able
> to trigger a "_committed_osd_maps shutdown OSD via async signal" on the
> affected OSD servers in that datacenter (OSDs in that DC become
> intermittent isolated from their peers). Result is that all OSD
> processes stop. Is this a bug or a feature? I.e. is there a "flap"
> detection mechanism in Ceph OSD?
>
> If it's a bug it might be related to
> http://tracker.ceph.com/issues/20174. We get similiar error message on
> "12.2.0". Version "12.2.1" does not log
>
> "-1 Fail to open
> '/proc/0/cmdline' error = (2) No such file or directory
> -1 received  signal: Interrupt from  PID: 0 task name:  UID: 0
> -1 osd.21 1846 *** Got signal Interrupt ***
> 0 osd.21 1846 prepare_to_stop starting shutdown
> -1 osd.21 1846 shutdown"
>
>
That's a feature, but invoking it may indicate the presence of another
issue. The OSD shuts down if
1) it has been deleted from the cluster, or
2) it has been incorrectly marked down a bunch of times by the cluster, and
gives up, or
3) it has been incorrectly marked down by the cluster, and encounters an
error when it rebinds to new network ports

In your case, with the port flapping, OSDs are presumably getting marked
down by their peers (since they can't communicate), and eventually give up
on trying to stay alive. You can prevent/reduce that by setting
the osd_max_markdown_count config to a very large number, if you really
want to.
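For reference, a sketch of the knob Greg mentions (the values are illustrative;
osd_max_markdown_period is the companion setting):

# at runtime, cluster-wide
ceph tell osd.* injectargs '--osd_max_markdown_count 1000 --osd_max_markdown_period 600'

# or persistently in ceph.conf under [osd]
# osd max markdown count = 1000
# osd max markdown period = 600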
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] erasure-coded with overwrites versus erasure-coded with cache tiering

2017-10-05 Thread Chad William Seys

Thanks David,
  When I convert to bluestore and the dust settles I hope to do a same 
cluster comparison and post here!


Chad.

On 09/30/2017 07:29 PM, David Turner wrote:

 > In my case, the replica-3 and k2m2 are stored on the same spinning disks.

That is exactly what I meant by same pool.  The only way for a cache to 
make sense would be if the data being written or read will be modified 
or heavily read for X amount of time and then ignored.


If things are rarely read, and randomly so, then promoting them into a 
cache tier just makes you wait for the object to be promoted to cache 
before you read it once or twice, after which it sits in there until it's 
demoted again.  If you have random io and anything can really be read 
next, then a cache tier on the same disks as the EC pool will only cause 
things to be promoted and demoted for no apparent reason.


You can always test this for your use case and see if it helps enough to 
justify creating a pool and tier that you need to manage. I'm planning to 
remove my cephfs cache tier once I upgrade to Luminous, as I only have it 
because it was a requirement. It causes me to slow down my writes heavily, as 
eviction io is useless and wasteful of cluster io for me.  I haven't 
checked on the process for that yet, but I'm assuming it's a set command 
on the pool that will then allow me to disable and remove the cache 
tier.  I mention that because if it is that easy to enable/disable, then 
testing it should be simple and easy to compare.
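For context, the removal David describes is a short command sequence; a sketch
(pool names are placeholders, the flush can take a long time on a busy tier,
and the exact steps should be checked against the docs for your release):

ceph osd tier cache-mode <cachepool> forward --yes-i-really-mean-it
rados -p <cachepool> cache-flush-evict-all
ceph osd tier remove-overlay <basepool>
ceph osd tier remove <basepool> <cachepool>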



On Sat, Sep 30, 2017, 8:10 PM Chad William Seys > wrote:


Hi David,
    Thanks for the clarification.  Reminded me of some details I forgot
to mention.
    In my case, the replica-3 and k2m2 are stored on the same spinning
disks. (Mainly using EC for "compression", b/c with the EC k2m2 setting a
PG only takes up the same amount of space as a replica-2 while allowing
2 disks to fail like replica-3 without loss.)
    I'm using this setup as RBDs and cephfs to store things like local
mirrors of linux packages and drive images to be broadcast over network.
   Seems to be about as fast as a normal hard drive. :)
    So is this the situation where the "cache tier [is] on the same root
of osds as the EC pool"?

Thanks for the advice!
Chad.

On 09/30/2017 12:32 PM, David Turner wrote:
 > I can only think of 1 type of cache tier usage that is faster if you are
 > using the cache tier on the same root of osds as the EC pool.  That is
 > cold storage where the file is written initially, modified and read for
 > the first X hours, and then remains in cold storage for the remainder of
 > its life with rare reads.
 >
 > Other than that there are a few use cases using a faster root of osds
 > that might make sense, but generally it's still better to utilize that
 > faster storage in the rest of the osd stack, either as journals for
 > filestore or WAL/DB partitions for bluestore.
 >
 >
 > On Sat, Sep 30, 2017, 12:56 PM Chad William Seys wrote:
 >
 >     Hi all,
 >         Now that Luminous supports direct writing to EC pools I was
 >     wondering
 >     if one can get more performance out of an erasure-coded pool with
 >     overwrites or an erasure-coded pool with a cache tier?
 >         I currently have a 3 replica pool in front of a k2m2
erasure coded
 >     pool.  Luminous documentation on cache tiering
 >

http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/#a-word-of-caution
 >     makes it sound like cache tiering is usually not recommonded.
 >
 >     Thanks!
 >     Chad.
 >     ___
 >     ceph-users mailing list
 > ceph-users@lists.ceph.com
 > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 >


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Re : Re : bad crc/signature errors

2017-10-05 Thread Olivier Bonvalet
On Thursday, 5 October 2017 at 17:03 +0200, Ilya Dryomov wrote:
> When did you start seeing these errors?  Can you correlate that to
> a ceph or kernel upgrade?  If not, and if you don't see other issues,
> I'd write it off as faulty hardware.

Well... I have one hypervisor (Xen 4.6 and kernel Linux 4.1.13), which
have the problem for a long time, at least since 1 month (I haven't
older logs).

But, on others hypervisors (Xen 4.8 with Linux 4.9.x), I haven't the
problem.
And it's when I upgraded thoses hypervisors to Linux 4.13.x, that "bad
crc" errors appeared.

Note : if I upgraded kernels on Xen 4.8 hypervisors, it's because some
DISCARD commands over RBD were blocking ("fstrim" works, but not
"lvremove" with discard enabled). After upgrading to Linux 4.13.3,
DISCARD works again on Xen 4.8.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bad crc/signature errors

2017-10-05 Thread Ilya Dryomov
On Thu, Oct 5, 2017 at 12:01 PM, Olivier Bonvalet  wrote:
> On Thursday, 5 October 2017 at 11:47 +0200, Ilya Dryomov wrote:
>> The stable pages bug manifests as multiple sporadic connection
>> resets,
>> because in that case CRCs computed by the kernel don't always match
>> the
>> data that gets sent out.  When the mismatch is detected on the OSD
>> side, OSDs reset the connection and you'd see messages like
>>
>>   libceph: osd1 1.2.3.4:6800 socket closed (con state OPEN)
>>   libceph: osd2 1.2.3.4:6804 socket error on write
>>
>> This is a different issue.  Josy, Adrian, Olivier, do you also see
>> messages of the "libceph: read_partial_message ..." type or is it
>> just
>> "libceph: ... bad crc/signature" errors?
>
> I have "read_partial_message" too, for example :
>
> Oct  5 09:00:47 lorunde kernel: [65575.969322] libceph: read_partial_message 
> 88027c231500 data crc 181941039 != exp. 115232978
> Oct  5 09:00:47 lorunde kernel: [65575.969953] libceph: osd122 10.0.0.31:6800 
> bad crc/signature
> Oct  5 09:04:30 lorunde kernel: [65798.958344] libceph: read_partial_message 
> 880254a25c00 data crc 443114996 != exp. 2014723213
> Oct  5 09:04:30 lorunde kernel: [65798.959044] libceph: osd18 10.0.0.22:6802 
> bad crc/signature
> Oct  5 09:14:28 lorunde kernel: [66396.788272] libceph: read_partial_message 
> 880238636200 data crc 1797729588 != exp. 2550563968
> Oct  5 09:14:28 lorunde kernel: [66396.788984] libceph: osd43 10.0.0.9:6804 
> bad crc/signature
> Oct  5 10:09:36 lorunde kernel: [69704.211672] libceph: read_partial_message 
> 8802712dff00 data crc 2241944833 != exp. 762990605
> Oct  5 10:09:36 lorunde kernel: [69704.212422] libceph: osd103 10.0.0.28:6804 
> bad crc/signature
> Oct  5 10:25:41 lorunde kernel: [70669.203596] libceph: read_partial_message 
> 880257521400 data crc 3655331946 != exp. 2796991675
> Oct  5 10:25:41 lorunde kernel: [70669.204462] libceph: osd16 10.0.0.21:6806 
> bad crc/signature
> Oct  5 10:25:52 lorunde kernel: [70680.255943] libceph: read_partial_message 
> 880245e3d600 data crc 3787567693 != exp. 725251636
> Oct  5 10:25:52 lorunde kernel: [70680.257066] libceph: osd60 10.0.0.23:6800 
> bad crc/signature

OK, so both your and Josy's cases are actually the reverse: the kernel
detects the mismatch, so it's definitely not stable pages related.

When did you start seeing these errors?  Can you correlate that to
a ceph or kernel upgrade?  If not, and if you don't see other issues,
I'd write it off as faulty hardware.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph mirrors

2017-10-05 Thread Stefan Kooman

-- 
| BIT BV  http://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] TLS for tracker.ceph.com

2017-10-05 Thread Wido den Hollander

> Op 5 oktober 2017 om 15:57 schreef Stefan Kooman :
> 
> 
> Hi,
> 
> Can we supply http://tracker.ceph.com with TLS and make it
> https://tracker.ceph.com? Should be trivial with Let's Encrypt for
> example.
> 
> Thanks!

Yes please! This is something which has been asked for a couple of times. I 
really think this is needed.

Wido

> 
> Gr. Stefan
> 
> -- 
> | BIT BV  http://www.bit.nl/  Kamer van Koophandel 09090351
> | GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bad crc/signature errors

2017-10-05 Thread Josy

Hi Ilya,

Yes, there are these messages of the "libceph: read_partial_message ..." type:


[715907.891171] libceph: read_partial_message 88033755a300 data crc 
4149769120 != exp. 2349968434
[715907.892163] libceph: read_partial_message 88033755b000 data crc 
2455195536 != exp. 2750456034

[715907.892167] libceph: osd17 10.255.0.9:6800 bad crc/signature
[715907.893807] libceph: osd16 10.255.0.8:6816 bad crc/signature
[715907.896219] libceph: read_partial_message 8803d8484400 data crc 
455708272 != exp. 1414757638

[715907.897442] libceph: osd27 10.255.0.11:6820 bad crc/signature
[715938.129539] xen-blkback: backend/vbd/3952/768: prepare for reconnect
[715938.470670] libceph: read_partial_message 88030fb89600 data crc 
1569919842 != exp. 3397794567
[715938.470711] libceph: read_partial_message 88017ffeb300 data crc 
3909314762 != exp. 2254973565

[715938.470715] libceph: osd5 10.255.0.6:6812 bad crc/signature
[715938.471898] libceph: osd25 10.255.0.11:6800 bad crc/signature
[715938.473788] libceph: read_partial_message 88017ffeb300 data crc 
682925087 != exp. 2254973565
[715938.474214] libceph: read_partial_message 88030fb89600 data crc 
3941482587 != exp. 3397794567

[715938.474217] libceph: osd25 10.255.0.11:6800 bad crc/signature
[715938.475026] libceph: osd5 10.255.0.6:6812 bad crc/signature


On 05-10-2017 15:17, Ilya Dryomov wrote:

On Thu, Oct 5, 2017 at 7:53 AM, Adrian Saul
 wrote:

We see the same messages and are similarly on a 4.4 KRBD version that is 
affected by this.

I have seen no impact from it so far that I know about



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Jason Dillaman
Sent: Thursday, 5 October 2017 5:45 AM
To: Gregory Farnum 
Cc: ceph-users ; Josy

Subject: Re: [ceph-users] bad crc/signature errors

Perhaps this is related to a known issue on some 4.4 and later kernels [1]
where the stable write flag was not preserved by the kernel?

[1] http://tracker.ceph.com/issues/19275

The stable pages bug manifests as multiple sporadic connection resets,
because in that case CRCs computed by the kernel don't always match the
data that gets sent out.  When the mismatch is detected on the OSD
side, OSDs reset the connection and you'd see messages like

   libceph: osd1 1.2.3.4:6800 socket closed (con state OPEN)
   libceph: osd2 1.2.3.4:6804 socket error on write

This is a different issue.  Josy, Adrian, Olivier, do you also see
messages of the "libceph: read_partial_message ..." type or is it just
"libceph: ... bad crc/signature" errors?

Thanks,

 Ilya



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] TLS for tracker.ceph.com

2017-10-05 Thread Stefan Kooman
Hi,

Can we supply http://tracker.ceph.com with TLS and make it
https://tracker.ceph.com? Should be trivial with Let's Encrypt for
example.
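For what it's worth, a minimal sketch of the Let's Encrypt side (this assumes
the tracker sits behind nginx with ports 80/443 reachable; the actual Redmine
deployment may differ):

certbot --nginx -d tracker.ceph.com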

Thanks!

Gr. Stefan

-- 
| BIT BV  http://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] _committed_osd_maps shutdown OSD via async signal, bug or feature?

2017-10-05 Thread Stefan Kooman
Hi,

During testing (mimicking BGP / port flaps) on our cluster we are able
to trigger a "_committed_osd_maps shutdown OSD via async signal" on the
affected OSD servers in that datacenter (OSDs in that DC become
intermittent isolated from their peers). Result is that all OSD
processes stop. Is this a bug or a feature? I.e. is there a "flap"
detection mechanism in Ceph OSD? 

If it's a bug it might be related to
http://tracker.ceph.com/issues/20174. We get similiar error message on
"12.2.0". Version "12.2.1" does not log 

"-1 Fail to open
'/proc/0/cmdline' error = (2) No such file or directory
-1 received  signal: Interrupt from  PID: 0 task name:  UID: 0
-1 osd.21 1846 *** Got signal Interrupt ***
0 osd.21 1846 prepare_to_stop starting shutdown
-1 osd.21 1846 shutdown"

Gr. Stefan


-- 
| BIT BV  http://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph monitoring

2017-10-05 Thread Lenz Grimmer
On 10/05/2017 12:15 PM, Jasper Spaans wrote:

> Thanks for the pointers - I guess I'll need to find some time to change
> those dashboards to use the ceph-mgr metrics names (at least, I'm unsure
> if the DO exporter uses the same names as ceph-mgr.) To be continued..

Not sure about that; AFAIK the Prometheus exporter based on ceph-mgr was
merged into the master branch already.
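For reference, on a release that ships the mgr prometheus module, enabling it
is a one-liner (the 9283 port is the assumed default -- check the docs for
your release):

ceph mgr module enable prometheus
curl -s http://<mgr-host>:9283/metrics | head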

Lenz



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-mgr summarize recovery counters

2017-10-05 Thread John Spray
On Wed, Oct 4, 2017 at 7:14 PM, Gregory Farnum  wrote:
> On Wed, Oct 4, 2017 at 9:14 AM, Benjeman Meekhof  wrote:
>> Wondering if anyone can tell me how to summarize recovery
>> bytes/ops/objects from counters available in the ceph-mgr python
>> interface?  To put it another way, how does the ceph -s command put
>> together that information and can I access that information from a
>> counter queryable by the ceph-mgr python module api?
>>
>> I want info like the 'recovery' part of the status output.  I have a
>> ceph-mgr module that feeds influxdb but I'm not sure what counters
>> from ceph-mgr to summarize to create this information.  OSD have
>> available a recovery_ops counter which is not quite the same.  Maybe
>> the various 'subop_..' counters encompass recovery ops?  It's not
>> clear to me but I'm hoping it is obvious to someone more familiar with
>> the internals.
>>
>> io:
>> client:   2034 B/s wr, 0 op/s rd, 0 op/s wr
>> recovery: 1173 MB/s, 8 keys/s, 682 objects/s
>
>
> You'll need to run queries against the PGMap. I'm not sure how that
> works in the python interfaces but I'm led to believe it's possible.
> Documentation is probably all in the PGMap.h header; you can look at
> functions like the "recovery_rate_summary" to see what they're doing.

Try get("pg_status") from a python module, that should contain the
recovery/client IO amongst other things.

You may find that the fields only appear when they're nonzero, I would
be happy to see a change that fixed the underlying functions to always
output the fields (e.g. in PGMapDigest::recovery_rate_summary) when
writing to a Formatter.  Skipping the irrelevant stuff is only useful
when doing plain text output.
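As a cross-check from the CLI, the same digest is exposed in the JSON status
output, with the same only-when-nonzero caveat (jq is used purely for
illustration):

ceph status --format json | jq '.pgmap'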

John

> -Greg
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph monitoring

2017-10-05 Thread Jasper Spaans
On 05/10/2017 12:03, Lenz Grimmer wrote:

>> Now to find or build a pretty dashboard with all of these metrics. I
>> wasn't able to find something in the grafana supplied dashboards, and
>> haven't spent enough time on openattic to extract a dashboard from
>> there. Any pointers appreciated!
> openATTIC simply embeds Grafana dashboards, which are set up by DeepSea,
> which also takes care of the initial cluster deploy, including the
> required Prometheus node and exporters (we use the DigitalOcean Ceph
> exporter):
>
> https://github.com/SUSE/DeepSea
> https://github.com/digitalocean/ceph_exporter
>
> The Grafana dashboard files can be found here:
>
> https://github.com/SUSE/DeepSea/tree/master/srv/salt/ceph/monitoring/grafana/files
>  Lenz
Thanks for the pointers - I guess I'll need to find some time to change
those dashboards to use the ceph-mgr metrics names (at least, I'm unsure
if the DO exporter uses the same names as ceph-mgr.) To be continued..

Jasper
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Re : Re : bad crc/signature errors

2017-10-05 Thread Olivier Bonvalet
On Thursday, 5 October 2017 at 11:10 +0200, Ilya Dryomov wrote:
> On Thu, Oct 5, 2017 at 9:03 AM, Olivier Bonvalet  > wrote:
> > I also see that, but on 4.9.52 and 4.13.3 kernel.
> > 
> > I also have some kernel panic, but don't know if it's related (RBD
> > are
> > mapped on Xen hosts).
> 
> Do you have that panic message?
> 
> Do you use rbd devices for something other than Xen?  If so, have you
> ever seen these errors outside of Xen?
> 
> Thanks,
> 
> Ilya
> 

No, I don't have that panic message: the hosts reboot way too
quickly. And no, I only use this cluster with Xen.

Sorry for this useless answer...

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph monitoring

2017-10-05 Thread Lenz Grimmer
Hi,

On 10/03/2017 08:37 AM, Jasper Spaans wrote:

> Now to find or build a pretty dashboard with all of these metrics. I
> wasn't able to find something in the grafana supplied dashboards, and
> haven't spent enough time on openattic to extract a dashboard from
> there. Any pointers appreciated!

openATTIC simply embeds Grafana dashboards, which are set up by DeepSea,
which also takes care of the initial cluster deploy, including the
required Prometheus node and exporters (we use the DigitalOcean Ceph
exporter):

https://github.com/SUSE/DeepSea
https://github.com/digitalocean/ceph_exporter

The Grafana dashboard files can be found here:

https://github.com/SUSE/DeepSea/tree/master/srv/salt/ceph/monitoring/grafana/files
 Lenz



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Re : bad crc/signature errors

2017-10-05 Thread Olivier Bonvalet
On Thursday, 5 October 2017 at 11:47 +0200, Ilya Dryomov wrote:
> The stable pages bug manifests as multiple sporadic connection
> resets,
> because in that case CRCs computed by the kernel don't always match
> the
> data that gets sent out.  When the mismatch is detected on the OSD
> side, OSDs reset the connection and you'd see messages like
> 
>   libceph: osd1 1.2.3.4:6800 socket closed (con state OPEN)
>   libceph: osd2 1.2.3.4:6804 socket error on write
> 
> This is a different issue.  Josy, Adrian, Olivier, do you also see
> messages of the "libceph: read_partial_message ..." type or is it
> just
> "libceph: ... bad crc/signature" errors?
> 
> Thanks,
> 
> Ilya

I have "read_partial_message" too, for example :

Oct  5 09:00:47 lorunde kernel: [65575.969322] libceph: read_partial_message 
88027c231500 data crc 181941039 != exp. 115232978
Oct  5 09:00:47 lorunde kernel: [65575.969953] libceph: osd122 10.0.0.31:6800 
bad crc/signature
Oct  5 09:04:30 lorunde kernel: [65798.958344] libceph: read_partial_message 
880254a25c00 data crc 443114996 != exp. 2014723213
Oct  5 09:04:30 lorunde kernel: [65798.959044] libceph: osd18 10.0.0.22:6802 
bad crc/signature
Oct  5 09:14:28 lorunde kernel: [66396.788272] libceph: read_partial_message 
880238636200 data crc 1797729588 != exp. 2550563968
Oct  5 09:14:28 lorunde kernel: [66396.788984] libceph: osd43 10.0.0.9:6804 bad 
crc/signature
Oct  5 10:09:36 lorunde kernel: [69704.211672] libceph: read_partial_message 
8802712dff00 data crc 2241944833 != exp. 762990605
Oct  5 10:09:36 lorunde kernel: [69704.212422] libceph: osd103 10.0.0.28:6804 
bad crc/signature
Oct  5 10:25:41 lorunde kernel: [70669.203596] libceph: read_partial_message 
880257521400 data crc 3655331946 != exp. 2796991675
Oct  5 10:25:41 lorunde kernel: [70669.204462] libceph: osd16 10.0.0.21:6806 
bad crc/signature
Oct  5 10:25:52 lorunde kernel: [70680.255943] libceph: read_partial_message 
880245e3d600 data crc 3787567693 != exp. 725251636
Oct  5 10:25:52 lorunde kernel: [70680.257066] libceph: osd60 10.0.0.23:6800 
bad crc/signature


On OSD side, for osd122 for example, I don't see any "reset" in osd
log.


Thanks,

Olivier
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bad crc/signature errors

2017-10-05 Thread Ilya Dryomov
On Thu, Oct 5, 2017 at 7:53 AM, Adrian Saul
 wrote:
>
> We see the same messages and are similarly on a 4.4 KRBD version that is 
> affected by this.
>
> I have seen no impact from it so far that I know about
>
>
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Jason Dillaman
>> Sent: Thursday, 5 October 2017 5:45 AM
>> To: Gregory Farnum 
>> Cc: ceph-users ; Josy
>> 
>> Subject: Re: [ceph-users] bad crc/signature errors
>>
>> Perhaps this is related to a known issue on some 4.4 and later kernels [1]
>> where the stable write flag was not preserved by the kernel?
>>
>> [1] http://tracker.ceph.com/issues/19275

The stable pages bug manifests as multiple sporadic connection resets,
because in that case CRCs computed by the kernel don't always match the
data that gets sent out.  When the mismatch is detected on the OSD
side, OSDs reset the connection and you'd see messages like

  libceph: osd1 1.2.3.4:6800 socket closed (con state OPEN)
  libceph: osd2 1.2.3.4:6804 socket error on write

This is a different issue.  Josy, Adrian, Olivier, do you also see
messages of the "libceph: read_partial_message ..." type or is it just
"libceph: ... bad crc/signature" errors?

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bad crc/signature errors

2017-10-05 Thread Ilya Dryomov
On Thu, Oct 5, 2017 at 9:03 AM, Olivier Bonvalet  wrote:
> I also see that, but on 4.9.52 and 4.13.3 kernel.
>
> I also have some kernel panic, but don't know if it's related (RBD are
> mapped on Xen hosts).

Do you have that panic message?

Do you use rbd devices for something other than Xen?  If so, have you
ever seen these errors outside of Xen?

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Xen & Ceph bad crc

2017-10-05 Thread Ilya Dryomov
On Thu, Oct 5, 2017 at 9:05 AM, Osama Hasebou  wrote:
> Hi Everyone,
>
> We are testing running Ceph  as a backend for Xen server acting as the
> client, and when a pool was created and mounted it as RBD in one of the
> client server, while adding data to it, we see this below error :
>
> 
> [939656.039750] libceph: osd20 10.255.0.9:6808 bad crc/signature
> [939656.041079] libceph: osd16 10.255.0.8:6816 bad crc/signature
> [939735.627456] libceph: osd11 10.255.0.7:6800 bad crc/signature
> [939735.628293] libceph: osd30 10.255.0.11:6804 bad crc/signature
>
>
> They appear 15 to 50 times an hour when it happens; there are some
> hours when it doesn't happen. There have been 564 such errors since yesterday.
>
>
> # uname -sr
> Linux 4.13.1-1.el7.elrepo.x86_64
>
> # ceph -v
> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
>
> # cat /etc/*-release
> CentOS Linux release 7.3.1611 (Core)
>
>
> Any ideas why is this happening ?

The stable pages issue mentioned by Jason shouldn't affect your kernel.
Just to make sure, what's the output of

  $ cat /sys/block/rbd0/bdi/stable_pages_required

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] tunable question

2017-10-05 Thread mj

Hi,

For the record, we changed tunables from "hammer" to "optimal" yesterday 
at 14:00, and it finished this morning at 9:00, so rebalancing took 19 hours.


This was on a small ceph cluster, 24 4TB OSDs spread over three hosts, 
connected over 10G ethernet. Total amount of data: 32730 GB used, 56650 
GB / 89380 GB avail


We set noscrub and no-deepscrub during the rebalance, and our VMs 
experienced basically no impact.
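For anyone repeating this, the flags are the standard scrub flags and are
cleared the same way afterwards:

ceph osd set noscrub
ceph osd set nodeep-scrub
# ... after the rebalance finishes ...
ceph osd unset noscrub
ceph osd unset nodeep-scrub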


MJ


On 10/03/2017 05:37 PM, lists wrote:

Thanks Jake, for your extensive reply. :-)

MJ

On 3-10-2017 15:21, Jake Young wrote:


On Tue, Oct 3, 2017 at 8:38 AM lists > wrote:


    Hi,

    What would make the decision easier: if we knew that we could easily
    revert the
  > "ceph osd crush tunables optimal"
    once it has begun rebalancing data?

    Meaning: if we notice that impact is too high, or it will take too 
long,

    that we could simply again say
  > "ceph osd crush tunables hammer"
    and the cluster would calm down again?


Yes you can revert the tunables back; but it will then move all the 
data back where it was, so be prepared for that.


Verify you have the following values in ceph.conf. Note that these are 
the defaults in Jewel, so if they aren’t defined, you’re probably good:

osd_max_backfills=1
osd_recovery_threads=1

You can try to set these (using ceph —inject) if you notice a large 
impact to your client performance:

osd_recovery_op_priority=1
osd_recovery_max_active=1
osd_recovery_threads=1
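For reference, a runtime injection sketch for those values (takes effect
immediately, but is not persisted across OSD restarts):

ceph tell osd.* injectargs '--osd_recovery_op_priority 1 --osd_recovery_max_active 1 --osd_max_backfills 1'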

I recall this tunables change when we went from hammer to jewel last 
year. It took over 24 hours to rebalance 122TB on our 110 osd  cluster.


Jake



    MJ

    On 2-10-2017 9:41, Manuel Lausch wrote:
 > Hi,
 >
 > We have similar issues.
 > After upgrading from hammer to jewel the tunable "chooseleaf_stable"
 > was introduced. If we activate it nearly all data will be moved. The
 > cluster has 2400 OSDs on 40 nodes over two datacenters and is filled
 > with 2.5 PB of data.
 >
 > We tried to enable it but the backfilling traffic is too high to be
 > handled without impacting other services on the network.
 >
 > Does someone know if it is necessary to enable this tunable? And could
 > it be a problem in the future if we want to upgrade to newer versions
 > without it enabled?
 >
 > Regards,
 > Manuel Lausch
 >
    ___
    ceph-users mailing list
    ceph-users@lists.ceph.com 
    http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Xen & Ceph bad crc

2017-10-05 Thread Osama Hasebou
Hi Everyone, 

We are testing Ceph as a backend for a Xen server acting as the client. 
We created a pool and mounted it as RBD on one of the client servers, 
and while adding data to it we see the error below:

 
[939656.039750] libceph: osd20 10.255.0.9:6808 bad crc/signature 
[939656.041079] libceph: osd16 10.255.0.8:6816 bad crc/signature 
[939735.627456] libceph: osd11 10.255.0.7:6800 bad crc/signature 
[939735.628293] libceph: osd30 10.255.0.11:6804 bad crc/signature 


They appear 15 to 50 times an hour when it happens, and some hours there 
are none; there have been 564 such errors since yesterday.
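
A quick way to tally them (a sketch; it assumes the messages are still in 
the kernel ring buffer):

# dmesg | grep -c 'bad crc/signature'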


# uname -sr 
Linux 4.13.1-1.el7.elrepo.x86_64 

# ceph -v 
ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc) 

# cat /etc/*-release 
CentOS Linux release 7.3.1611 (Core) 


Any ideas why this is happening?

Thanks! 

Regards, 
Ossi 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Re : bad crc/signature errors

2017-10-05 Thread Olivier Bonvalet
I also see that, but on 4.9.52 and 4.13.3 kernels.

I also have some kernel panics, but don't know if they're related (the
RBDs are mapped on Xen hosts).

Le jeudi 05 octobre 2017 à 05:53 +, Adrian Saul a écrit :
> We see the same messages and are similarly on a 4.4 KRBD version that
> is affected by this.
> 
> I have seen no impact from it so far that I know about.
> 
> 
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
> > Behalf Of
> > Jason Dillaman
> > Sent: Thursday, 5 October 2017 5:45 AM
> > To: Gregory Farnum 
> > Cc: ceph-users ; Josy
> > 
> > Subject: Re: [ceph-users] bad crc/signature errors
> > 
> > Perhaps this is related to a known issue on some 4.4 and later
> > kernels [1]
> > where the stable write flag was not preserved by the kernel?
> > 
> > [1] http://tracker.ceph.com/issues/19275
> > 
> > On Wed, Oct 4, 2017 at 2:36 PM, Gregory Farnum 
> > wrote:
> > > That message indicates that the checksums of messages between
> > > your
> > > kernel client and OSD are incorrect. It could be actual physical
> > > transmission errors, but if you don't see other issues then this
> > > isn't
> > > fatal; they can recover from it.
> > > 
> > > On Wed, Oct 4, 2017 at 8:52 AM Josy wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > We have setup a cluster with 8 OSD servers (31 disks)
> > > > 
> > > > Ceph health is Ok.
> > > > --
> > > > [root@las1-1-44 ~]# ceph -s
> > > >cluster:
> > > >  id: de296604-d85c-46ab-a3af-add3367f0e6d
> > > >  health: HEALTH_OK
> > > > 
> > > >services:
> > > >  mon: 3 daemons, quorum
> > > > ceph-las-mon-a1,ceph-las-mon-a2,ceph-las-mon-a3
> > > >  mgr: ceph-las-mon-a1(active), standbys: ceph-las-mon-a2
> > > >  osd: 31 osds: 31 up, 31 in
> > > > 
> > > >data:
> > > >  pools:   4 pools, 510 pgs
> > > >  objects: 459k objects, 1800 GB
> > > >  usage:   5288 GB used, 24461 GB / 29749 GB avail
> > > >  pgs: 510 active+clean
> > > > 
> > > > 
> > > > We created a pool and mounted it as RBD in one of the client
> > > > server.
> > > > While adding data to it, we see this below error :
> > > > 
> > > > 
> > > > [939656.039750] libceph: osd20 10.255.0.9:6808 bad
> > > > crc/signature
> > > > [939656.041079] libceph: osd16 10.255.0.8:6816 bad
> > > > crc/signature
> > > > [939735.627456] libceph: osd11 10.255.0.7:6800 bad
> > > > crc/signature
> > > > [939735.628293] libceph: osd30 10.255.0.11:6804 bad
> > > > crc/signature
> > > > 
> > > > =
> > > > 
> > > > Can anyone explain what is this and if I can fix it ?
> > > > 
> > > > 
> > > > ___
> > > > ceph-users mailing list
> > > > ceph-users@lists.ceph.com
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > 
> > > 
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > 
> > 
> > 
> > 
> > --
> > Jason
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] inconsistent pg on erasure coded pool

2017-10-05 Thread Frédéric Nass

Hi Kenneth,

You should check for drive- or XFS-related errors in the /var/log/messages 
files on all nodes. We've had a similar issue in the past with a bad 
block on a hard drive.

We've had to:

1. Stop the OSD associated with the drive that had the bad block, flush its 
journal (ceph-osd -i $osd --flush-journal) and unmount the filesystem,

2. Clear the bad blocks in the RAID/PERC controller,
3. Run xfs_repair on the partition, then partprobe the drive and start the OSD again,
4. Run ceph pg repair on the affected pg (a command sketch follows).
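
A minimal sketch of that sequence, assuming a systemd-managed OSD and the
default mount path; osd.63 and pg 5.144 come from the log quoted below,
while the device name is a placeholder, so substitute your own values:

# systemctl stop ceph-osd@63
# ceph-osd -i 63 --flush-journal
# umount /var/lib/ceph/osd/ceph-63
  ... clear the bad blocks in the RAID/PERC controller with the vendor tool ...
# xfs_repair /dev/sdX1
# partprobe /dev/sdX
# systemctl start ceph-osd@63
# ceph pg repair 5.144
# ceph pg deep-scrub 5.144     (to re-verify once the repair has run)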

Regards,

Frédéric.

Le 04/10/2017 à 14:02, Kenneth Waegeman a écrit :

Hi,

We have an inconsistency / scrub error on an erasure-coded pool that 
I can't seem to solve.


[root@osd008 ~]# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 5.144 is active+clean+inconsistent, acting 
[81,119,148,115,142,100,25,63,48,11,43]

1 scrub errors

In the log files, it seems there is 1 missing shard:

/var/log/ceph/ceph-osd.81.log.2.gz:2017-10-02 23:49:11.940624 
7f0a9d7e2700 -1 log_channel(cluster) log [ERR] : 5.144s0 shard 63(7) 
missing 5:2297a2e1:::10014e2d8d5.:head
/var/log/ceph/ceph-osd.81.log.2.gz:2017-10-03 00:48:06.681941 
7f0a9d7e2700 -1 log_channel(cluster) log [ERR] : 5.144s0 deep-scrub 1 
missing, 0 inconsistent objects
/var/log/ceph/ceph-osd.81.log.2.gz:2017-10-03 00:48:06.681947 
7f0a9d7e2700 -1 log_channel(cluster) log [ERR] : 5.144 deep-scrub 1 
errors


I tried running ceph pg repair on the pg, but nothing changed. I also 
tried starting a new deep-scrub on osd 81 (ceph osd deep-scrub 81), but 
I don't see any deep-scrub starting on the OSD.


How can we solve this ?

Thank you!


Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com