Re: [ceph-users] hanging slow requests: failed to authpin, subtree is being exported

2019-10-21 Thread Kenneth Waegeman


I've made a ticket for this issue: https://tracker.ceph.com/issues/42338

Thanks again!

K

On 15/10/2019 18:00, Kenneth Waegeman wrote:

Hi Robert, all,


On 23/09/2019 17:37, Robert LeBlanc wrote:

On Mon, Sep 23, 2019 at 4:14 AM Kenneth Waegeman
 wrote:

Hi all,

When syncing data with rsync, I'm often getting blocked slow requests,
which also block access to this path.



Re: [ceph-users] hanging slow requests: failed to authpin, subtree is being exported

2019-10-15 Thread Kenneth Waegeman

Hi Robert, all,


On 23/09/2019 17:37, Robert LeBlanc wrote:

On Mon, Sep 23, 2019 at 4:14 AM Kenneth Waegeman
 wrote:

Hi all,

When syncing data with rsync, I'm often getting blocked slow requests,
which also block access to this path.


2019-09-23 11:25:49.477 7f4f401e8700  0 log_channel(cluster) log [WRN]
: slow request 31.895478 seconds old, received at 2019-09-23
11:25:17.598152: client_request(client.38352684:92684 lookup
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:26:19.477 7f4f401e8700  0 log_channel(cluster) log [WRN]
: slow request 61.896079 seconds old, received at 2019-09-23
11:25:17.598152: client_request(client.38352684:92684 lookup
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:27:19.478 7f4f401e8700  0 log_channel(cluster) log [WRN]
: slow request 121.897268 seconds old, received at 2019-09-23
11:25:17.598152: client_request(client.38352684:92684 lookup
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:29:19.488 7f4f401e8700  0 log_channel(cluster) log [WRN]
: slow request 241.899467 seconds old, received at 2019-09-23
11:25:17.598152: client_request(client.38352684:92684 lookup
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:33:19.680 7f4f401e8700  0 log_channel(cluster) log [WRN]
: slow request 482.087927 seconds old, received at 2019-09-23
11:25:17.598152: client_request(client.38352684:92684 lookup
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:36:09.881 7f4f401e8700  0 log_channel(cluster) log [WRN]
: slow request 32.677511 seconds old, received at 2019-09-23
11:35:37.217113: client_request(client.38347357:111963 lookup
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:36:39.881 7f4f401e8700  0 log_channel(cluster) log [WRN]
: slow request 62.678132 seconds old, received at 2019-09-23
11:35:37.217113: client_request(client.38347357:111963 lookup
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:37:39.891 7f4f401e8700  0 log_channel(cluster) log [WRN]
: slow request 122.679273 seconds old, received at 2019-09-23
11:35:37.217113: client_request(client.38347357:111963 lookup
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:39:39.892 7f4f401e8700  0 log_channel(cluster) log [WRN]
: slow request 242.684667 seconds old, received at 2019-09-23
11:35:37.217113: client_request(client.38347357:111963 lookup
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:41:19.893 7f4f401e8700  0 log_channel(cluster) log [WRN]
: slow request 962.305681 seconds old, received at 2019-09-23
11:25:17.598152: client_request(client.38352684:92684 lookup
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:43:39.923 7f4f401e8700  0 log_channel(cluster) log [WRN]
: slow request 482.712888 seconds old, received at 2019-09-23
11:35:37.217113: client_request(client.38347357:111963 lookup
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:51:40.236 7f4f401e8700  0 log_channel(cluster) log [WRN]
: slow request 963.037049 seconds old, received at 2019-09-23
11:35:37.217113: client_request(client.38347357:111963 lookup
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:57:20.308 7f4f401e8700  0 log_channel(cluster) log [WRN]
: slow request 1922.719287 seconds old, received at 2019-09-23
11:25:17.598152: client_request(client.38352684:92684 lookup
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 12:07:40.621 7f4f401e8700  0 log_channel(cluster) log [WRN]
: slow request 1923.409501 seconds old, received at 2019-09-23
11:35:37.217113: client_request(client.38347357:111963 lookup
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0,
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 12:29:20.639 7f4f401e8700  0 log_channel(cluster) log [WRN]
: slow request 3843.057602 seconds old

Re: [ceph-users] mds failing to start 14.2.2

2019-10-15 Thread Kenneth Waegeman

Hi Zheng,

Thanks, that made me realize I had forgotten to remove some of the 'temporary-key' entries I set while fixing the inconsistency issue. Once those were removed, the MDS started again.
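For completeness, cleaning up such a leftover key boils down to something like the following sketch (the pool name is taken from this cluster, but the object name is a placeholder):

# list the omap keys on the affected metadata object and check for the leftover key
rados -p metadata listomapkeys <dirfrag-object> | grep temporary-key
# remove the leftover key
rados -p metadata rmomapkey <dirfrag-object> temporary-key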


Thanks again!

Kenneth

On 12/10/2019 04:26, Yan, Zheng wrote:



On Sat, Oct 12, 2019 at 1:10 AM Kenneth Waegeman 
mailto:kenneth.waege...@ugent.be>> wrote:


Hi all,

After solving some PG inconsistency problems, my fs is still in
trouble. My MDSs are crashing with this error:


>     -5> 2019-10-11 19:02:55.375 7f2d39f10700  1 mds.1.564276
rejoin_start
>     -4> 2019-10-11 19:02:55.385 7f2d3d717700  5 mds.beacon.mds01
> received beacon reply up:rejoin seq 5 rtt 1.01
>     -3> 2019-10-11 19:02:55.495 7f2d39f10700  1 mds.1.564276
> rejoin_joint_start
>     -2> 2019-10-11 19:02:55.505 7f2d39f10700  5 mds.mds01
> handle_mds_map old map epoch 564279 <= 564279, discarding
>     -1> 2019-10-11 19:02:55.695 7f2d33f04700 -1
>

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/mdstyp
> es.h: In function 'static void
> dentry_key_t::decode_helper(std::string_view, std::string&,
> snapid_t&)' thread 7f2d33f04700 time 2019-10-11 19:02:55.703343
>

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/mdstypes.h:

> 1229: FAILED ceph_assert(i != string::npos
> )
>
>  ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be)
> nautilus (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x14a) [0x7f2d43393046]
>  2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char
> const*, char const*, ...)+0) [0x7f2d43393214]
>  3: (CDir::_omap_fetched(ceph::buffer::v14_2_0::list&,
> std::map std::less, std::allocator ceph::buffer::v14_2_0::list> > >&, bool, int)+0xa68) [0x556a17ec
> baa8]
>  4: (C_IO_Dir_OMAP_Fetched::finish(int)+0x54) [0x556a17ee0034]
>  5: (MDSContext::complete(int)+0x70) [0x556a17f5e710]
>  6: (MDSIOContextBase::complete(int)+0x16b) [0x556a17f5e9ab]
>  7: (Finisher::finisher_thread_entry()+0x156) [0x7f2d433d8386]
>  8: (()+0x7dd5) [0x7f2d41262dd5]
>  9: (clone()+0x6d) [0x7f2d3ff1302d]
>
>  0> 2019-10-11 19:02:55.695 7f2d33f04700 -1 *** Caught signal
> (Aborted) **
>  in thread 7f2d33f04700 thread_name:fn_anonymous
>
>  ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be)
> nautilus (stable)
>  1: (()+0xf5d0) [0x7f2d4126a5d0]
>  2: (gsignal()+0x37) [0x7f2d3fe4b2c7]
>  3: (abort()+0x148) [0x7f2d3fe4c9b8]
>  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x199) [0x7f2d43393095]
>  5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char
> const*, char const*, ...)+0) [0x7f2d43393214]
>  6: (CDir::_omap_fetched(ceph::buffer::v14_2_0::list&,
> std::map std::less, std::allocator ceph::buffer::v14_2_0::list> > >&, bool, int)+0xa68) [0x556a17ec
> baa8]
>  7: (C_IO_Dir_OMAP_Fetched::finish(int)+0x54) [0x556a17ee0034]
>  8: (MDSContext::complete(int)+0x70) [0x556a17f5e710]
>  9: (MDSIOContextBase::complete(int)+0x16b) [0x556a17f5e9ab]
>  10: (Finisher::finisher_thread_entry()+0x156) [0x7f2d433d8386]
>  11: (()+0x7dd5) [0x7f2d41262dd5]
>  12: (clone()+0x6d) [0x7f2d3ff1302d]
>  NOTE: a copy of the executable, or `objdump -rdS ` is
> needed to interpret this.
>
> [root@mds02 ~]# ceph -s
>   cluster:
>     id: 92bfcf0a-1d39-43b3-b60f-44f01b630e47
>     health: HEALTH_WARN
>     1 filesystem is degraded
>     insufficient standby MDS daemons available
>     1 MDSs behind on trimming
>     1 large omap objects
>
>   services:
>     mon: 3 daemons, quorum mds01,mds02,mds03 (age 4d)
>     mgr: mds02(active, since 3w), standbys: mds01, mds03
>     mds: ceph_fs:2/2 {0=mds02=up:rejoin,1=mds01=up:rejoin(laggy or
> crashed)}
>     osd: 535 osds: 533 up, 529 in
>
>   data:
>     pools:   3 pools, 3328 pgs
>     objects: 376.32M objects, 673 TiB
>     usage:   1.0 PiB used, 2.2 PiB / 3.2 PiB avail
>     pgs: 3315 active+clean
>  12   active+clean+scrubbing+deep
>  1    active+clean+scrubbing
>
Does anyone have an idea where to go from here? ☺


looks like omap for dirfrag is corrupted.  ple

[ceph-users] mds failing to start 14.2.2

2019-10-11 Thread Kenneth Waegeman

Hi all,

After solving some PG inconsistency problems, my fs is still in
trouble. My MDSs are crashing with this error:




    -5> 2019-10-11 19:02:55.375 7f2d39f10700  1 mds.1.564276 rejoin_start
    -4> 2019-10-11 19:02:55.385 7f2d3d717700  5 mds.beacon.mds01 
received beacon reply up:rejoin seq 5 rtt 1.01
    -3> 2019-10-11 19:02:55.495 7f2d39f10700  1 mds.1.564276 
rejoin_joint_start
    -2> 2019-10-11 19:02:55.505 7f2d39f10700  5 mds.mds01 
handle_mds_map old map epoch 564279 <= 564279, discarding
    -1> 2019-10-11 19:02:55.695 7f2d33f04700 -1 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/mdstyp
es.h: In function 'static void 
dentry_key_t::decode_helper(std::string_view, std::string&, 
snapid_t&)' thread 7f2d33f04700 time 2019-10-11 19:02:55.703343
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/mdstypes.h: 
1229: FAILED ceph_assert(i != string::npos

)

 ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) 
nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x14a) [0x7f2d43393046]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char 
const*, char const*, ...)+0) [0x7f2d43393214]
 3: (CDir::_omap_fetched(ceph::buffer::v14_2_0::list&, 
std::mapstd::less, std::allocatorceph::buffer::v14_2_0::list> > >&, bool, int)+0xa68) [0x556a17ec

baa8]
 4: (C_IO_Dir_OMAP_Fetched::finish(int)+0x54) [0x556a17ee0034]
 5: (MDSContext::complete(int)+0x70) [0x556a17f5e710]
 6: (MDSIOContextBase::complete(int)+0x16b) [0x556a17f5e9ab]
 7: (Finisher::finisher_thread_entry()+0x156) [0x7f2d433d8386]
 8: (()+0x7dd5) [0x7f2d41262dd5]
 9: (clone()+0x6d) [0x7f2d3ff1302d]

 0> 2019-10-11 19:02:55.695 7f2d33f04700 -1 *** Caught signal 
(Aborted) **

 in thread 7f2d33f04700 thread_name:fn_anonymous

 ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) 
nautilus (stable)

 1: (()+0xf5d0) [0x7f2d4126a5d0]
 2: (gsignal()+0x37) [0x7f2d3fe4b2c7]
 3: (abort()+0x148) [0x7f2d3fe4c9b8]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x199) [0x7f2d43393095]
 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char 
const*, char const*, ...)+0) [0x7f2d43393214]
 6: (CDir::_omap_fetched(ceph::buffer::v14_2_0::list&, 
std::mapstd::less, std::allocatorceph::buffer::v14_2_0::list> > >&, bool, int)+0xa68) [0x556a17ec

baa8]
 7: (C_IO_Dir_OMAP_Fetched::finish(int)+0x54) [0x556a17ee0034]
 8: (MDSContext::complete(int)+0x70) [0x556a17f5e710]
 9: (MDSIOContextBase::complete(int)+0x16b) [0x556a17f5e9ab]
 10: (Finisher::finisher_thread_entry()+0x156) [0x7f2d433d8386]
 11: (()+0x7dd5) [0x7f2d41262dd5]
 12: (clone()+0x6d) [0x7f2d3ff1302d]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.


[root@mds02 ~]# ceph -s
  cluster:
    id: 92bfcf0a-1d39-43b3-b60f-44f01b630e47
    health: HEALTH_WARN
    1 filesystem is degraded
    insufficient standby MDS daemons available
    1 MDSs behind on trimming
    1 large omap objects

  services:
    mon: 3 daemons, quorum mds01,mds02,mds03 (age 4d)
    mgr: mds02(active, since 3w), standbys: mds01, mds03
    mds: ceph_fs:2/2 {0=mds02=up:rejoin,1=mds01=up:rejoin(laggy or 
crashed)}

    osd: 535 osds: 533 up, 529 in

  data:
    pools:   3 pools, 3328 pgs
    objects: 376.32M objects, 673 TiB
    usage:   1.0 PiB used, 2.2 PiB / 3.2 PiB avail
    pgs: 3315 active+clean
 12   active+clean+scrubbing+deep
 1    active+clean+scrubbing


Does anyone have an idea where to go from here? ☺

Thanks!

K



Re: [ceph-users] lot of inconsistent+failed_repair - failed to pick suitable auth object (14.2.3)

2019-10-11 Thread Kenneth Waegeman



On 11/10/2019 01:21, Brad Hubbard wrote:

On Fri, Oct 11, 2019 at 12:27 AM Kenneth Waegeman
 wrote:

Hi Brad, all,

Pool 6 has min_size 2:

pool 6 'metadata' replicated size 3 min_size 2 crush_rule 1 object_hash
rjenkins pg_num 1024 pgp_num 1024 autoscale_mode warn last_change 172476
flags hashpspool stripe_width 0 application cephfs

This looked like something min_size 1 could cause, but I guess that's
not the cause here.


so 'inconsistents' is empty, which is weird, no?

Try scrubbing the pg just before running the command.
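Concretely, that sequence looks roughly like this (a sketch, reusing pg 6.327 from the log excerpts quoted below; a deep scrub is used here so the omap digests get recomputed):

# deep-scrub the pg, wait for it to complete, then re-query the inconsistencies
ceph pg deep-scrub 6.327
rados list-inconsistent-obj 6.327 --format=json-pretty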


Ah, that worked! I could then do the trick with the temporary_key to resolve the inconsistency errors.


Thanks!!

K




Thanks again!

K


On 10/10/2019 12:52, Brad Hubbard wrote:

Does pool 6 have min_size = 1 set?

https://tracker.ceph.com/issues/24994#note-5 would possibly be helpful
here, depending on what the output of the following command looks
like.

# rados list-inconsistent-obj [pgid] --format=json-pretty
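As an aside, the pgids to feed into that command can be pulled from the cluster itself; a sketch, assuming the pool name 'metadata' used in this thread:

# list the pgs currently flagged inconsistent in that pool
rados list-inconsistent-pg metadata
# or cluster-wide
ceph health detail | grep inconsistent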

On Thu, Oct 10, 2019 at 8:16 PM Kenneth Waegeman
 wrote:

Hi all,

After some node failures and rebalancing, we have a lot of PGs in an
inconsistent state. I tried to repair them, but it didn't work. This is also
in the logs:


2019-10-10 11:23:27.221 7ff54c9b0700  0 log_channel(cluster) log [DBG]
: 6.327 repair starts
2019-10-10 11:23:27.431 7ff5509b8700 -1 log_channel(cluster) log [ERR]
: 6.327 shard 19 soid 6:e4c130fd:::20005f3b582.:head :
omap_digest 0x334f57be != omap_digest 0xa8c4ce76 from auth oi
6:e4c130fd:::20005f3b582.:head(203789'1033530 osd.3.0:342
dirty|omap|data_digest|omap_digest s 0 uv 1032164 dd  od
a8c4ce76 alloc_hint [0 0 0])
2019-10-10 11:23:27.431 7ff5509b8700 -1 log_channel(cluster) log [ERR]
: 6.327 shard 72 soid 6:e4c130fd:::20005f3b582.:head :
omap_digest 0x334f57be != omap_digest 0xa8c4ce76 from auth oi
6:e4c130fd:::20005f3b582.:head(203789'1033530 osd.3.0:342
dirty|omap|data_digest|omap_digest s 0 uv 1032164 dd  od
a8c4ce76 alloc_hint [0 0 0])
2019-10-10 11:23:27.431 7ff5509b8700 -1 log_channel(cluster) log [ERR]
: 6.327 shard 91 soid 6:e4c130fd:::20005f3b582.:head :
omap_digest 0x334f57be != omap_digest 0xa8c4ce76 from auth oi
6:e4c130fd:::20005f3b582.:head(203789'1033530 osd.3.0:342
dirty|omap|data_digest|omap_digest s 0 uv 1032164 dd  od
a8c4ce76 alloc_hint [0 0 0])
2019-10-10 11:23:27.431 7ff5509b8700 -1 log_channel(cluster) log [ERR]
: 6.327 soid 6:e4c130fd:::20005f3b582.:head : failed to pick
suitable auth object
2019-10-10 11:23:27.731 7ff54c9b0700 -1 log_channel(cluster) log [ERR]
: 6.327 shard 19 soid 6:e4c2e57b:::20005f11daa.:head :
omap_digest 0x6aafaf97 != omap_digest 0x56dd55a2 from auth oi
6:e4c2e57b:::20005f11daa.:head(203789'1033711 osd.3.0:3666823
dirty|omap|data_digest|omap_digest s 0 uv 1032158 dd  od
56dd55a2 alloc_hint [0 0 0])
2019-10-10 11:23:27.731 7ff54c9b0700 -1 log_channel(cluster) log [ERR]
: 6.327 shard 72 soid 6:e4c2e57b:::20005f11daa.:head :
omap_digest 0x6aafaf97 != omap_digest 0x56dd55a2 from auth oi
6:e4c2e57b:::20005f11daa.:head(203789'1033711 osd.3.0:3666823
dirty|omap|data_digest|omap_digest s 0 uv 1032158 dd  od
56dd55a2 alloc_hint [0 0 0])
2019-10-10 11:23:27.731 7ff54c9b0700 -1 log_channel(cluster) log [ERR]
: 6.327 shard 91 soid 6:e4c2e57b:::20005f11daa.:head :
omap_digest 0x6aafaf97 != omap_digest 0x56dd55a2 from auth oi
6:e4c2e57b:::20005f11daa.:head(203789'1033711 osd.3.0:3666823
dirty|omap|data_digest|omap_digest s 0 uv 1032158 dd  od
56dd55a2 alloc_hint [0 0 0])
2019-10-10 11:23:27.731 7ff54c9b0700 -1 log_channel(cluster) log [ERR]
: 6.327 soid 6:e4c2e57b:::20005f11daa.:head : failed to pick
suitable auth object
2019-10-10 11:23:27.971 7ff54c9b0700 -1 log_channel(cluster) log [ERR]
: 6.327 shard 19 soid 6:e4c40009:::20005f45f1b.:head :
omap_digest 0x7ccf5cc9 != omap_digest 0xe048d29 from auth oi
6:e4c40009:::20005f45f1b.:head(203789'1033837 osd.3.0:3666949
dirty|omap|data_digest|omap_digest s 0 uv 1032168 dd  od
e048d29 alloc_hint [0 0 0])
2019-10-10 11:23:27.971 7ff54c9b0700 -1 log_channel(cluster) log [ERR]
: 6.327 shard 72 soid 6:e4c40009:::20005f45f1b.:head :
omap_digest 0x7ccf5cc9 != omap_digest 0xe048d29 from auth oi
6:e4c40009:::20005f45f1b.:head(203789'1033837 osd.3.0:3666949
dirty|omap|data_digest|omap_digest s 0 uv 1032168 dd  od
e048d29 alloc_hint [0 0 0])
2019-10-10 11:23:27.971 7ff54c9b0700 -1 log_channel(cluster) log [ERR]
: 6.327 shard 91 soid 6:e4c40009:::20005f45f1b.:head :
omap_digest 0x7ccf5cc9 != omap_digest 0xe048d29 from auth oi
6:e4c40009:::20005f45f1b.:head(203789'1033837 osd.3.0:3666949
dirty|omap|data_digest|omap_digest s 0 uv 1032168 dd  od
e048d29 alloc_hint [0 0 0])
2019-10-10 11:23:27.971 7ff54c9b0700 -1 log_channel(cluster) log [ERR]
: 6.327 soid 6:e4c40009:::20005f45f1b.:head : failed to pick
suitable auth object
2019-10-10 11

Re: [ceph-users] lot of inconsistent+failed_repair - failed to pick suitable auth object (14.2.3)

2019-10-10 Thread Kenneth Waegeman

Hi Brad, all,

Pool 6 has min_size 2:

pool 6 'metadata' replicated size 3 min_size 2 crush_rule 1 object_hash 
rjenkins pg_num 1024 pgp_num 1024 autoscale_mode warn last_change 172476 
flags hashpspool stripe_width 0 application cephfs


The output for all the inconsistent pgs is this:


{
    "epoch": 207682,
    "inconsistents": []
}
{
    "epoch": 207627,
    "inconsistents": []
}
{
    "epoch": 207700,
    "inconsistents": []
}
{
    "epoch": 207720,
    "inconsistents": []
}
{
    "epoch": 207657,
    "inconsistents": []
}
{
    "epoch": 207652,
    "inconsistents": []
}
{
    "epoch": 207750,
    "inconsistents": []
}
{
    "epoch": 208021,
    "inconsistents": []
}
{
    "epoch": 207726,
    "inconsistents": []
}
{
    "epoch": 207645,
    "inconsistents": []
}
{
    "epoch": 207347,
    "inconsistents": []
}
{
    "epoch": 207649,
    "inconsistents": []
}
{
    "epoch": 207727,
    "inconsistents": []
}
{
    "epoch": 207676,
    "inconsistents": []
}
{
    "epoch": 207373,
    "inconsistents": []
}
{
    "epoch": 207736,
    "inconsistents": []
}
{
    "epoch": 207641,
    "inconsistents": []
}
{
    "epoch": 207750,
    "inconsistents": []
}
{
    "epoch": 207573,
    "inconsistents": []
}
{
    "epoch": 207658,
    "inconsistents": []
}
{
    "epoch": 207616,
    "inconsistents": []
}
{
    "epoch": 207387,
    "inconsistents": []
}
{
    "epoch": 207991,
    "inconsistents": []
}
{
    "epoch": 207648,
    "inconsistents": []
}
{
    "epoch": 207614,
    "inconsistents": []
}
{
    "epoch": 207287,
    "inconsistents": []
}
{
    "epoch": 207663,
    "inconsistents": []
}
{
    "epoch": 207643,
    "inconsistents": []
}
{
    "epoch": 207701,
    "inconsistents": []
}
{
    "epoch": 207693,
    "inconsistents": []
}
{
    "epoch": 207632,
    "inconsistents": []
}
{
    "epoch": 207389,
    "inconsistents": []
}
{
    "epoch": 207692,
    "inconsistents": []
}
{
    "epoch": 207634,
    "inconsistents": []
}
{
    "epoch": 207309,
    "inconsistents": []
}
{
    "epoch": 207651,
    "inconsistents": []
}
{
    "epoch": 207643,
    "inconsistents": []
}
{
    "epoch": 207656,
    "inconsistents": []
}
{
    "epoch": 207729,
    "inconsistents": []
}
{
    "epoch": 207196,
    "inconsistents": []
}
{
    "epoch": 207626,
    "inconsistents": []
}
{
    "epoch": 207432,
    "inconsistents": []
}
{
    "epoch": 207652,
    "inconsistents": []
}
{
    "epoch": 207427,
    "inconsistents": []
}
{
    "epoch": 207676,
    "inconsistents": []
}
{
    "epoch": 207624,
    "inconsistents": []
}
{
    "epoch": 207658,
    "inconsistents": []
}
{
    "epoch": 207628,
    "inconsistents": []
}
{
    "epoch": 207546,
    "inconsistents": []
}
{
    "epoch": 207655,
    "inconsistents": []
}
{
    "epoch": 207602,
    "inconsistents": []
}

so 'inconsistents' is empty, which is weird, no?

Thanks again!

K


On 10/10/2019 12:52, Brad Hubbard wrote:

Does pool 6 have min_size = 1 set?

https://tracker.ceph.com/issues/24994#note-5 would possibly be helpful
here, depending on what the output of the following command looks
like.

# rados list-inconsistent-obj [pgid] --format=json-pretty

On Thu, Oct 10, 2019 at 8:16 PM Kenneth Waegeman
 wrote:

Hi all,

After some node failures and rebalancing, we have a lot of PGs in an
inconsistent state. I tried to repair them, but it didn't work. This is also
in the logs:


2019-10-10 11:23:27.221 7ff54c9b0700  0 log_channel(cluster) log [DBG]
: 6.327 repair starts
2019-10-10 11:23:27.431 7ff5509b8700 -1 log_channel(cluster) log [ERR]
: 6.327 shard 19 soid 6:e4c130fd:::20005f3b582.:head :
omap_digest 0x334f57be != omap_digest 0xa8c4ce76 from auth oi
6:e4c130fd:::20005f3b582.:head(203789'1033530 osd.3.0:342
dirty|omap|data_digest|omap_digest s 0 uv 1032164 dd  od
a8c4ce76 alloc_hint [0 0 0])
2019-10-10 11:23:27.431 7ff5509b8700 -1 log_channel(cluster) log [ERR]
: 6.327 shard 72 soid 6:e4c130fd:::20005f3b582.:head :
omap_digest 0x334f57be != omap_

[ceph-users] lot of inconsistent+failed_repair - failed to pick suitable auth object (14.2.3)

2019-10-10 Thread Kenneth Waegeman

Hi all,

After some node failures and rebalancing, we have a lot of PGs in an
inconsistent state. I tried to repair them, but it didn't work. This is also
in the logs:


2019-10-10 11:23:27.221 7ff54c9b0700  0 log_channel(cluster) log [DBG] 
: 6.327 repair starts
2019-10-10 11:23:27.431 7ff5509b8700 -1 log_channel(cluster) log [ERR] 
: 6.327 shard 19 soid 6:e4c130fd:::20005f3b582.:head : 
omap_digest 0x334f57be != omap_digest 0xa8c4ce76 from auth oi 
6:e4c130fd:::20005f3b582.:head(203789'1033530 osd.3.0:342 
dirty|omap|data_digest|omap_digest s 0 uv 1032164 dd  od 
a8c4ce76 alloc_hint [0 0 0])
2019-10-10 11:23:27.431 7ff5509b8700 -1 log_channel(cluster) log [ERR] 
: 6.327 shard 72 soid 6:e4c130fd:::20005f3b582.:head : 
omap_digest 0x334f57be != omap_digest 0xa8c4ce76 from auth oi 
6:e4c130fd:::20005f3b582.:head(203789'1033530 osd.3.0:342 
dirty|omap|data_digest|omap_digest s 0 uv 1032164 dd  od 
a8c4ce76 alloc_hint [0 0 0])
2019-10-10 11:23:27.431 7ff5509b8700 -1 log_channel(cluster) log [ERR] 
: 6.327 shard 91 soid 6:e4c130fd:::20005f3b582.:head : 
omap_digest 0x334f57be != omap_digest 0xa8c4ce76 from auth oi 
6:e4c130fd:::20005f3b582.:head(203789'1033530 osd.3.0:342 
dirty|omap|data_digest|omap_digest s 0 uv 1032164 dd  od 
a8c4ce76 alloc_hint [0 0 0])
2019-10-10 11:23:27.431 7ff5509b8700 -1 log_channel(cluster) log [ERR] 
: 6.327 soid 6:e4c130fd:::20005f3b582.:head : failed to pick 
suitable auth object
2019-10-10 11:23:27.731 7ff54c9b0700 -1 log_channel(cluster) log [ERR] 
: 6.327 shard 19 soid 6:e4c2e57b:::20005f11daa.:head : 
omap_digest 0x6aafaf97 != omap_digest 0x56dd55a2 from auth oi 
6:e4c2e57b:::20005f11daa.:head(203789'1033711 osd.3.0:3666823 
dirty|omap|data_digest|omap_digest s 0 uv 1032158 dd  od 
56dd55a2 alloc_hint [0 0 0])
2019-10-10 11:23:27.731 7ff54c9b0700 -1 log_channel(cluster) log [ERR] 
: 6.327 shard 72 soid 6:e4c2e57b:::20005f11daa.:head : 
omap_digest 0x6aafaf97 != omap_digest 0x56dd55a2 from auth oi 
6:e4c2e57b:::20005f11daa.:head(203789'1033711 osd.3.0:3666823 
dirty|omap|data_digest|omap_digest s 0 uv 1032158 dd  od 
56dd55a2 alloc_hint [0 0 0])
2019-10-10 11:23:27.731 7ff54c9b0700 -1 log_channel(cluster) log [ERR] 
: 6.327 shard 91 soid 6:e4c2e57b:::20005f11daa.:head : 
omap_digest 0x6aafaf97 != omap_digest 0x56dd55a2 from auth oi 
6:e4c2e57b:::20005f11daa.:head(203789'1033711 osd.3.0:3666823 
dirty|omap|data_digest|omap_digest s 0 uv 1032158 dd  od 
56dd55a2 alloc_hint [0 0 0])
2019-10-10 11:23:27.731 7ff54c9b0700 -1 log_channel(cluster) log [ERR] 
: 6.327 soid 6:e4c2e57b:::20005f11daa.:head : failed to pick 
suitable auth object
2019-10-10 11:23:27.971 7ff54c9b0700 -1 log_channel(cluster) log [ERR] 
: 6.327 shard 19 soid 6:e4c40009:::20005f45f1b.:head : 
omap_digest 0x7ccf5cc9 != omap_digest 0xe048d29 from auth oi 
6:e4c40009:::20005f45f1b.:head(203789'1033837 osd.3.0:3666949 
dirty|omap|data_digest|omap_digest s 0 uv 1032168 dd  od 
e048d29 alloc_hint [0 0 0])
2019-10-10 11:23:27.971 7ff54c9b0700 -1 log_channel(cluster) log [ERR] 
: 6.327 shard 72 soid 6:e4c40009:::20005f45f1b.:head : 
omap_digest 0x7ccf5cc9 != omap_digest 0xe048d29 from auth oi 
6:e4c40009:::20005f45f1b.:head(203789'1033837 osd.3.0:3666949 
dirty|omap|data_digest|omap_digest s 0 uv 1032168 dd  od 
e048d29 alloc_hint [0 0 0])
2019-10-10 11:23:27.971 7ff54c9b0700 -1 log_channel(cluster) log [ERR] 
: 6.327 shard 91 soid 6:e4c40009:::20005f45f1b.:head : 
omap_digest 0x7ccf5cc9 != omap_digest 0xe048d29 from auth oi 
6:e4c40009:::20005f45f1b.:head(203789'1033837 osd.3.0:3666949 
dirty|omap|data_digest|omap_digest s 0 uv 1032168 dd  od 
e048d29 alloc_hint [0 0 0])
2019-10-10 11:23:27.971 7ff54c9b0700 -1 log_channel(cluster) log [ERR] 
: 6.327 soid 6:e4c40009:::20005f45f1b.:head : failed to pick 
suitable auth object
2019-10-10 11:23:28.041 7ff54c9b0700 -1 log_channel(cluster) log [ERR] 
: 6.327 shard 19 soid 6:e4c4a042:::20005f389fb.:head : 
omap_digest 0xdd1558b8 != omap_digest 0xcf9af548 from auth oi 
6:e4c4a042:::20005f389fb.:head(203789'1033899 osd.3.0:3667011 
dirty|omap|data_digest|omap_digest s 0 uv 1031358 dd  od 
cf9af548 alloc_hint [0 0 0])
2019-10-10 11:23:28.041 7ff54c9b0700 -1 log_channel(cluster) log [ERR] 
: 6.327 shard 72 soid 6:e4c4a042:::20005f389fb.:head : 
omap_digest 0xdd1558b8 != omap_digest 0xcf9af548 from auth oi 
6:e4c4a042:::20005f389fb.:head(203789'1033899 osd.3.0:3667011 
dirty|omap|data_digest|omap_digest s 0 uv 1031358 dd  od 
cf9af548 alloc_hint [0 0 0])
2019-10-10 11:23:28.041 7ff54c9b0700 -1 log_channel(cluster) log [ERR] 
: 6.327 shard 91 soid 6:e4c4a042:::20005f389fb.:head : 
omap_digest 0xdd1558b8 != omap_digest 0xcf9af548 from auth oi 

[ceph-users] ssd requirements for wal/db

2019-10-04 Thread Kenneth Waegeman

Hi all,

We are thinking about putting the WAL/DB of our HDD OSDs on SSDs. If we put the WAL/DB of 4 HDDs on 1 SSD, as recommended, what type of SSD would suffice?

We were thinking of using SATA Read Intensive 6Gbps 1 DWPD SSDs.

Does someone have experience with this configuration? Would we need SAS SSDs instead of SATA, and Mixed Use 3 DWPD instead of Read Intensive?
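For reference, the layout we have in mind would be created with something like the sketch below (device paths are placeholders, and the exact ceph-volume flags depend on the release):

# four HDD data devices sharing one SSD for their block.db/WAL
ceph-volume lvm batch /dev/sda /dev/sdb /dev/sdc /dev/sdd --db-devices /dev/sde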



Thank you very much!


Kenneth




[ceph-users] hanging slow requests: failed to authpin, subtree is being exported

2019-09-23 Thread Kenneth Waegeman

Hi all,

When syncing data with rsync, I'm often getting blocked slow requests, 
which also block access to this path.
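While a request is stuck like this, the blocked operations and the current subtree placement can be inspected via the MDS admin socket; a sketch, using one of the MDS daemon names from this cluster:

# dump in-flight (including blocked) client requests on the active MDS
ceph daemon mds.mds01 ops
# show which subtrees are currently assigned to which rank
ceph daemon mds.mds01 get subtrees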


2019-09-23 11:25:49.477 7f4f401e8700  0 log_channel(cluster) log [WRN] 
: slow request 31.895478 seconds old, received at 2019-09-23 
11:25:17.598152: client_request(client.38352684:92684 lookup 
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0, 
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:26:19.477 7f4f401e8700  0 log_channel(cluster) log [WRN] 
: slow request 61.896079 seconds old, received at 2019-09-23 
11:25:17.598152: client_request(client.38352684:92684 lookup 
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0, 
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:27:19.478 7f4f401e8700  0 log_channel(cluster) log [WRN] 
: slow request 121.897268 seconds old, received at 2019-09-23 
11:25:17.598152: client_request(client.38352684:92684 lookup 
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0, 
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:29:19.488 7f4f401e8700  0 log_channel(cluster) log [WRN] 
: slow request 241.899467 seconds old, received at 2019-09-23 
11:25:17.598152: client_request(client.38352684:92684 lookup 
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0, 
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:33:19.680 7f4f401e8700  0 log_channel(cluster) log [WRN] 
: slow request 482.087927 seconds old, received at 2019-09-23 
11:25:17.598152: client_request(client.38352684:92684 lookup 
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0, 
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:36:09.881 7f4f401e8700  0 log_channel(cluster) log [WRN] 
: slow request 32.677511 seconds old, received at 2019-09-23 
11:35:37.217113: client_request(client.38347357:111963 lookup 
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0, 
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:36:39.881 7f4f401e8700  0 log_channel(cluster) log [WRN] 
: slow request 62.678132 seconds old, received at 2019-09-23 
11:35:37.217113: client_request(client.38347357:111963 lookup 
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0, 
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:37:39.891 7f4f401e8700  0 log_channel(cluster) log [WRN] 
: slow request 122.679273 seconds old, received at 2019-09-23 
11:35:37.217113: client_request(client.38347357:111963 lookup 
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0, 
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:39:39.892 7f4f401e8700  0 log_channel(cluster) log [WRN] 
: slow request 242.684667 seconds old, received at 2019-09-23 
11:35:37.217113: client_request(client.38347357:111963 lookup 
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0, 
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:41:19.893 7f4f401e8700  0 log_channel(cluster) log [WRN] 
: slow request 962.305681 seconds old, received at 2019-09-23 
11:25:17.598152: client_request(client.38352684:92684 lookup 
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0, 
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:43:39.923 7f4f401e8700  0 log_channel(cluster) log [WRN] 
: slow request 482.712888 seconds old, received at 2019-09-23 
11:35:37.217113: client_request(client.38347357:111963 lookup 
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0, 
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:51:40.236 7f4f401e8700  0 log_channel(cluster) log [WRN] 
: slow request 963.037049 seconds old, received at 2019-09-23 
11:35:37.217113: client_request(client.38347357:111963 lookup 
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0, 
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 11:57:20.308 7f4f401e8700  0 log_channel(cluster) log [WRN] 
: slow request 1922.719287 seconds old, received at 2019-09-23 
11:25:17.598152: client_request(client.38352684:92684 lookup 
#0x100152383ce/vsc42531 2019-09-23 11:25:17.598077 caller_uid=0, 
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 12:07:40.621 7f4f401e8700  0 log_channel(cluster) log [WRN] 
: slow request 1923.409501 seconds old, received at 2019-09-23 
11:35:37.217113: client_request(client.38347357:111963 lookup 
#0x20005b0130c/testing 2019-09-23 11:35:37.217015 caller_uid=0, 
caller_gid=0{0,}) currently failed to authpin, subtree is being exported
2019-09-23 12:29:20.639 7f4f401e8700  0 log_channel(cluster) log [WRN] 
: slow request 3843.057602 seconds old, received at 2019-09-23 
11:25:17.598152: 

Re: [ceph-users] ceph mdss keep on crashing after update to 14.2.3

2019-09-23 Thread Kenneth Waegeman

Hi all,

I was coming from Mimic before. I reverted the MDS daemons to 14.2.2 on Friday and haven't observed the issue since.


Thanks!!

Kenneth
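For anyone needing to do the same, reverting only the MDS daemons amounts to roughly the following sketch (assuming CentOS 7 packages; exact version strings and dependency handling depend on your repositories):

# downgrade only the MDS package (matching dependencies may need to be listed as well)
yum downgrade ceph-mds-14.2.2
# restart the MDS daemons on this host
systemctl restart ceph-mds.target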

On 20/09/2019 03:43, Yan, Zheng wrote:

On Thu, Sep 19, 2019 at 11:37 PM Dan van der Ster  wrote:

You were running v14.2.2 before?

It seems that that  ceph_assert you're hitting was indeed added
between v14.2.2. and v14.2.3 in this commit
https://github.com/ceph/ceph/commit/12f8b813b0118b13e0cdac15b19ba8a7e127730b

There's a comment in the tracker for that commit which says the
original fix was incomplete
(https://tracker.ceph.com/issues/39987#note-5)

So perhaps nautilus needs
https://github.com/ceph/ceph/pull/28459/commits/0a1e92abf1cfc8bddf526cbf5bceea7b854dcfe8
??


You are right. Sorry for the bug. For now, please go back to 14.2.2
(just the MDS) or compile ceph-mds from source.

Yan, Zheng


Did you already try going back to v14.2.2 (on the MDSs only)?

-- dan

On Thu, Sep 19, 2019 at 4:59 PM Kenneth Waegeman
 wrote:

Hi all,

I updated our ceph cluster to 14.2.3 yesterday, and today the MDSs are crashing one after another. I'm using two active MDSs.

I've made a tracker ticket, but I was wondering whether someone else has also seen this issue yet?

-27> 2019-09-19 15:42:00.196 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8887 lookup 
#0x100166004d4/WindowsPhone-MSVC-CXX.cmake 2019-09-19 15:42:00.203132 
caller_uid=0, caller_gid=0{0,}) v4
-26> 2019-09-19 15:42:00.196 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865372:5815 lookup 
#0x20005a6eb3a/selectable.cpython-37.pyc 2019-09-19 15:42:00.204970 caller_uid=0, 
caller_gid=0{0,}) v4
-25> 2019-09-19 15:42:00.196 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333: lookup 
#0x100166004d4/WindowsPhone.cmake 2019-09-19 15:42:00.206381 caller_uid=0, 
caller_gid=0{0,}) v4
-24> 2019-09-19 15:42:00.206 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8889 lookup 
#0x100166004d4/WindowsStore-MSVC-C.cmake 2019-09-19 15:42:00.209703 caller_uid=0, 
caller_gid=0{0,}) v4
-23> 2019-09-19 15:42:00.206 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8890 lookup 
#0x100166004d4/WindowsStore-MSVC-CXX.cmake 2019-09-19 15:42:00.213200 
caller_uid=0, caller_gid=0{0,}) v4
-22> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8891 lookup 
#0x100166004d4/WindowsStore.cmake 2019-09-19 15:42:00.216577 caller_uid=0, 
caller_gid=0{0,}) v4
-21> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8892 lookup 
#0x100166004d4/Xenix.cmake 2019-09-19 15:42:00.220230 caller_uid=0, 
caller_gid=0{0,}) v4
-20> 2019-09-19 15:42:00.216 7f0369aeb700  2 mds.1.cache Memory usage:  
total 4603496, rss 4167920, heap 323836, baseline 323836, 501 / 1162471 inodes 
have caps, 506 caps, 0.00043528 caps per inode
-19> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log _submit_thread 
30520209420029~9062 : EUpdate scatter_writebehind [metablob 0x1000bd8ac7b, 2 dirs]
-18> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log _submit_thread 
30520209429111~10579 : EUpdate scatter_writebehind [metablob 0x1000bf26309, 9 dirs]
-17> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log _submit_thread 
30520209439710~2305 : EUpdate scatter_writebehind [metablob 0x1000bf2745b.001*, 2 
dirs]
-16> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log _submit_thread 
30520209442035~1845 : EUpdate scatter_writebehind [metablob 0x1000c233753, 2 dirs]
-15> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8893 lookup 
#0x100166004d4/eCos.cmake 2019-09-19 15:42:00.223360 caller_uid=0, 
caller_gid=0{0,}) v4
-14> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865319:2381 lookup 
#0x1001172f39d/microsoft-cp1251 2019-09-19 15:42:00.224940 caller_uid=0, 
caller_gid=0{0,}) v4
-13> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8894 lookup 
#0x100166004d4/gas.cmake 2019-09-19 15:42:00.226624 caller_uid=0, 
caller_gid=0{0,}) v4
-12> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865319:2382 readdir #0x1001172f3d7 
2019-09-19 15:42:00.228673 caller_uid=0, caller_gid=0{0,}) v4
-11> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8895 lookup 
#0x100166004d4/kFreeBSD.cmake 2019-09-19 15:42:00.229668 caller_uid=0, 
caller_gid=0{0,}) v4
-10> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8896 lookup 
#0x100166004d4/syllable.cmake 2019-09-19 15:42:00.232746 c

Re: [ceph-users] ceph mdss keep on crashing after update to 14.2.3

2019-09-19 Thread Kenneth Waegeman

I forgot to mention the tracker issue: https://tracker.ceph.com/issues/41935

On 19/09/2019 16:59, Kenneth Waegeman wrote:


Hi all,

I updated our ceph cluster to 14.2.3 yesterday, and today the MDSs are crashing one after another. I'm using two active MDSs.

I've made a tracker ticket, but I was wondering whether someone else has also seen this issue yet?



-27> 2019-09-19 15:42:00.196 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8887 lookup 
#0x100166004d4/WindowsPhone-MSVC-CXX.cmake 2019-09-19 15:42:00.203132 
caller_uid=0, caller_gid=0{0,}) v4
-26> 2019-09-19 15:42:00.196 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865372:5815 lookup 
#0x20005a6eb3a/selectable.cpython-37.pyc 2019-09-19 15:42:00.204970 caller_uid=0, 
caller_gid=0{0,}) v4
-25> 2019-09-19 15:42:00.196 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333: lookup 
#0x100166004d4/WindowsPhone.cmake 2019-09-19 15:42:00.206381 caller_uid=0, 
caller_gid=0{0,}) v4
-24> 2019-09-19 15:42:00.206 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8889 lookup 
#0x100166004d4/WindowsStore-MSVC-C.cmake 2019-09-19 15:42:00.209703 caller_uid=0, 
caller_gid=0{0,}) v4
-23> 2019-09-19 15:42:00.206 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8890 lookup 
#0x100166004d4/WindowsStore-MSVC-CXX.cmake 2019-09-19 15:42:00.213200 
caller_uid=0, caller_gid=0{0,}) v4
-22> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8891 lookup 
#0x100166004d4/WindowsStore.cmake 2019-09-19 15:42:00.216577 caller_uid=0, 
caller_gid=0{0,}) v4
-21> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8892 lookup 
#0x100166004d4/Xenix.cmake 2019-09-19 15:42:00.220230 caller_uid=0, 
caller_gid=0{0,}) v4
-20> 2019-09-19 15:42:00.216 7f0369aeb700  2 mds.1.cache Memory usage:  
total 4603496, rss 4167920, heap 323836, baseline 323836, 501 / 1162471 inodes 
have caps, 506 caps, 0.00043528 caps per inode
-19> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log _submit_thread 
30520209420029~9062 : EUpdate scatter_writebehind [metablob 0x1000bd8ac7b, 2 dirs]
-18> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log _submit_thread 
30520209429111~10579 : EUpdate scatter_writebehind [metablob 0x1000bf26309, 9 dirs]
-17> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log _submit_thread 
30520209439710~2305 : EUpdate scatter_writebehind [metablob 0x1000bf2745b.001*, 2 
dirs]
-16> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log _submit_thread 
30520209442035~1845 : EUpdate scatter_writebehind [metablob 0x1000c233753, 2 dirs]
-15> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8893 lookup 
#0x100166004d4/eCos.cmake 2019-09-19 15:42:00.223360 caller_uid=0, 
caller_gid=0{0,}) v4
-14> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865319:2381 lookup 
#0x1001172f39d/microsoft-cp1251 2019-09-19 15:42:00.224940 caller_uid=0, 
caller_gid=0{0,}) v4
-13> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8894 lookup 
#0x100166004d4/gas.cmake 2019-09-19 15:42:00.226624 caller_uid=0, 
caller_gid=0{0,}) v4
-12> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865319:2382 readdir #0x1001172f3d7 
2019-09-19 15:42:00.228673 caller_uid=0, caller_gid=0{0,}) v4
-11> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8895 lookup 
#0x100166004d4/kFreeBSD.cmake 2019-09-19 15:42:00.229668 caller_uid=0, 
caller_gid=0{0,}) v4
-10> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8896 lookup 
#0x100166004d4/syllable.cmake 2019-09-19 15:42:00.232746 caller_uid=0, 
caller_gid=0{0,}) v4
 -9> 2019-09-19 15:42:00.236 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8897 readdir #0x10016601379 
2019-09-19 15:42:00.240672 caller_uid=0, caller_gid=0{0,}) v4
 -8> 2019-09-19 15:42:00.236 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865356:3604574 readdir 
#0x290d630 2019-09-19 15:42:00.241832 caller_uid=0, caller_gid=0{0,}) v4
 -7> 2019-09-19 15:42:00.266 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865356:3604575 readdir 
#0x290d631 2019-09-19 15:42:00.272158 caller_uid=0, caller_gid=0{0,}) v4
 -6> 2019-09-19 15:42:00.326 7f03652e2700  5 mds.1.log _submit_thread 
30520209443900~3089 : EUpdate scatter_writebehind [metablob 0x20005af5c63, 3 dirs]
 -5> 2019-09-19 

[ceph-users] ceph mdss keep on crashing after update to 14.2.3

2019-09-19 Thread Kenneth Waegeman

Hi all,

I updated our ceph cluster to 14.2.3 yesterday, and today the MDSs are crashing one after another. I'm using two active MDSs.

I've made a tracker ticket, but I was wondering whether someone else has also seen this issue yet?



-27> 2019-09-19 15:42:00.196 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8887 lookup 
#0x100166004d4/WindowsPhone-MSVC-CXX.cmake 2019-09-19 15:42:00.203132 
caller_uid=0, caller_gid=0{0,}) v4
-26> 2019-09-19 15:42:00.196 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865372:5815 lookup 
#0x20005a6eb3a/selectable.cpython-37.pyc 2019-09-19 15:42:00.204970 caller_uid=0, 
caller_gid=0{0,}) v4
-25> 2019-09-19 15:42:00.196 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333: lookup 
#0x100166004d4/WindowsPhone.cmake 2019-09-19 15:42:00.206381 caller_uid=0, 
caller_gid=0{0,}) v4
-24> 2019-09-19 15:42:00.206 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8889 lookup 
#0x100166004d4/WindowsStore-MSVC-C.cmake 2019-09-19 15:42:00.209703 caller_uid=0, 
caller_gid=0{0,}) v4
-23> 2019-09-19 15:42:00.206 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8890 lookup 
#0x100166004d4/WindowsStore-MSVC-CXX.cmake 2019-09-19 15:42:00.213200 
caller_uid=0, caller_gid=0{0,}) v4
-22> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8891 lookup 
#0x100166004d4/WindowsStore.cmake 2019-09-19 15:42:00.216577 caller_uid=0, 
caller_gid=0{0,}) v4
-21> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8892 lookup 
#0x100166004d4/Xenix.cmake 2019-09-19 15:42:00.220230 caller_uid=0, 
caller_gid=0{0,}) v4
-20> 2019-09-19 15:42:00.216 7f0369aeb700  2 mds.1.cache Memory usage:  
total 4603496, rss 4167920, heap 323836, baseline 323836, 501 / 1162471 inodes 
have caps, 506 caps, 0.00043528 caps per inode
-19> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log _submit_thread 
30520209420029~9062 : EUpdate scatter_writebehind [metablob 0x1000bd8ac7b, 2 dirs]
-18> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log _submit_thread 
30520209429111~10579 : EUpdate scatter_writebehind [metablob 0x1000bf26309, 9 dirs]
-17> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log _submit_thread 
30520209439710~2305 : EUpdate scatter_writebehind [metablob 0x1000bf2745b.001*, 2 
dirs]
-16> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log _submit_thread 
30520209442035~1845 : EUpdate scatter_writebehind [metablob 0x1000c233753, 2 dirs]
-15> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8893 lookup 
#0x100166004d4/eCos.cmake 2019-09-19 15:42:00.223360 caller_uid=0, 
caller_gid=0{0,}) v4
-14> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865319:2381 lookup 
#0x1001172f39d/microsoft-cp1251 2019-09-19 15:42:00.224940 caller_uid=0, 
caller_gid=0{0,}) v4
-13> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8894 lookup 
#0x100166004d4/gas.cmake 2019-09-19 15:42:00.226624 caller_uid=0, 
caller_gid=0{0,}) v4
-12> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865319:2382 readdir #0x1001172f3d7 
2019-09-19 15:42:00.228673 caller_uid=0, caller_gid=0{0,}) v4
-11> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8895 lookup 
#0x100166004d4/kFreeBSD.cmake 2019-09-19 15:42:00.229668 caller_uid=0, 
caller_gid=0{0,}) v4
-10> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8896 lookup 
#0x100166004d4/syllable.cmake 2019-09-19 15:42:00.232746 caller_uid=0, 
caller_gid=0{0,}) v4
 -9> 2019-09-19 15:42:00.236 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865333:8897 readdir #0x10016601379 
2019-09-19 15:42:00.240672 caller_uid=0, caller_gid=0{0,}) v4
 -8> 2019-09-19 15:42:00.236 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865356:3604574 readdir 
#0x290d630 2019-09-19 15:42:00.241832 caller_uid=0, caller_gid=0{0,}) v4
 -7> 2019-09-19 15:42:00.266 7f036c2f0700  4 mds.1.server 
handle_client_request client_request(client.37865356:3604575 readdir 
#0x290d631 2019-09-19 15:42:00.272158 caller_uid=0, caller_gid=0{0,}) v4
 -6> 2019-09-19 15:42:00.326 7f03652e2700  5 mds.1.log _submit_thread 
30520209443900~3089 : EUpdate scatter_writebehind [metablob 0x20005af5c63, 3 dirs]
 -5> 2019-09-19 15:42:00.326 7f03652e2700  5 mds.1.log _submit_thread 
30520209447009~10579 : EUpdate scatter_writebehind [metablob 0x1000bf26309, 9 dirs]
 -4> 2019-09-19 15:42:00.326 7f03652e2700  5 mds.1.log 

Re: [ceph-users] regularly 'no space left on device' when deleting on cephfs

2019-09-10 Thread Kenneth Waegeman

Hi Paul, all,

Thanks! But I can't find how to debug the purge queue any further. When I check the purge queue, I get these numbers:


[root@mds02 ~]# ceph daemon mds.mds02 perf dump | grep -E 'purge|pq'
    "purge_queue": {
    "pq_executing_ops": 0,
    "pq_executing": 0,
    "pq_executed": 469026

[root@mds03 ~]# ceph daemon mds.mds03 perf dump | grep -E 'purge|pq'
    "purge_queue": {
    "pq_executing_ops": 0,
    "pq_executing": 0,
    "pq_executed": 0

Even after more than 10 minutes these numbers are still the same, and I'm still not able to delete anything.
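The purge backlog can also be looked at more directly; a sketch, with the rank and filesystem name taken from the ceph -s output quoted elsewhere in these threads:

# purge-related settings currently active on this MDS
ceph daemon mds.mds02 config show | grep purge
# inspect the on-disk purge queue journal of rank 0
cephfs-journal-tool --rank=ceph_fs:0 --journal=purge_queue journal inspect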


What bothers me most is that while I can't delete anything, the ceph cluster is still healthy: no warnings, and there is nothing in the MDS logs (running ceph 13.2.6).


Thanks again!

Kenneth

On 06/09/2019 16:21, Paul Emmerich wrote:

Yeah, no ENOSPC error code on deletion is a little bit unintuitive,
but what it means is: the purge queue is full.
You've already told the MDS to purge faster.

I'm not sure how to tell it to increase the maximum backlog for
deletes/purges, though; you should be able to find something with
the search term "purge queue". :)


Paul




[ceph-users] regularly 'no space left on device' when deleting on cephfs

2019-09-06 Thread Kenneth Waegeman

Hi all,

We are using CephFS to make a copy of another filesystem via rsync, and we also use snapshots.


I'm seeing this issue now and then when I try to delete files on CephFS:

[root@osd001 ~]# rm -f /mnt/ceph/backups/osd00*
rm: cannot remove ‘/mnt/ceph/backups/osd001.gigalith.os-3eea7740.1542483’: No space left on device
rm: cannot remove ‘/mnt/ceph/backups/osd001.gigalith.os-b2f21740.1557247’: No space left on device
rm: cannot remove ‘/mnt/ceph/backups/osd001.gigalith.os-ca3be740.1549780’: No space left on device
rm: cannot remove ‘/mnt/ceph/backups/osd002.gigalith.os-1b437740.1173950’: No space left on device
rm: cannot remove ‘/mnt/ceph/backups/osd002.gigalith.os-92186740.1169503’: No space left on device
rm: cannot remove ‘/mnt/ceph/backups/osd002.gigalith.os-e9260740.1178280’: No space left on device
rm: cannot remove ‘/mnt/ceph/backups/osd003.gigalith.os-2dec5740.2025571’: No space left on device
rm: cannot remove ‘/mnt/ceph/backups/osd003.gigalith.os-e5f94740.2029993’: No space left on device
rm: cannot remove ‘/mnt/ceph/backups/osd004.gigalith.os-f4a9740.364609’: No space left on device


The cluster is healthy at the moment, and we certainly have enough space (see also the osd df output below).



[root@osd001 ~]# ceph -s
  cluster:
    id: 92bfcf0a-1d39-43b3-b60f-44f01b630e47
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum mds01,mds02,mds03
    mgr: mds02(active), standbys: mds03, mds01
    mds: ceph_fs-2/2/2 up {0=mds01=up:active,1=mds02=up:active}, 1 
up:standby

    osd: 536 osds: 536 up, 536 in

  data:
    pools:   3 pools, 3328 pgs
    objects: 451.5 M objects, 762 TiB
    usage:   1.1 PiB used, 2.0 PiB / 3.2 PiB avail
    pgs: 1280 active+clean
 1138 active+clean+snaptrim_wait
 899  active+clean+snaptrim
 9    active+clean+scrubbing+deep
 2    active+clean+scrubbing

  io:
    client:   0 B/s wr, 0 op/s rd, 0 op/s wr


There is also nothing in the mds log.

We tuned the mds already before:

[mds]
mds_cache_memory_limit=21474836480
mds_log_max_expiring=200
mds_log_max_segments=200
mds_max_purge_files=2560
mds_max_purge_ops=327600
mds_max_purge_ops_per_pg=20


I tried restarting the MDS, flushing the MDS journals, and remounting on the client, but that does not help; after some time it just works again.
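For reference, the journal flush mentioned above goes through the admin socket; a minimal sketch, using one of the MDS daemon names from this cluster:

# flush and trim the in-memory MDS journal
ceph daemon mds.mds01 flush journal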



What can I do to debug or tune this further?


Thanks!!

Kenneth



ceph osd df :


ID  CLASS WEIGHT  REWEIGHT SIZE    USE  AVAIL   %USE  VAR PGS
546   ssd 0.14699  1.0 151 GiB   67 GiB  84 GiB 44.30 1.23 512
547   ssd 0.14699  1.0 151 GiB   68 GiB  83 GiB 45.27 1.26 512
  0   ssd 0.14699  1.0 151 GiB   66 GiB  85 GiB 43.88 1.22 515
  1   ssd 0.14699  1.0 151 GiB   66 GiB  85 GiB 43.65 1.21 509
  2   ssd 0.14699  1.0 151 GiB   67 GiB  84 GiB 44.53 1.24 511
  3   ssd 0.14699  1.0 151 GiB   67 GiB  84 GiB 44.39 1.23 513
540  fast 0.14799  1.0 151 GiB  3.1 GiB 148 GiB  2.05 0.06   0
541  fast 0.14799  1.0 151 GiB  3.1 GiB 148 GiB  2.05 0.06   0
528   hdd 3.63899  1.0 3.6 TiB  1.4 TiB 2.2 TiB 38.50 1.07  31
529   hdd 3.63899  1.0 3.6 TiB  2.1 TiB 1.6 TiB 56.96 1.58  42
530   hdd 3.63899  1.0 3.6 TiB  1.5 TiB 2.2 TiB 39.85 1.11  31
531   hdd 3.63899  1.0 3.6 TiB  1.5 TiB 2.2 TiB 39.91 1.11  30
532   hdd 3.63899  1.0 3.6 TiB  1.6 TiB 2.1 TiB 42.75 1.19  34
533   hdd 3.63899  1.0 3.6 TiB  1.3 TiB 2.3 TiB 35.60 0.99  31
534   hdd 3.63899  1.0 3.6 TiB  2.2 TiB 1.4 TiB 61.22 1.70  46
535   hdd 3.63899  1.0 3.6 TiB  1.3 TiB 2.3 TiB 37.02 1.03  33
536   hdd 3.63899  1.0 3.6 TiB  1.5 TiB 2.2 TiB 39.89 1.11  30
537   hdd 3.63899  1.0 3.6 TiB  1.1 TiB 2.5 TiB 31.35 0.87  24
538   hdd 3.63899  1.0 3.6 TiB  2.3 TiB 1.4 TiB 62.68 1.74  53
539   hdd 3.63899  1.0 3.6 TiB  1.9 TiB 1.8 TiB 51.27 1.43  40
542   hdd 3.63899  1.0 3.6 TiB  2.2 TiB 1.5 TiB 59.81 1.66  46
543   hdd 3.63899  1.0 3.6 TiB  1.5 TiB 2.1 TiB 41.27 1.15  35
544   hdd 3.63899  1.0 3.6 TiB  1.3 TiB 2.3 TiB 35.58 0.99  28
545   hdd 3.63899  1.0 3.6 TiB  1.2 TiB 2.4 TiB 32.73 0.91  28
520  fast 0.14799  1.0 151 GiB  3.8 GiB 147 GiB  2.53 0.07   0
522  fast 0.14799  1.0 151 GiB  3.8 GiB 147 GiB  2.53 0.07   0
496   hdd 3.63899  1.0 3.6 TiB  1.8 TiB 1.9 TiB 48.41 1.35  35
498   hdd 3.63899  1.0 3.6 TiB  1.1 TiB 2.6 TiB 29.88 0.83  27
500   hdd 3.63899  1.0 3.6 TiB  2.0 TiB 1.6 TiB 55.49 1.54  43
502   hdd 3.63899  1.0 3.6 TiB  1.4 TiB 2.2 TiB 38.48 1.07  31
504   hdd 3.63899  1.0 3.6 TiB  1.1 TiB 2.6 TiB 29.90 0.83  24
510   hdd 3.63899  1.0 3.6 TiB  1.2 TiB 2.4 TiB 34.16 0.95  28
512   hdd 3.63899  1.0 3.6 TiB  955 GiB 2.7 TiB 25.64 0.71  22
514   hdd 3.63899  1.0 3.6 TiB  1.3 TiB 2.3 TiB 37.03 1.03  31
516   hdd 3.63899  1.0 3.6 TiB  1.6 TiB 2.1 TiB 42.73 1.19  37
518   hdd 3.63899  1.0 3.6 TiB  1.8 TiB 1.9 TiB 48.39 1.35  38
524   hdd 3.63899  1.0 

[ceph-users] cephfs deleting files No space left on device

2019-05-10 Thread Kenneth Waegeman

Hi all,

I am seeing issues on cephfs running 13.2.5 when deleting files:

[root@osd006 ~]# rm /mnt/ceph/backups/osd006.gigalith.os-2b5a3740.1326700
rm: remove regular empty file 
‘/mnt/ceph/backups/osd006.gigalith.os-2b5a3740.1326700’? y
rm: cannot remove 
‘/mnt/ceph/backups/osd006.gigalith.os-2b5a3740.1326700’: No space left 
on device


A few minutes later, I can remove it without a problem. This happens 
especially when a lot of files are deleted somewhere on the 
filesystem around the same time.


We already have tuned our mds config:

[mds]
mds_cache_memory_limit=10737418240
mds_log_max_expiring=200
mds_log_max_segments=200
mds_max_purge_files=2560
mds_max_purge_ops=327600
mds_max_purge_ops_per_pg=20

ceph -s is reporting everything clean, the file system space usage 
is less than 50%, and there are no full osds or anything like that.


Is there a way to further debug where the bottleneck is when removing 
files gives this 'no space left on device' error?



Thank you very much!

Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unexplainable high memory usage OSD with BlueStore

2019-05-06 Thread Kenneth Waegeman

Hi all,

I am also switching osds to the new bitmap allocator on 13.2.5. That 
went quite smoothly so far, except for one OSD that keeps segfaulting 
when I enable the bitmap allocator. Each time I disable the bitmap allocator 
on it again, the osd is OK again. Segfault error of the OSD:




--- begin dump of recent events ---
  -319> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command perfcounters_dump hook 0x55b2155b60d0
  -318> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command 1 hook 0x55b2155b60d0
  -317> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command perf dump hook 0x55b2155b60d0
  -316> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command perfcounters_schema hook 0x55b2155b60d0
  -315> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command perf histogram dump hook 0x55b2155b60d0
  -314> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command 2 hook 0x55b2155b60d0
  -313> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command perf schema hook 0x55b2155b60d0
  -312> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command perf histogram schema hook 0x55b2155b60d0
  -311> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command perf reset hook 0x55b2155b60d0
  -310> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command config show hook 0x55b2155b60d0
  -309> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command config help hook 0x55b2155b60d0
  -308> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command config set hook 0x55b2155b60d0
  -307> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command config unset hook 0x55b2155b60d0
  -306> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command config get hook 0x55b2155b60d0
  -305> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command config diff hook 0x55b2155b60d0
  -304> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command config diff get hook 0x55b2155b60d0
  -303> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command log flush hook 0x55b2155b60d0
  -302> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command log dump hook 0x55b2155b60d0
  -301> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command log reopen hook 0x55b2155b60d0
  -300> 2019-05-06 15:11:45.172 7f28f4321d80  5 asok(0x55b2155de5a0) 
register_command dump_mempools hook 0x55b2155ec2c8
  -299> 2019-05-06 15:11:45.182 7f28f4321d80 10 monclient: 
get_monmap_and_config
  -298> 2019-05-06 15:11:45.222 7f28f4321d80 10 monclient: 
build_initial_monmap
  -297> 2019-05-06 15:11:45.222 7f28e3a19700  2 Event(0x55b215911080 
nevent=5000 time_id=1).set_owner idx=1 owner=139813594437376
  -296> 2019-05-06 15:11:45.222 7f28e421a700  2 Event(0x55b215910c80 
nevent=5000 time_id=1).set_owner idx=0 owner=139813602830080
  -295> 2019-05-06 15:11:45.222 7f28e3218700  2 Event(0x55b215911880 
nevent=5000 time_id=1).set_owner idx=2 owner=139813586044672

  -294> 2019-05-06 15:11:45.222 7f28f4321d80  1  Processor -- start
  -293> 2019-05-06 15:11:45.222 7f28f4321d80  1 -- - start start
  -292> 2019-05-06 15:11:45.222 7f28f4321d80 10 monclient: init
  -291> 2019-05-06 15:11:45.222 7f28f4321d80  5 adding auth protocol: 
cephx
  -290> 2019-05-06 15:11:45.222 7f28f4321d80 10 monclient: 
auth_supported 2 method cephx
  -289> 2019-05-06 15:11:45.222 7f28f4321d80  2 auth: KeyRing::load: 
loaded key file /var/lib/ceph/osd/ceph-3/keyring
  -288> 2019-05-06 15:11:45.222 7f28f4321d80 10 monclient: 
_reopen_session rank -1
  -287> 2019-05-06 15:11:45.222 7f28f4321d80 10 monclient(hunting): 
picked mon.noname-c con 0x55b2159e2600 addr 10.141.16.3:6789/0
  -286> 2019-05-06 15:11:45.222 7f28f4321d80 10 monclient(hunting): 
picked mon.noname-b con 0x55b2159e2c00 addr 10.141.16.2:6789/0
  -285> 2019-05-06 15:11:45.222 7f28f4321d80  1 -- - --> 
10.141.16.2:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- 
0x55b2155b1200 con 0
  -284> 2019-05-06 15:11:45.222 7f28f4321d80  1 -- - --> 
10.141.16.3:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- 
0x55b2155b1440 con 0
  -283> 2019-05-06 15:11:45.222 7f28f4321d80 10 monclient(hunting): 
_renew_subs
  -282> 2019-05-06 15:11:45.222 7f28f4321d80 10 monclient(hunting): 
authenticate will time out at 2019-05-06 15:16:45.237660
  -281> 2019-05-06 15:11:45.222 7f28e3a19700  1 -- 
10.141.16.3:0/3652030958 learned_addr learned my addr 
10.141.16.3:0/3652030958
  -280> 2019-05-06 15:11:45.222 7f28e3a19700  2 -- 
10.141.16.3:0/3652030958 >> 10.141.16.3:6789/0 conn(0x55b2159e2600 :-1 
s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=0)._process_connection 
got newly_acked_seq 0 vs out_seq 0
  -279> 2019-05-06 15:11:45.222 7f28e3218700  2 -- 

[ceph-users] slow ops after cephfs snapshot removal

2018-11-09 Thread Kenneth Waegeman

Hi all,

On Mimic 13.2.1, we are seeing blocked ops on cephfs after removing some 
snapshots:


[root@osd001 ~]# ceph -s
  cluster:
    id: 92bfcf0a-1d39-43b3-b60f-44f01b630e47
    health: HEALTH_WARN
    5 slow ops, oldest one blocked for 1162 sec, mon.mds03 has 
slow ops


  services:
    mon: 3 daemons, quorum mds01,mds02,mds03
    mgr: mds02(active), standbys: mds03, mds01
    mds: ceph_fs-2/2/2 up  {0=mds03=up:active,1=mds01=up:active}, 1 
up:standby

    osd: 544 osds: 544 up, 544 in

  io:
    client:   5.4 KiB/s wr, 0 op/s rd, 0 op/s wr

[root@osd001 ~]# ceph health detail
HEALTH_WARN 5 slow ops, oldest one blocked for 1327 sec, mon.mds03 has 
slow ops

SLOW_OPS 5 slow ops, oldest one blocked for 1327 sec, mon.mds03 has slow ops

[root@osd001 ~]# ceph -v
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic 
(stable)


Is this a known issue?

Cheers,

Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Snapshots in Mimic

2018-07-31 Thread Kenneth Waegeman

Thanks David and John,

That sounds logical now. When I read "To make a snapshot on 
directory “/1/2/3/”, the client invokes “mkdir” on the “/1/2/3/.snap” 
directory" (http://docs.ceph.com/docs/master/dev/cephfs-snapshots/), it 
didn't occur to me that I should create a subdirectory inside it.


Thanks, it works now!

K


On 31/07/18 17:06, John Spray wrote:

On Tue, Jul 31, 2018 at 3:45 PM Kenneth Waegeman
 wrote:

Hi all,

I updated an existing Luminous cluster to Mimic 13.2.1. All daemons were
updated, so I did ceph osd require-osd-release mimic, so everything
seems up to date.

I want to try the snapshots in Mimic, since these should be stable now, so I ran:

[root@osd2801 alleee]# ceph fs set cephfs allow_new_snaps true
enabled new snapshots

Now, when I try to create a snapshot, it is not working:

[root@osd2801 ~]# mkdir /mnt/bla/alleee/aaas
[root@osd2801 ~]# mkdir /mnt/bla/alleee/aaas/.snap
mkdir: cannot create directory ‘/mnt/bla/alleee/aaas/.snap’: File exists

I tried this using ceph-fuse and the kernel client, but always get the
same response.

The .snap directory always exists.  To create a snapshot, you create a
subdirectory of .snap with a name of your choice.
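
(A minimal example with the path from above; the snapshot name is free to choose:)

mkdir /mnt/bla/alleee/aaas/.snap/mysnap1
rmdir /mnt/bla/alleee/aaas/.snap/mysnap1   # removes that snapshot again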

John


Should I enable something else to get snapshots working ?


Thank you!


Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[ceph-users] CephFS Snapshots in Mimic

2018-07-31 Thread Kenneth Waegeman

Hi all,

I updated an existing Luminous cluster to Mimic 13.2.1. All daemons were 
updated, so I did ceph osd require-osd-release mimic, so everything 
seems up to date.


I want to try the snapshots in Mimic, since these should be stable now, so I ran:

[root@osd2801 alleee]# ceph fs set cephfs allow_new_snaps true
enabled new snapshots

Now, when I try to create a snapshot, it is not working:

[root@osd2801 ~]# mkdir /mnt/bla/alleee/aaas
[root@osd2801 ~]# mkdir /mnt/bla/alleee/aaas/.snap
mkdir: cannot create directory ‘/mnt/bla/alleee/aaas/.snap’: File exists

I tried this using ceph-fuse and the kernel client, but always get the 
same response.


Should I enable something else to get snapshots working ?


Thank you!


Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HELP! --> CLUSER DOWN (was "v13.2.1 Mimic released")

2018-07-30 Thread Kenneth Waegeman

I'll just go and test it :)


On 30/07/18 10:54, Nathan Cutler wrote:
for all others on this list, it might also be helpful to know which 
setups are likely affected.
Does this only occur for Filestore disks, i.e. if ceph-volume has 
taken over taking care of these?

Does it happen on every RHEL 7.5 system?


It affects all OSDs managed by ceph-disk on all RHEL systems (but not 
on CentOS), regardless of whether they are filestore or bluestore.


We're still on 13.2.0 here and ceph-detect-init works fine on our 
CentOS 7.5 systems (it just echoes "systemd").

We're on Bluestore.
Should we hold off on an upgrade, or are we unaffected?


The regression does not affect CentOS - only RHEL.

Nathan
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph mount nofail option

2018-03-12 Thread Kenneth Waegeman

Hi all,

Is there a way to mount ceph kernel client with the nofail option ?

I get an invalid argument error when trying to mount ceph with the nofail 
option, in fstab / mount:
mon01,mon02,mon03:/ /mnt/ceph ceph 
name=cephfs,secretfile=/etc/ceph/secret,noatime,nofail 0 0

or
[root@osd003 ~]# mount -t ceph mon01,mon02,mon03:/ /mnt/ceph -o 
name=admin,secretfile=/etc/ceph/secret,noatime,nofail

mount error 22 = Invalid argument

Without nofail the mount succeeds.

Tested using the centos 7.4 3.10.0-693.17.1.el7 kernel and a ceph 12.2.4 cluster

Many thanks!


Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS Client Capabilities questions

2018-03-07 Thread Kenneth Waegeman

Hi all,

I am playing with limiting client access to certain subdirectories of 
cephfs running latest 12.2.4 and latest centos 7.4 kernel, both using 
kernel client and fuse


I am following http://docs.ceph.com/docs/luminous/cephfs/client-auth/:

"To completely restrict the client to the 'bar' directory, omit the 
root directory:"

ceph fs authorize cephfs client.foo /bar rw

When I mount this directory with fuse, this works. When I try to mount 
the subdirectory directly with the kernel client, I get

mount error 13 = Permission denied

This only seems to work when the root is readable.

--> Is there a way to mount a subdirectory with the kernel client when the 
parent in cephfs is not readable?
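
(For reference, a sketch of the cap layout that keeps the root readable, 
which is the variant the kernel client accepts here; the pool name 
cephfs_data is an assumption:)

ceph auth caps client.foo \
    mds 'allow r, allow rw path=/bar' \
    mon 'allow r' \
    osd 'allow rw pool=cephfs_data'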



Then I checked the data pool with rados, but I can list/get/.. every 
object in the data pool using the client.foo key.


I saw in the docs of master 
http://docs.ceph.com/docs/master/cephfs/client-auth/ that you can add a 
tag cephfs, but if I add this I can't write anything to cephfs anymore, 
so I guess this is not yet supported in luminous.


--> Is there a way to limit the cephfs user to his data only (through 
cephfs) instead of being able to do everything on the pool, without 
needing a pool for every single cephfs client?




Thanks!!

Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] force scrubbing

2018-03-01 Thread Kenneth Waegeman

Hi,

Still seeing this on Luminous 12.2.2:

When I do ceph pg deep-scrub on the pg or ceph osd deep-scrub on the 
primary osd, I get the message


instructing pg 5.238 on osd.356 to deep-scrub

But nothing happens on that OSD. I waited a day, but the timestamp I see 
in ceph pg dump hasn't changed.
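
(One thing worth checking is whether the OSD still has a free scrub slot; a 
rough sketch, run on the host carrying osd.356, and the raised value is just 
an example:)

ceph daemon osd.356 config get osd_max_scrubs
ceph tell osd.356 injectargs '--osd_max_scrubs 2'
ceph pg deep-scrub 5.238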


Any clues?

Thanks!!

K

On 13/11/17 10:01, Kenneth Waegeman wrote:

Hi all,


Is there a way to force scrub a pg of an erasure coded pool?

I tried  ceph pg deep-scrub 5.4c7, but after a week it still hasn't 
scrubbed the pg (last scrub timestamp not changed)


Thanks!


Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[ceph-users] removing cache of ec pool (bluestore) with ec_overwrites enabled

2018-02-14 Thread Kenneth Waegeman

Hi all,

I'm trying to remove the cache from an erasure coded pool where all osds 
are bluestore osds and allow_ec_overwrites is true. I followed the steps 
on http://docs.ceph.com/docs/master/rados/operations/cache-tiering/, but 
with the remove-overlay step I'm getting an EBUSY error:


root@ceph001 ~]# ceph osd tier cache-mode cache forward 
--yes-i-really-mean-it

set cache-mode for pool 'cache' to forward

[root@ceph001 ~]# rados -p cache cache-flush-evict-all

[root@ceph001 ~]# rados -p cache ls

[root@ceph001 ~]# ceph osd tier remove-overlay ecdata
Error EBUSY: pool 'ecdata' is in use by CephFS via its tier
[root@ceph001 ~]# ceph osd pool set ecdata allow_ec_overwrites true
set pool 7 allow_ec_overwrites to true
[root@ceph001 ~]# ceph osd tier remove-overlay ecdata
Error EBUSY: pool 'ecdata' is in use by CephFS via its tier

I tried this with an fs with a replicated pool as backend, and that worked.

Is there another thing I should set to make this possible?

I'm on Luminous 12.2.2


Thanks!

Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mons segmentation faults New 12.2.2 cluster

2018-01-12 Thread Kenneth Waegeman

Hi all,

I installed a new Luminous 12.2.2 cluster. The monitors were up at 
first, but quickly started failing, segfaulting.


I only installed some mons, mgr, mds with ceph-deploy and osds with 
ceph-volume. No pools or fs were created yet.


When I start all mons again, there is a short window in which I can see the 
cluster state:



[root@ceph001 ~]# ceph status
  cluster:
    id: 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
    health: HEALTH_WARN
    1/3 mons down, quorum ceph002,ceph003

  services:
    mon: 3 daemons, quorum ceph002,ceph003, out of quorum: ceph001
    mgr: ceph001(active), standbys: ceph002, ceph003
    osd: 7 osds: 4 up, 4 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   4223 MB used, 14899 GB / 14904 GB avail
    pgs:


But this is only until I lose quorum again.

What could be the problem here?


Thanks!!

Kenneth


2018-01-12 13:08:36.912832 7f794f513e80  0 set uid:gid to 167:167 
(ceph:ceph)
2018-01-12 13:08:36.912859 7f794f513e80  0 ceph version 12.2.2 
(cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process 
(unknown), pid 28726
2018-01-12 13:08:36.913016 7f794f513e80  0 pidfile_write: ignore empty 
--pid-file
2018-01-12 13:08:36.951556 7f794f513e80  0 load: jerasure load: lrc 
load: isa
2018-01-12 13:08:36.951703 7f794f513e80  0  set rocksdb option 
compression = kNoCompression
2018-01-12 13:08:36.951716 7f794f513e80  0  set rocksdb option 
write_buffer_size = 33554432
2018-01-12 13:08:36.951742 7f794f513e80  0  set rocksdb option 
compression = kNoCompression
2018-01-12 13:08:36.951749 7f794f513e80  0  set rocksdb option 
write_buffer_size = 33554432

2018-01-12 13:08:36.951936 7f794f513e80  4 rocksdb: RocksDB version: 5.4.0

2018-01-12 13:08:36.951947 7f794f513e80  4 rocksdb: Git sha 
rocksdb_build_git_sha:@0@

2018-01-12 13:08:36.951951 7f794f513e80  4 rocksdb: Compile date Nov 30 2017
2018-01-12 13:08:36.951954 7f794f513e80  4 rocksdb: DB SUMMARY

2018-01-12 13:08:36.952011 7f794f513e80  4 rocksdb: CURRENT file: CURRENT

2018-01-12 13:08:36.952016 7f794f513e80  4 rocksdb: IDENTITY file:  IDENTITY

2018-01-12 13:08:36.952020 7f794f513e80  4 rocksdb: MANIFEST file:  
MANIFEST-64 size: 219 Bytes


2018-01-12 13:08:36.952023 7f794f513e80  4 rocksdb: SST files in 
/var/lib/ceph/mon/ceph-ceph001/store.db dir, Total Num: 3, files: 
48.sst 50.sst 60.sst


2018-01-12 13:08:36.952025 7f794f513e80  4 rocksdb: Write Ahead Log file 
in /var/lib/ceph/mon/ceph-ceph001/store.db: 65.log size: 0 ;


2018-01-12 13:08:36.952028 7f794f513e80  4 
rocksdb: Options.error_if_exists: 0
2018-01-12 13:08:36.952029 7f794f513e80  4 
rocksdb:   Options.create_if_missing: 0
2018-01-12 13:08:36.952031 7f794f513e80  4 
rocksdb: Options.paranoid_checks: 1
2018-01-12 13:08:36.952032 7f794f513e80  4 
rocksdb: Options.env: 0x5617a10fa040
2018-01-12 13:08:36.952033 7f794f513e80  4 
rocksdb:    Options.info_log: 0x5617a24ce1c0
2018-01-12 13:08:36.952034 7f794f513e80  4 
rocksdb:  Options.max_open_files: -1
2018-01-12 13:08:36.952035 7f794f513e80  4 rocksdb: 
Options.max_file_opening_threads: 16
2018-01-12 13:08:36.952035 7f794f513e80  4 
rocksdb:   Options.use_fsync: 0
2018-01-12 13:08:36.952037 7f794f513e80  4 
rocksdb:   Options.max_log_file_size: 0
2018-01-12 13:08:36.952038 7f794f513e80  4 rocksdb:  
Options.max_manifest_file_size: 18446744073709551615
2018-01-12 13:08:36.952039 7f794f513e80  4 rocksdb:   
Options.log_file_time_to_roll: 0
2018-01-12 13:08:36.952040 7f794f513e80  4 
rocksdb:   Options.keep_log_file_num: 1000
2018-01-12 13:08:36.952041 7f794f513e80  4 rocksdb:    
Options.recycle_log_file_num: 0
2018-01-12 13:08:36.952042 7f794f513e80  4 
rocksdb: Options.allow_fallocate: 1
2018-01-12 13:08:36.952043 7f794f513e80  4 
rocksdb:    Options.allow_mmap_reads: 0
2018-01-12 13:08:36.952044 7f794f513e80  4 
rocksdb:   Options.allow_mmap_writes: 0
2018-01-12 13:08:36.952045 7f794f513e80  4 
rocksdb:    Options.use_direct_reads: 0
2018-01-12 13:08:36.952046 7f794f513e80  4 rocksdb: 
Options.use_direct_io_for_flush_and_compaction: 0
2018-01-12 13:08:36.952047 7f794f513e80  4 rocksdb: 
Options.create_missing_column_families: 0
2018-01-12 13:08:36.952048 7f794f513e80  4 
rocksdb:  Options.db_log_dir:
2018-01-12 13:08:36.952049 7f794f513e80  4 
rocksdb: Options.wal_dir: 
/var/lib/ceph/mon/ceph-ceph001/store.db
2018-01-12 13:08:36.952050 7f794f513e80  4 rocksdb: 
Options.table_cache_numshardbits: 6
2018-01-12 13:08:36.952050 7f794f513e80  4 rocksdb:  
Options.max_subcompactions: 1
2018-01-12 13:08:36.952062 

[ceph-users] force scrubbing

2017-11-13 Thread Kenneth Waegeman

Hi all,


Is there a way to force scrub a pg of an erasure coded pool?

I tried  ceph pg deep-scrub 5.4c7, but after a week it still hasn't 
scrubbed the pg (last scrub timestamp not changed)


Thanks!


Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph inconsistent pg missing ec object

2017-11-09 Thread Kenneth Waegeman

Hi Greg,

Thanks! This seems to have worked for at least 1 of 2 inconsistent pgs: 
The inconsistency disappeared after a new scrub. Still waiting for the 
result of the second pg. I tried to force deep-scrub with `ceph pg 
deep-scrub ` yesterday, but today the last deep scrub is still from 
a week ago. Is there a way to actually deep-scrub immediately?


Thanks again!

Kenneth


On 02/11/17 19:27, Gregory Farnum wrote:
Okay, after consulting with a colleague this appears to be an instance 
of http://tracker.ceph.com/issues/21382. Assuming the object is one 
that doesn't have snapshots, your easiest resolution is to use rados 
get to retrieve the object (which, unlike recovery, should work) and 
then "rados put" it back in to place.


This fix might be backported to Jewel for a later release, but it's 
tricky so wasn't done proactively.

-Greg
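
(A rough sketch of the get/put workaround Greg describes; the pool and 
object names are placeholders, not the real ones from the logs:)

rados -p <data-pool> get <object-name> /tmp/object
rados -p <data-pool> put <object-name> /tmp/object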

On Fri, Oct 20, 2017 at 12:27 AM Stijn De Weirdt 
> wrote:


hi gregory,

we more or less followed the instructions on the site (famous last
words, I know ;)

grepping for the error in the osd logs of the osds of the pg, the
primary logs had "5.5e3s0 shard 59(5) missing
5:c7ae919b:::10014d3184b.:head"

we looked for the object using the find command, we got

> [root@osd003 ~]# find
/var/lib/ceph/osd/ceph-35/current/5.5e3s0_head/ -name
"*10014d3184b.*"
>
>

/var/lib/ceph/osd/ceph-35/current/5.5e3s0_head/DIR_3/DIR_E/DIR_5/DIR_7/DIR_9/10014d3184b.__head_D98975E3__5__0

then we ran this find on all 11 osds from the pg, and 10 out of 11
osds gave a similar path (the suffix _[0-9a] matched the index of the osd in
the list of osds reported by the pg, so I assumed that was the ec
splitting up the data into 11 pieces)

on one osd in the list of osds, there was no such object (the 6th one,
index 5, so again assuming from our side that this was the 5 in 5:...
from the logfile). So we assumed this was the missing object that the
error reported. We have absolutely no clue why it was missing or what
happened; there was nothing in any logs.

what we did then was stop the osd that had the missing object,
flush the journal, start the osd again and run repair. (the guide
mentioned deleting an object; we did not delete anything, because we
assumed the issue was the already missing object from the 6th osd)

flushing the journal segfaulted, but the osd started fine again.

the scrub errors did not disappear, so we did the same again on the
primary (no deleting of anything; and again, the flush segfaulted).

wrt the segfault, I attached the output of a segfaulting flush with
debug enabled on another osd.


stijn


On 10/20/2017 02:56 AM, Gregory Farnum wrote:
> Okay, you're going to need to explain in very clear terms
exactly what
> happened to your cluster, and *exactly* what operations you
performed
> manually.
>
> The PG shards seem to have different views of the PG in
question. The
> primary has a different log_tail, last_user_version, and
last_epoch_clean
> from the others. Plus different log sizes? It's not making a ton
of sense
> at first glance.
> -Greg
>
> On Thu, Oct 19, 2017 at 1:08 AM Stijn De Weirdt
>
> wrote:
>
>> hi greg,
>>
>> i attached the gzip output of the query and some more info
below. if you
>> need more, let me know.
>>
>> stijn
>>
>>> [root@mds01 ~]# ceph -s
>>>     cluster 92beef0a-1239-4000-bacf-4453ab630e47
>>>      health HEALTH_ERR
>>>             1 pgs inconsistent
>>>             40 requests are blocked > 512 sec
>>>             1 scrub errors
>>>             mds0: Behind on trimming (2793/30)
>>>      monmap e1: 3 mons at {mds01=
>> 1.2.3.4:6789/0,mds02=1.2.3.5:6789/0,mds03=1.2.3.6:6789/0
}
>>>             election epoch 326, quorum 0,1,2 mds01,mds02,mds03
>>>       fsmap e238677: 1/1/1 up {0=mds02=up:active}, 2 up:standby
>>>      osdmap e79554: 156 osds: 156 up, 156 in
>>>             flags sortbitwise,require_jewel_osds
>>>       pgmap v51003893: 4096 pgs, 3 pools, 387 TB data, 243
Mobjects
>>>             545 TB used, 329 TB / 874 TB avail
>>>                 4091 active+clean
>>>                    4 active+clean+scrubbing+deep
>>>                    1 active+clean+inconsistent
>>>   client io 284 kB/s rd, 146 MB/s wr, 145 op/s rd, 177 op/s wr
>>>   cache io 115 MB/s flush, 153 MB/s evict, 14 op/s promote, 3
PG(s)
>> flushing
>>
>>> [root@mds01 ~]# ceph health detail
>>> HEALTH_ERR 1 pgs inconsistent; 52 requests are blocked > 512
sec; 5 osds
>> have slow requests; 1 scrub errors; mds0: Behind 

[ceph-users] inconsistent pg on erasure coded pool

2017-10-04 Thread Kenneth Waegeman

Hi,

We have an inconsistency / scrub error on an erasure coded pool that I 
can't seem to solve.


[root@osd008 ~]# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 5.144 is active+clean+inconsistent, acting 
[81,119,148,115,142,100,25,63,48,11,43]

1 scrub errors

In the log files, it seems there is 1 missing shard:

/var/log/ceph/ceph-osd.81.log.2.gz:2017-10-02 23:49:11.940624 
7f0a9d7e2700 -1 log_channel(cluster) log [ERR] : 5.144s0 shard 63(7) 
missing 5:2297a2e1:::10014e2d8d5.:head
/var/log/ceph/ceph-osd.81.log.2.gz:2017-10-03 00:48:06.681941 
7f0a9d7e2700 -1 log_channel(cluster) log [ERR] : 5.144s0 deep-scrub 1 
missing, 0 inconsistent objects
/var/log/ceph/ceph-osd.81.log.2.gz:2017-10-03 00:48:06.681947 
7f0a9d7e2700 -1 log_channel(cluster) log [ERR] : 5.144 deep-scrub 1 errors


I tried running ceph pg repair on the pg, but nothing changed. I also 
tried starting a new deep-scrub on osd 81 (ceph osd deep-scrub 81), 
but I don't see any deep-scrub starting on the osd.


How can we solve this ?
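
(A hedged suggestion: since Jewel the scrub findings can be dumped directly, 
which at least shows which shard/OSD is the bad one; it needs a recent 
deep-scrub of the pg to have completed:)

rados list-inconsistent-obj 5.144 --format=json-pretty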

Thank you!


Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] bluestore-osd and block.dbs of other osds on ssd

2017-07-26 Thread Kenneth Waegeman

Hi all,

Using filestore, we have some clusters where we put the journals of 
regular osds (hdd) together with e.g. a cache or metadata osd on one SSD. 
Even with the OS on that SSD too, this gave us better performance than 
with journals on disk.


Now using bluestore, I was wondering whether it is possible to have a 
bluestore OSD on an SSD, together with the block.db/block.wal of HDD 
osds? Something like this for the SSD:


Number  Start   EndSizeFile system  NameFlags
 1  1049kB  106MB  105MB   xfs  ceph data
 2  106MB   150GB  150GBceph block
 3  150GB   151GB  1074MB   ceph block.db
 4  151GB   152GB  604MBceph block.wal

Using ceph-deploy/ceph-disk, this does not seem possible at the moment. 
Adding the db/wal partitions is not a problem, but having the OSD 
share the disk is:


- ceph-disk does not accept partitions, it needs full disks to make the 
xfs and block partitions


- it always needs to have the first two partitions

- it will take all the remaining space on the disk for the OSD block partition.


I probably could hack something in by resizing the partitions, like 
above, but I'd rather not :)


Will such a feature be possible, or is this just a bad idea with 
bluestore?
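
(For what it's worth, the half of this that ceph-disk does support is 
pointing an HDD OSD's block.db/block.wal at pre-made SSD partitions; a rough 
sketch, the device names are just examples:)

ceph-disk prepare --bluestore /dev/sdc --block.db /dev/sdm3 --block.wal /dev/sdm4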



Thank you very much!

Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk --osd-id param

2017-07-25 Thread Kenneth Waegeman

I'm on 12.1.1: ceph -v
ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc)

It's not in the 12.1.1 tag src code 
(https://github.com/ceph/ceph/blob/v12.1.1/src/ceph-disk/ceph_disk/main.py)



On 25/07/17 15:43, Edward R Huyer wrote:

Are you on 12.1.0 or 12.1.1?  I noticed that in 12.1.0 the ceph command was 
missing options that were supposed to be there, but 12.1.1 had them.  Maybe 
you're seeing a similar issue?

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Kenneth Waegeman
Sent: Tuesday, July 25, 2017 7:15 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] ceph-disk --osd-id param

Hi all,

  From the release notes of the Luminous RC, I read:

'There is a simplified OSD replacement process that is more robust.' , linked to

http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd

If I try running 'ceph-disk prepare --bluestore /dev/sdX  --osd-id {id} 
--osd-uuid `uuidgen`' I get 'ceph-disk: error: unrecognized arguments:
--osd-id 6'

I checked the source code and indeed, the param is in master, but not in the rc 
release tag.

Will this option still be added for Luminous?

Thanks!!

Kenneth



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[ceph-users] ceph-disk --osd-id param

2017-07-25 Thread Kenneth Waegeman

Hi all,

From the release notes of the Luminous RC, I read:

'There is a simplified OSD replacement process that is more robust.' , 
linked to


http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd

If I try running 'ceph-disk prepare --bluestore /dev/sdX  --osd-id {id} 
--osd-uuid `uuidgen`' I get 'ceph-disk: error: unrecognized arguments: 
--osd-id 6'


I checked the source code and indeed, the param is in master, but not in 
the rc release tag.


Will this option still be added for Luminous?

Thanks!!

Kenneth



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPHFS file or directories disappear when ls (metadata problem)

2016-09-29 Thread Kenneth Waegeman



On 29/09/16 14:29, Yan, Zheng wrote:

On Thu, Sep 29, 2016 at 8:13 PM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:

Hi all,

Following up on this thread:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008537.html

we still see files missing when doing ls on cephfs with
3.10.0-327.18.2.el7.ug.x86_64

Is there already a solution for this?I don't see anything ceph related
popping up in the release notes of the newer kernels..


try updating your kernel. The newest fixes are included in kernel-3.10.0-448.el7
Thanks!! We are running centos 7.2.. Is there a way to get the 
3.10.0-448.el7 kernel yet?


K


Regards
Yan, Zheng



Thanks !!

Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[ceph-users] CEPHFS file or directories disappear when ls (metadata problem)

2016-09-29 Thread Kenneth Waegeman

Hi all,

Following up on this thread:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008537.html

we still see files missing when doing ls on cephfs with 
3.10.0-327.18.2.el7.ug.x86_64


Is there already a solution for this?I don't see anything ceph related 
popping up in the release notes of the newer kernels..


Thanks !!

Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to debug pg inconsistent state - no ioerrors seen

2016-08-09 Thread Kenneth Waegeman

Hi,

I did a diff on the directories of all three osds: no difference. 
So I don't know what's wrong.


The only difference I see is a scrub file in the TEMP folder (it is 
already a different pg than in my last mail):


-rw-r--r--1 ceph ceph 0 Aug  9 09:51 
scrub\u6.107__head_0107__fff8


But it is empty..

Thanks!


On 09/08/16 04:33, Goncalo Borges wrote:

Hi Kenneth...

The previous default behavior of 'ceph pg repair' was to copy the pg 
objects from the primary osd to the others. I'm not sure if that is still 
the case in Jewel. For this reason, once we get this kind of error in a 
data pool, the best practice is to compare the md5 checksums of the 
damaged object on all osds involved in the inconsistent pg. Since we 
have a 3 replica cluster, we should find a quorum of 2 good objects. If by 
chance the primary osd has the wrong object, we should delete it 
before running the repair.


On a metadata pool, I am not sure exactly how to cross check, since all 
objects are size 0 and therefore md5sum is meaningless. Maybe one 
way forward could be to check the contents of the pg directories (e.g. 
/var/lib/ceph/osd/ceph-0/current/5.161_head/) on all osds involved in 
the pg and see if we spot something wrong?
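
(A rough sketch of that cross-check for this pg, 6.2f4 with acting set 
[3,5,1]; the hostnames are assumptions:)

diff <(ssh osdhostA "find /var/lib/ceph/osd/ceph-3/current/6.2f4_head -type f -printf '%P\n' | sort") \
     <(ssh osdhostB "find /var/lib/ceph/osd/ceph-5/current/6.2f4_head -type f -printf '%P\n' | sort")
# note: an omap_digest mismatch lives in the OSD's key/value store, so
# identical file listings do not rule out the inconsistency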


Cheers

G.


On 08/08/2016 09:40 PM, Kenneth Waegeman wrote:

Hi all,

Since last week, some pg's are going in the inconsistent state after 
a scrub error. Last week we had 4 pgs in that state, They were on 
different OSDS, but all of the metadata pool.
I did a pg repair on them, and all were healthy again. But now again 
one pg is inconsistent.


with health detail I see:

pg 6.2f4 is active+clean+inconsistent, acting [3,5,1]
1 scrub errors

And in the log of the primary:

2016-08-06 06:24:44.723224 7fc5493f3700 -1 log_channel(cluster) log 
[ERR] : 6.2f4 shard 5: soid 6:2f55791f:::606.:head 
omap_digest 0x3a105358 != best guess omap_digest 0xc85c4361 from auth 
shard 1
2016-08-06 06:24:53.931029 7fc54bbf8700 -1 log_channel(cluster) log 
[ERR] : 6.2f4 deep-scrub 0 missing, 1 inconsistent objects
2016-08-06 06:24:53.931055 7fc54bbf8700 -1 log_channel(cluster) log 
[ERR] : 6.2f4 deep-scrub 1 errors


I looked in dmesg but I couldn't see any IO errors on any of the OSDs 
in the acting set.  Last week it was another set. It is of course 
possible more than 1 OSD is failing, but how can we check this, since 
there is nothing more in the logs?


Thanks !!

K
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






[ceph-users] how to debug pg inconsistent state - no ioerrors seen

2016-08-08 Thread Kenneth Waegeman

Hi all,

Since last week, some pgs are going into the inconsistent state after a 
scrub error. Last week we had 4 pgs in that state; they were on 
different OSDs, but all in the metadata pool.
I did a pg repair on them, and all were healthy again. But now one 
pg is inconsistent again.


with health detail I see:

pg 6.2f4 is active+clean+inconsistent, acting [3,5,1]
1 scrub errors

And in the log of the primary:

2016-08-06 06:24:44.723224 7fc5493f3700 -1 log_channel(cluster) log 
[ERR] : 6.2f4 shard 5: soid 6:2f55791f:::606.:head omap_digest 
0x3a105358 != best guess omap_digest 0xc85c4361 from auth shard 1
2016-08-06 06:24:53.931029 7fc54bbf8700 -1 log_channel(cluster) log 
[ERR] : 6.2f4 deep-scrub 0 missing, 1 inconsistent objects
2016-08-06 06:24:53.931055 7fc54bbf8700 -1 log_channel(cluster) log 
[ERR] : 6.2f4 deep-scrub 1 errors


I looked in dmesg but I couldn't see any IO errors on any of the OSDs in 
the acting set. Last week it was another set. It is of course possible 
that more than 1 OSD is failing, but how can we check this, since there is 
nothing more in the logs?


Thanks !!

K
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD host swap usage

2016-07-27 Thread Kenneth Waegeman



On 27/07/16 10:59, Christian Balzer wrote:

Hello,

On Wed, 27 Jul 2016 10:21:34 +0200 Kenneth Waegeman wrote:


Hi all,

When our OSD hosts are running for some time, we start see increased
usage of swap on a number of them. Some OSDs don't use swap for weeks,
while others has a full (4G) swap, and start filling swap again after we
did a swapoff/swapon.

Obvious first question would be, are all these hosts really the same, HW,
SW and configuration wise?
They have the same hardware, are configured the same through config mgt 
with ceph 10.2.2 and kernel 3.10.0-327.18.2.el7.ug.x86_64



We have 8 8TB OSDS and 2 cache SSDs on each hosts, and 80GB of Memory.

How full are these OSDs?
I'm interested in # of files, not space, so a "df -i" should give us some idea.


Filesystem   InodesIUsed IFree IUse% 
Mounted on
/dev/sdm7  1983232050068 197822521% 
/var/lib/ceph/osd/cache/sdm
/dev/md124194557760 19620569 174937191 11% 
/var/lib/ceph/osd/sdk0sdl
/dev/md117194557760 20377826 174179934 11% 
/var/lib/ceph/osd/sdc0sdd
/dev/md127194557760 21453957 173103803 12% 
/var/lib/ceph/osd/sda0sdb
/dev/md121194557760 20270844 174286916 11% 
/var/lib/ceph/osd/sdq0sdr
/dev/md118194557760 20476860 174080900 11% 
/var/lib/ceph/osd/sde0sdf
/dev/md120194557760 19939165 174618595 11% 
/var/lib/ceph/osd/sdo0sdp
/dev/md113194557760 22098382 172459378 12% 
/var/lib/ceph/osd/sdg0sdh
/dev/md112194557760 18209988 176347772 10% 
/var/lib/ceph/osd/sdi0sdj
/dev/sdn7  1993062447087 198835371% 
/var/lib/ceph/osd/cache/sdn





80GB is an odd number, how are the DIMMs distributed among the CPU(s)?

Only 1 socket:

Machine (79GB)
  Socket L#0 + L3 L#0 (20MB)
L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
  PU L#0 (P#0)
  PU L#1 (P#8)
L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
  PU L#2 (P#1)
  PU L#3 (P#9)
L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
  PU L#4 (P#2)
  PU L#5 (P#10)
L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
  PU L#6 (P#3)
  PU L#7 (P#11)
L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
  PU L#8 (P#4)
  PU L#9 (P#12)
L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
  PU L#10 (P#5)
  PU L#11 (P#13)
L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
  PU L#12 (P#6)
  PU L#13 (P#14)
L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
  PU L#14 (P#7)
  PU L#15 (P#15)

3 dimms of 16GB + 1 dimm of 8 in first set of DIMMS, 3 dimms of 8 in 
second set (as in our vendor's manual)





There is still about 15-20GB memory available when this happens. Running
Centos7;

How do you define free memory?
Not used at all?
I'd expect any Ceph storage server to use all "free" RAM for SLAB and
pagecache very quickly, at the latest after the first deep scrub.
%Cpu(s):  5.3 us,  0.1 sy,  0.0 ni, 94.1 id,  0.5 wa,  0.0 hi,  0.0 si,  
0.0 st

KiB Mem : 82375104 total,  7037032 free, 41117768 used, 34220308 buff/cache
KiB Swap:  4194300 total,  3666416 free,   527884 used. 15115612 avail Mem

PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
3979408 ceph  20   0 4115960 1.079g   5912 S  85.1  1.4 7174:16 
ceph-osd
3979417 ceph  20   0 3843488 967424   6076 S   1.7  1.2 7114:34 
ceph-osd
3979410 ceph  20   0 4089372 1.085g   5964 S   1.3  1.4 9072:56 
ceph-osd
3979419 ceph  20   0 4345000 1.116g   6168 S   1.3  1.4 9151:36 
ceph-osd


If it is really unused AND your system is swapping, something odd is going
on indeed, maybe something NUMA related that prevents part of your memory
from being used.

Of course this could also be an issue with your CentOS kernel, I'm
definitely not seeing anything like this on any of my machines.


We had swapiness set to 0.

I wouldn't set it lower than 1.
Also any other tuning settings, like vm/vfs_cache_pressure and
vm/min_free_kbytes?


vfs_cache_pressure is on the default 100,
vm.min_free_kbytes=3145728

other tuned settings:

fs.file-max=262144
kernel.msgmax=65536
kernel.msgmnb=65536
kernel.msgmni=1024
kernel.pid_max=4194303
kernel.sem=250 32000 100 1024
kernel.shmall=20971520
kernel.shmmax=34359738368
kernel.shmmni=16384
net.core.netdev_max_backlog=25
net.core.rmem_default=262144
net.core.rmem_max=4194304
net.core.somaxconn=1024
net.core.wmem_default=262144
net.core.wmem_max=4194304
net.ipv4.conf.all.arp_filter=1
net.ipv4.ip_local_port_range=32768 61000
net.ipv4.neigh.default.base_reachable_time=14400
net.ipv4.neigh.default.gc_interval=14400
net.ipv4.neigh.default.gc_stale_time=14400
net.ipv4.neigh.default.gc_t

[ceph-users] OSD host swap usage

2016-07-27 Thread Kenneth Waegeman

Hi all,

When our OSD hosts are running for some time, we start to see increased 
usage of swap on a number of them. Some OSDs don't use swap for weeks, 
while others have a full (4G) swap and start filling swap again after we 
did a swapoff/swapon.
We have 8 8TB OSDs and 2 cache SSDs on each host, and 80GB of memory. 
There is still about 15-20GB of memory available when this happens. Running 
CentOS 7; we had swappiness set to 0. There is no client io right now, 
only scrubbing. Some OSDs are using 20-80% of cpu.


Has somebody seen this behaviour? It doesn't have to be bad, but what 
could explain why some hosts keep on swapping while others don't?
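
(A quick way to see which processes actually occupy the swap, largest last:)

grep VmSwap /proc/[0-9]*/status | sort -n -k 2 | tail -20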

Could this be some issue?

Thanks !!

Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds0: Behind on trimming (58621/30)

2016-07-05 Thread Kenneth Waegeman



On 04/07/16 11:22, Kenneth Waegeman wrote:



On 01/07/16 16:01, Yan, Zheng wrote:

On Fri, Jul 1, 2016 at 6:59 PM, John Spray <jsp...@redhat.com> wrote:

On Fri, Jul 1, 2016 at 11:35 AM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:

Hi all,

While syncing a lot of files to cephfs, our mds cluster got 
haywire: the

mdss have a lot of segments behind on trimming:  (58621/30)
Because of this the mds cluster gets degraded. RAM usage is about 
50GB. The
mdses were respawning and replaying continiously, and I had to stop 
all

syncs , unmount all clients and increase the beacon_grace to keep the
cluster up .

[root@mds03 ~]# ceph status
 cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
  health HEALTH_WARN
 mds0: Behind on trimming (58621/30)
  monmap e1: 3 mons at
{mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0} 


 election epoch 170, quorum 0,1,2 mds01,mds02,mds03
   fsmap e78658: 1/1/1 up {0=mds03=up:active}, 2 up:standby
  osdmap e19966: 156 osds: 156 up, 156 in
 flags sortbitwise
   pgmap v10213164: 4160 pgs, 4 pools, 253 TB data, 203 Mobjects
 357 TB used, 516 TB / 874 TB avail
 4151 active+clean
5 active+clean+scrubbing
4 active+clean+scrubbing+deep
   client io 0 B/s rd, 0 B/s wr, 63 op/s rd, 844 op/s wr
   cache io 68 op/s promote


Now it finally is up again, it is trimming very slowly (+-120 
segments /

min)

Hmm, so it sounds like something was wrong that got cleared by either
the MDS restart or the client unmount, and now it's trimming at a
healthier rate.

What client (kernel or fuse, and version)?

Can you confirm that the RADOS cluster itself was handling operations
reasonably quickly?  Is your metadata pool using the same drives as
your data?  Were the OSDs saturated with IO?

While the cluster was accumulating untrimmed segments, did you also
have a "client xyz failing to advanced oldest_tid" warning?

This does not prevent MDS from trimming log segment.


It would be good to clarify whether the MDS was trimming slowly, or
not at all.  If you can reproduce this situation, get it to a "behind
on trimming" state, and the stop the client IO (but leave it mounted).
See if the (x/30) number stays the same.  Then, does it start to
decrease when you unmount the client?  That would indicate a
misbehaving client.

Behind on trimming on single MDS cluster should be caused by either
slow rados operations or MDS trim too few log segments on each tick.

Kenneth, could you try setting mds_log_max_expiring to a large value
(such as 200)
I've set the mds_log_max_expiring to 200 right now. Should I see 
something instantly?
The trimming finished rather quickly, although I don't have any accurate 
time measurements. The cluster looks to be running fine right now, but is only 
running an incremental sync. We will try with the same data again to see if it is ok now.
Is this mds_log_max_expiring option production ready? (I don't seem to 
find it in the documentation)
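
(For reference, a rough sketch of applying that setting without a restart, 
via the admin socket on the active MDS host; the daemon name mds03 is taken 
from earlier in this thread, and the ceph.conf line just persists it:)

ceph daemon mds.mds03 config set mds_log_max_expiring 200

[mds]
mds_log_max_expiring = 200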


Thank you!!

K


This weekend, the trimming did not continue and something happened to 
the cluster:


mds.0.cache.dir(1000da74e85) commit error -2 v 2466977
log_channel(cluster) log [ERR] : failed to commit dir 1000da74e85 
object, errno -2
mds.0.78429 unhandled write error (2) No such file or directory, force 
readonly...

mds.0.cache force file system read-only
log_channel(cluster) log [WRN] : force file system read-only

and ceph health reported:
mds0: MDS in read-only mode

I restarted it and it is trimming again.


Thanks again!
Kenneth

Regards
Yan, Zheng


John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] mds0: Behind on trimming (58621/30)

2016-07-04 Thread Kenneth Waegeman



On 01/07/16 16:01, Yan, Zheng wrote:

On Fri, Jul 1, 2016 at 6:59 PM, John Spray <jsp...@redhat.com> wrote:

On Fri, Jul 1, 2016 at 11:35 AM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:

Hi all,

While syncing a lot of files to cephfs, our mds cluster got haywire: the
mdss have a lot of segments behind on trimming:  (58621/30)
Because of this the mds cluster gets degraded. RAM usage is about 50GB. The
mdses were respawning and replaying continiously, and I had to stop all
syncs , unmount all clients and increase the beacon_grace to keep the
cluster up .

[root@mds03 ~]# ceph status
 cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
  health HEALTH_WARN
 mds0: Behind on trimming (58621/30)
  monmap e1: 3 mons at
{mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
 election epoch 170, quorum 0,1,2 mds01,mds02,mds03
   fsmap e78658: 1/1/1 up {0=mds03=up:active}, 2 up:standby
  osdmap e19966: 156 osds: 156 up, 156 in
 flags sortbitwise
   pgmap v10213164: 4160 pgs, 4 pools, 253 TB data, 203 Mobjects
 357 TB used, 516 TB / 874 TB avail
 4151 active+clean
5 active+clean+scrubbing
4 active+clean+scrubbing+deep
   client io 0 B/s rd, 0 B/s wr, 63 op/s rd, 844 op/s wr
   cache io 68 op/s promote


Now it finally is up again, it is trimming very slowly (+-120 segments /
min)

Hmm, so it sounds like something was wrong that got cleared by either
the MDS restart or the client unmount, and now it's trimming at a
healthier rate.

What client (kernel or fuse, and version)?

Can you confirm that the RADOS cluster itself was handling operations
reasonably quickly?  Is your metadata pool using the same drives as
your data?  Were the OSDs saturated with IO?

While the cluster was accumulating untrimmed segments, did you also
have a "client xyz failing to advanced oldest_tid" warning?

This does not prevent MDS from trimming log segment.


It would be good to clarify whether the MDS was trimming slowly, or
not at all.  If you can reproduce this situation, get it to a "behind
on trimming" state, and the stop the client IO (but leave it mounted).
See if the (x/30) number stays the same.  Then, does it start to
decrease when you unmount the client?  That would indicate a
misbehaving client.

Behind on trimming on single MDS cluster should be caused by either
slow rados operations or MDS trim too few log segments on each tick.

Kenneth, could you try setting mds_log_max_expiring to a large value
(such as 200)
I've set the mds_log_max_expiring to 200 right now. Should I see 
something instantly?


This weekend, the trimming did not continue and something happened to 
the cluster:


mds.0.cache.dir(1000da74e85) commit error -2 v 2466977
log_channel(cluster) log [ERR] : failed to commit dir 1000da74e85 
object, errno -2
mds.0.78429 unhandled write error (2) No such file or directory, force 
readonly...

mds.0.cache force file system read-only
log_channel(cluster) log [WRN] : force file system read-only

and ceph health reported:
mds0: MDS in read-only mode

I restarted it and it is trimming again.


Thanks again!
Kenneth

Regards
Yan, Zheng


John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] mds0: Behind on trimming (58621/30)

2016-07-01 Thread Kenneth Waegeman



On 01/07/16 12:59, John Spray wrote:

On Fri, Jul 1, 2016 at 11:35 AM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:

Hi all,

While syncing a lot of files to cephfs, our mds cluster got haywire: the
mdss have a lot of segments behind on trimming:  (58621/30)
Because of this the mds cluster gets degraded. RAM usage is about 50GB. The
mdses were respawning and replaying continiously, and I had to stop all
syncs , unmount all clients and increase the beacon_grace to keep the
cluster up .

[root@mds03 ~]# ceph status
 cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
  health HEALTH_WARN
 mds0: Behind on trimming (58621/30)
  monmap e1: 3 mons at
{mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
 election epoch 170, quorum 0,1,2 mds01,mds02,mds03
   fsmap e78658: 1/1/1 up {0=mds03=up:active}, 2 up:standby
  osdmap e19966: 156 osds: 156 up, 156 in
 flags sortbitwise
   pgmap v10213164: 4160 pgs, 4 pools, 253 TB data, 203 Mobjects
 357 TB used, 516 TB / 874 TB avail
 4151 active+clean
5 active+clean+scrubbing
4 active+clean+scrubbing+deep
   client io 0 B/s rd, 0 B/s wr, 63 op/s rd, 844 op/s wr
   cache io 68 op/s promote


Now it finally is up again, it is trimming very slowly (+-120 segments /
min)

Hmm, so it sounds like something was wrong that got cleared by either
the MDS restart or the client unmount, and now it's trimming at a
healthier rate.

What client (kernel or fuse, and version)?

kernel client of centos 7.2, 3.10.0-327.18.2.el7


Can you confirm that the RADOS cluster itself was handling operations
reasonably quickly?  Is your metadata pool using the same drives as
your data?  Were the OSDs saturated with IO?
The metadata pool is a pool of SSDs. Data is an EC pool with a cache layer of 
separate ssds. There was indeed load on the OSDs, and the ceph health 
command regularly produced 'cache at/near full ratio' warnings too




While the cluster was accumulating untrimmed segments, did you also
have a "client xyz failing to advanced oldest_tid" warning?

We did not see that warning.


It would be good to clarify whether the MDS was trimming slowly, or
not at all.  If you can reproduce this situation, get it to a "behind
on trimming" state, and the stop the client IO (but leave it mounted).
See if the (x/30) number stays the same.  Then, does it start to
decrease when you unmount the client?  That would indicate a
misbehaving client.
mds trimming is still at (37927/30), so I have to wait some more hours before 
I can try to reproduce it. (Is there nothing that can be done to speed this up?)
There was a moment when the mds was active and I didn't see the segments 
going down. I ran ceph daemon mds.mds03 flush journal, but this was 
before I changed the beacon_grace, so it respawned again at that moment, 
and I'm not quite sure if there was another issue then.


Thanks again!

Kenneth


John


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mds0: Behind on trimming (58621/30)

2016-07-01 Thread Kenneth Waegeman

Hi all,

While syncing a lot of files to cephfs, our mds cluster went haywire: the 
mdss have a lot of segments behind on trimming: (58621/30)
Because of this the mds cluster gets degraded. RAM usage is about 50GB. 
The mdses were respawning and replaying continuously, and I had to stop 
all syncs, unmount all clients and increase the beacon_grace to keep 
the cluster up.


[root@mds03 ~]# ceph status
cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
 health HEALTH_WARN
mds0: Behind on trimming (58621/30)
 monmap e1: 3 mons at 
{mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}

election epoch 170, quorum 0,1,2 mds01,mds02,mds03
  fsmap e78658: 1/1/1 up {0=mds03=up:active}, 2 up:standby
 osdmap e19966: 156 osds: 156 up, 156 in
flags sortbitwise
  pgmap v10213164: 4160 pgs, 4 pools, 253 TB data, 203 Mobjects
357 TB used, 516 TB / 874 TB avail
4151 active+clean
   5 active+clean+scrubbing
   4 active+clean+scrubbing+deep
  client io 0 B/s rd, 0 B/s wr, 63 op/s rd, 844 op/s wr
  cache io 68 op/s promote


Now that it is finally up again, it is trimming very slowly (+-120 segments / 
min)

We've seen some 'behind on trimming' before, but never that much.
So now our production cluster has been unusable for approx half a day..

What could be the problem here? We are running 10.2.1
Can something be done so the mds does not keep that many segments?
Can we speed up the trimming process?

Thanks you very much!

Cheers,
Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cephfs snapshots

2016-06-22 Thread Kenneth Waegeman

Hi all,

In Jewel, cephfs snapshots are still experimental. Does someone have a 
clue when this will become stable, or how experimental this really is?



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs rm -rf on directory of 160TB /40M files

2016-04-05 Thread Kenneth Waegeman

Thanks!
Now this slow rm is not our biggest issue anymore (for the moment): 
last night all our MDSs crashed.
I opened a ticket  http://tracker.ceph.com/issues/15379 with the 
stacktraces I got.

We aren't able to restart the mds's at all for now.
I stopped the rm, and also increased the mds_beacon_grace because it was 
first complaining about timeouts.

Without luck; we're still not able to keep the mds's up for longer than 5 minutes.
How can we solve this ?
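
(For reference, a minimal sketch of what raising the beacon grace looks like 
in ceph.conf; the value 300 is just an example, and the option has to be 
visible to the mons as well as the mds daemons:)

[global]
mds_beacon_grace = 300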

Thanks again!

Kenneth

On 05/04/16 04:19, Yan, Zheng wrote:

On Tue, Apr 5, 2016 at 12:55 AM, Gregory Farnum <gfar...@redhat.com> wrote:

Deletes are just slow right now. You can look at the ops in flight on your
client or MDS admin socket to see how far along it is and watch them to see
how long stuff is taking -- I think it's a sync disk commit for each unlink
though so at 40M it's going to be a good looong while. :/
-Greg


On Monday, April 4, 2016, Kenneth Waegeman <kenneth.waege...@ugent.be>
wrote:

Hi all,

I want to remove a large directory containing +- 40M files / 160TB of data
in CephFS by running rm -rf on the directory via the ceph kernel client.
After 7h, the rm command is still running. I checked the rados df output,
and saw that only about 2TB and 2M files are gone.
I know this output of rados df can be confusing because ceph deletes
objects asynchronously, but then I don't know why the rm command still
hangs.
Is there some way to speed this up? And is there a way to check how far
the deletion has progressed?

Check /sys/kernel/debug/ceph/xxx/mdsc; from that you can roughly see how fast
unlink requests are handled. If the MDS's CPU usage is less than 100%,
you can try running multiple instances of 'rm -rf' (each one removing a
different sub-directory), as sketched below.
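A rough sketch of the parallel variant -- assuming the directory being
removed is mounted at /mnt/cephfs/bigdir and GNU xargs is available
(adjust the path and the parallelism to taste):

  cd /mnt/cephfs/bigdir
  # remove the top-level subdirectories, 4 at a time
  ls -1 | xargs -P 4 -I{} rm -rf {}
  # meanwhile, roughly watch how many unlink requests are still pending
  watch -n 5 'cat /sys/kernel/debug/ceph/*/mdsc | wc -l'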


Regards
Yan, Zheng


Thank you very much!

Kenneth



[ceph-users] cephfs rm -rf on directory of 160TB /40M files

2016-04-04 Thread Kenneth Waegeman

Hi all,

I want to remove a large directory containing +- 40M files / 160TB of 
data in CephFS by running rm -rf on the directory via the ceph kernel 
client.
After 7h, the rm command is still running. I checked the rados df 
output, and saw that only about 2TB and 2M files are gone.
I know this output of rados df can be confusing because ceph deletes 
objects asynchronously, but then I don't know why the rm command 
still hangs.
Is there some way to speed this up? And is there a way to check how far 
the deletion has progressed? (see the sketch below)
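A hedged sketch of what can be watched while the rm runs -- assuming the
kernel client debugfs entries are mounted and the active MDS is called
mds03 (names and paths will differ on other setups):

  # requests the kernel client still has outstanding against the MDS
  cat /sys/kernel/debug/ceph/*/mdsc
  # operations the MDS is currently working on
  ceph daemon mds.mds03 dump_ops_in_flight
  # follow the asynchronous object deletes in the data pool
  rados df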


Thank you very much!

Kenneth



Re: [ceph-users] DONTNEED fadvise flag

2016-03-21 Thread Kenneth Waegeman
Thanks! As we are using the EL7 kernel client, does someone know whether 
that client supports it?


On 16/03/16 20:29, Gregory Farnum wrote:

On Wed, Mar 16, 2016 at 9:46 AM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:

Hi all,

Quick question: Does cephFS pass the fadvise DONTNEED flag and take it into
account?
I want to use the --drop-cache option of rsync 3.1.1 to not fill the cache
when rsyncing to cephFS

It looks like ceph-fuse unfortunately does not. I'm not sure about the
kernel client though.
-Greg




[ceph-users] DONTNEED fadvise flag

2016-03-20 Thread Kenneth Waegeman

Hi all,

Quick question: Does cephFS pass the fadvise DONTNEED flag and take it 
into account?
I want to use the --drop-cache option of rsync 3.1.1 to not fill the 
cache when rsyncing to cephFS
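For context, the intended invocation would look something like this --
assuming an rsync 3.1.1 build that carries the --drop-cache option
mentioned above, and a CephFS mount at /mnt/cephfs (both placeholders):

  rsync -aH --drop-cache /data/source/ /mnt/cephfs/dest/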


Thanks!

Kenneth



Re: [ceph-users] ceph 9.2.0 mds cluster went down and now constantly crashes with Floating point exception

2016-02-17 Thread Kenneth Waegeman



On 05/02/16 11:43, John Spray wrote:

On Fri, Feb 5, 2016 at 9:36 AM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:


On 04/02/16 16:17, Gregory Farnum wrote:

On Thu, Feb 4, 2016 at 1:42 AM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:

Hi,

Hi, we are running ceph 9.2.0.
Overnight, our ceph state went to 'mds mds03 is laggy' . When I checked
the
logs, I saw this mds crashed with a stacktrace. I checked the other mdss,
and I saw the same there.
When I try to start the mds again, I get again a stacktrace and it won't
come up:

   -12> 2016-02-04 10:23:46.837131 7ff9ea570700  1 --
10.141.16.2:6800/193767 <== osd.146 10.141.16.25:6800/7036 1 
osd_op_reply(207 15ef982. [stat] v0'0 uv22184 ondisk = 0) v6
 187+0+16 (113
2261152 0 506978568) 0x7ffa171ae940 con 0x7ffa189cc3c0
  -11> 2016-02-04 10:23:46.837317 7ff9ed6a1700  1 --
10.141.16.2:6800/193767 <== osd.136 10.141.16.24:6800/6764 6 
osd_op_reply(209 148aaac. [delete] v0'0 uv23797 ondisk = -2
((2)
No such file o
r directory)) v6  187+0+0 (64699207 0 0) 0x7ffa171acb00 con
0x7ffa014fd9c0
  -10> 2016-02-04 10:23:46.837406 7ff9ec994700  1 --
10.141.16.2:6800/193767 <== osd.36 10.141.16.14:6800/5395 5 
osd_op_reply(175 15f631f. [stat] v0'0 uv22466 ondisk = 0) v6
 187+0+16 (1037
61047 0 2527067705) 0x7ffa08363700 con 0x7ffa189ca580
   -9> 2016-02-04 10:23:46.837463 7ff9eba85700  1 --
10.141.16.2:6800/193767 <== osd.47 10.141.16.15:6802/7128 2 
osd_op_reply(211 148aac8. [delete] v0'0 uv22990 ondisk = -2
((2)
No such file or
directory)) v6  187+0+0 (1138385695 0 0) 0x7ffa01cd0dc0 con
0x7ffa189cadc0
   -8> 2016-02-04 10:23:46.837468 7ff9eb27d700  1 --
10.141.16.2:6800/193767 <== osd.16 10.141.16.12:6800/5739 2 
osd_op_reply(212 148aacd. [delete] v0'0 uv23991 ondisk = -2
((2)
No such file or
directory)) v6  187+0+0 (1675093742 0 0) 0x7ffa171ac840 con
0x7ffa189cb760
   -7> 2016-02-04 10:23:46.837477 7ff9eab76700  1 --
10.141.16.2:6800/193767 <== osd.66 10.141.16.17:6800/6353 2 
osd_op_reply(210 148aab9. [delete] v0'0 uv24583 ondisk = -2
((2)
No such file or
directory)) v6  187+0+0 (603192739 0 0) 0x7ffa19054680 con
0x7ffa189cbce0
   -6> 2016-02-04 10:23:46.838140 7ff9f0bcf700  1 --
10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 43 
osd_op_reply(121 200.9d96 [write 1459360~980] v943'4092 uv4092 ondisk
=
0) v6  179+0+0 (3939130488 0 0) 0x7ffa01590100 con 0x7ffa014fab00
   -5> 2016-02-04 10:23:46.838342 7ff9f0bcf700  1 --
10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 44 
osd_op_reply(124 200.9d96 [write 1460340~956] v943'4093 uv4093 ondisk
=
0) v6  179+0+0 (1434265886 0 0) 0x7ffa01590100 con 0x7ffa014fab00
   -4> 2016-02-04 10:23:46.838531 7ff9f0bcf700  1 --
10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 45 
osd_op_reply(126 200.9d96 [write 1461296~954] v943'4094 uv4094 ondisk
=
0) v6  179+0+0 (25292940 0 0) 0x7ffa01590100 con 0x7ffa014fab00
   -3> 2016-02-04 10:23:46.838700 7ff9ecd98700  1 --
10.141.16.2:6800/193767 <== osd.57 10.141.16.16:6802/7067 3 
osd_op_reply(199 15ef976. [stat] v0'0 uv22557 ondisk = 0) v6
 187+0+16 (354652996 0 2244692791) 0x7ffa171ade40 con 0x7ffa189ca160
   -2> 2016-02-04 10:23:46.839301 7ff9ed8a3700  1 --
10.141.16.2:6800/193767 <== osd.107 10.141.16.21:6802/7468 3 
osd_op_reply(115 1625476. [stat] v0'0 uv22587 ondisk = 0) v6
 187+0+16 (664308076 0 998461731) 0x7ffa08363c80 con 0x7ffa014fdb20
   -1> 2016-02-04 10:23:46.839322 7ff9f0bcf700  1 --
10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 46 
osd_op_reply(128 200.9d96 [write 1462250~954] v943'4095 uv4095 ondisk
=
0) v6  179+0+0 (1379768629 0 0) 0x7ffa01590100 con 0x7ffa014fab00
0> 2016-02-04 10:23:46.839379 7ff9f30d8700 -1 *** Caught signal
(Floating point exception) **
in thread 7ff9f30d8700

ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
1: (()+0x4b6fa2) [0x7ff9fd091fa2]
2: (()+0xf100) [0x7ff9fbfd3100]
3: (StrayManager::_calculate_ops_required(CInode*, bool)+0xa2)
[0x7ff9fcf0adc2]
4: (StrayManager::enqueue(CDentry*, bool)+0x169) [0x7ff9fcf10459]
5: (StrayManager::__eval_stray(CDentry*, bool)+0xa49) [0x7ff9fcf111c9]
6: (StrayManager::eval_stray(CDentry*, bool)+0x1e) [0x7ff9fcf113ce]
7: (MDCache::scan_stray_dir(dirfrag_t)+0x13d) [0x7ff9fce6741d]
8: (MDSInternalContextBase::complete(int)+0x1e3) [0x7ff9fcff4993]
9: (MDSRank::_advance_queues()+0x382) [0x7ff9fcdd4652]
10: (MDSRank::ProgressThread::entry()+0x4a) [0x7ff9fcdd4aca]
11: (()+0x7dc5) [0x7ff9fbfcbdc5]
12: (clone()+0x6d) [0x7ff9faeb621d]

Does someone have an idea? We can't use our fs right now..

Hey, fun! Just looking for FPE opportunities in that function, it
looks like someone managed to set either the object size or stripe
count to 0 on some of your files. Is that possible?

Re: [ceph-users] ceph 9.2.0 mds cluster went down and now constantly crashes with Floating point exception

2016-02-05 Thread Kenneth Waegeman
pray wrote:

On Fri, Feb 5, 2016 at 9:36 AM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:


On 04/02/16 16:17, Gregory Farnum wrote:

On Thu, Feb 4, 2016 at 1:42 AM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:

Hi,

Hi, we are running ceph 9.2.0.
Overnight, our ceph state went to 'mds mds03 is laggy' . When I checked
the
logs, I saw this mds crashed with a stacktrace. I checked the other mdss,
and I saw the same there.
When I try to start the mds again, I get again a stacktrace and it won't
come up:

   -12> 2016-02-04 10:23:46.837131 7ff9ea570700  1 --
10.141.16.2:6800/193767 <== osd.146 10.141.16.25:6800/7036 1 
osd_op_reply(207 15ef982. [stat] v0'0 uv22184 ondisk = 0) v6
 187+0+16 (113
2261152 0 506978568) 0x7ffa171ae940 con 0x7ffa189cc3c0
  -11> 2016-02-04 10:23:46.837317 7ff9ed6a1700  1 --
10.141.16.2:6800/193767 <== osd.136 10.141.16.24:6800/6764 6 
osd_op_reply(209 148aaac. [delete] v0'0 uv23797 ondisk = -2
((2)
No such file o
r directory)) v6  187+0+0 (64699207 0 0) 0x7ffa171acb00 con
0x7ffa014fd9c0
  -10> 2016-02-04 10:23:46.837406 7ff9ec994700  1 --
10.141.16.2:6800/193767 <== osd.36 10.141.16.14:6800/5395 5 
osd_op_reply(175 15f631f. [stat] v0'0 uv22466 ondisk = 0) v6
 187+0+16 (1037
61047 0 2527067705) 0x7ffa08363700 con 0x7ffa189ca580
   -9> 2016-02-04 10:23:46.837463 7ff9eba85700  1 --
10.141.16.2:6800/193767 <== osd.47 10.141.16.15:6802/7128 2 
osd_op_reply(211 148aac8. [delete] v0'0 uv22990 ondisk = -2
((2)
No such file or
directory)) v6  187+0+0 (1138385695 0 0) 0x7ffa01cd0dc0 con
0x7ffa189cadc0
   -8> 2016-02-04 10:23:46.837468 7ff9eb27d700  1 --
10.141.16.2:6800/193767 <== osd.16 10.141.16.12:6800/5739 2 
osd_op_reply(212 148aacd. [delete] v0'0 uv23991 ondisk = -2
((2)
No such file or
directory)) v6  187+0+0 (1675093742 0 0) 0x7ffa171ac840 con
0x7ffa189cb760
   -7> 2016-02-04 10:23:46.837477 7ff9eab76700  1 --
10.141.16.2:6800/193767 <== osd.66 10.141.16.17:6800/6353 2 
osd_op_reply(210 148aab9. [delete] v0'0 uv24583 ondisk = -2
((2)
No such file or
directory)) v6  187+0+0 (603192739 0 0) 0x7ffa19054680 con
0x7ffa189cbce0
   -6> 2016-02-04 10:23:46.838140 7ff9f0bcf700  1 --
10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 43 
osd_op_reply(121 200.9d96 [write 1459360~980] v943'4092 uv4092 ondisk
=
0) v6  179+0+0 (3939130488 0 0) 0x7ffa01590100 con 0x7ffa014fab00
   -5> 2016-02-04 10:23:46.838342 7ff9f0bcf700  1 --
10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 44 
osd_op_reply(124 200.9d96 [write 1460340~956] v943'4093 uv4093 ondisk
=
0) v6  179+0+0 (1434265886 0 0) 0x7ffa01590100 con 0x7ffa014fab00
   -4> 2016-02-04 10:23:46.838531 7ff9f0bcf700  1 --
10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 45 
osd_op_reply(126 200.9d96 [write 1461296~954] v943'4094 uv4094 ondisk
=
0) v6  179+0+0 (25292940 0 0) 0x7ffa01590100 con 0x7ffa014fab00
   -3> 2016-02-04 10:23:46.838700 7ff9ecd98700  1 --
10.141.16.2:6800/193767 <== osd.57 10.141.16.16:6802/7067 3 
osd_op_reply(199 15ef976. [stat] v0'0 uv22557 ondisk = 0) v6
 187+0+16 (354652996 0 2244692791) 0x7ffa171ade40 con 0x7ffa189ca160
   -2> 2016-02-04 10:23:46.839301 7ff9ed8a3700  1 --
10.141.16.2:6800/193767 <== osd.107 10.141.16.21:6802/7468 3 
osd_op_reply(115 1625476. [stat] v0'0 uv22587 ondisk = 0) v6
 187+0+16 (664308076 0 998461731) 0x7ffa08363c80 con 0x7ffa014fdb20
   -1> 2016-02-04 10:23:46.839322 7ff9f0bcf700  1 --
10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 46 
osd_op_reply(128 200.9d96 [write 1462250~954] v943'4095 uv4095 ondisk
=
0) v6  179+0+0 (1379768629 0 0) 0x7ffa01590100 con 0x7ffa014fab00
0> 2016-02-04 10:23:46.839379 7ff9f30d8700 -1 *** Caught signal
(Floating point exception) **
in thread 7ff9f30d8700

ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
1: (()+0x4b6fa2) [0x7ff9fd091fa2]
2: (()+0xf100) [0x7ff9fbfd3100]
3: (StrayManager::_calculate_ops_required(CInode*, bool)+0xa2)
[0x7ff9fcf0adc2]
4: (StrayManager::enqueue(CDentry*, bool)+0x169) [0x7ff9fcf10459]
5: (StrayManager::__eval_stray(CDentry*, bool)+0xa49) [0x7ff9fcf111c9]
6: (StrayManager::eval_stray(CDentry*, bool)+0x1e) [0x7ff9fcf113ce]
7: (MDCache::scan_stray_dir(dirfrag_t)+0x13d) [0x7ff9fce6741d]
8: (MDSInternalContextBase::complete(int)+0x1e3) [0x7ff9fcff4993]
9: (MDSRank::_advance_queues()+0x382) [0x7ff9fcdd4652]
10: (MDSRank::ProgressThread::entry()+0x4a) [0x7ff9fcdd4aca]
11: (()+0x7dc5) [0x7ff9fbfcbdc5]
12: (clone()+0x6d) [0x7ff9faeb621d]

Does someone have an idea? We can't use our fs right now..

Hey, fun! Just looking for FPE opportunities in that function, it
looks like someone managed to set either the object size or stripe
count to 0 on some of your files. Is that possible?

Re: [ceph-users] ceph 9.2.0 mds cluster went down and now constantly crashes with Floating point exception

2016-02-05 Thread Kenneth Waegeman



On 04/02/16 16:17, Gregory Farnum wrote:

On Thu, Feb 4, 2016 at 1:42 AM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:

Hi,

Hi, we are running ceph 9.2.0.
Overnight, our ceph state went to 'mds mds03 is laggy' . When I checked the
logs, I saw this mds crashed with a stacktrace. I checked the other mdss,
and I saw the same there.
When I try to start the mds again, I get again a stacktrace and it won't
come up:

  -12> 2016-02-04 10:23:46.837131 7ff9ea570700  1 --
10.141.16.2:6800/193767 <== osd.146 10.141.16.25:6800/7036 1 
osd_op_reply(207 15ef982. [stat] v0'0 uv22184 ondisk = 0) v6
 187+0+16 (113
2261152 0 506978568) 0x7ffa171ae940 con 0x7ffa189cc3c0
 -11> 2016-02-04 10:23:46.837317 7ff9ed6a1700  1 --
10.141.16.2:6800/193767 <== osd.136 10.141.16.24:6800/6764 6 
osd_op_reply(209 148aaac. [delete] v0'0 uv23797 ondisk = -2 ((2)
No such file o
r directory)) v6  187+0+0 (64699207 0 0) 0x7ffa171acb00 con
0x7ffa014fd9c0
 -10> 2016-02-04 10:23:46.837406 7ff9ec994700  1 --
10.141.16.2:6800/193767 <== osd.36 10.141.16.14:6800/5395 5 
osd_op_reply(175 15f631f. [stat] v0'0 uv22466 ondisk = 0) v6
 187+0+16 (1037
61047 0 2527067705) 0x7ffa08363700 con 0x7ffa189ca580
  -9> 2016-02-04 10:23:46.837463 7ff9eba85700  1 --
10.141.16.2:6800/193767 <== osd.47 10.141.16.15:6802/7128 2 
osd_op_reply(211 148aac8. [delete] v0'0 uv22990 ondisk = -2 ((2)
No such file or
   directory)) v6  187+0+0 (1138385695 0 0) 0x7ffa01cd0dc0 con
0x7ffa189cadc0
  -8> 2016-02-04 10:23:46.837468 7ff9eb27d700  1 --
10.141.16.2:6800/193767 <== osd.16 10.141.16.12:6800/5739 2 
osd_op_reply(212 148aacd. [delete] v0'0 uv23991 ondisk = -2 ((2)
No such file or
   directory)) v6  187+0+0 (1675093742 0 0) 0x7ffa171ac840 con
0x7ffa189cb760
  -7> 2016-02-04 10:23:46.837477 7ff9eab76700  1 --
10.141.16.2:6800/193767 <== osd.66 10.141.16.17:6800/6353 2 
osd_op_reply(210 148aab9. [delete] v0'0 uv24583 ondisk = -2 ((2)
No such file or
   directory)) v6  187+0+0 (603192739 0 0) 0x7ffa19054680 con
0x7ffa189cbce0
  -6> 2016-02-04 10:23:46.838140 7ff9f0bcf700  1 --
10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 43 
osd_op_reply(121 200.9d96 [write 1459360~980] v943'4092 uv4092 ondisk =
0) v6  179+0+0 (3939130488 0 0) 0x7ffa01590100 con 0x7ffa014fab00
  -5> 2016-02-04 10:23:46.838342 7ff9f0bcf700  1 --
10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 44 
osd_op_reply(124 200.9d96 [write 1460340~956] v943'4093 uv4093 ondisk =
0) v6  179+0+0 (1434265886 0 0) 0x7ffa01590100 con 0x7ffa014fab00
  -4> 2016-02-04 10:23:46.838531 7ff9f0bcf700  1 --
10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 45 
osd_op_reply(126 200.9d96 [write 1461296~954] v943'4094 uv4094 ondisk =
0) v6  179+0+0 (25292940 0 0) 0x7ffa01590100 con 0x7ffa014fab00
  -3> 2016-02-04 10:23:46.838700 7ff9ecd98700  1 --
10.141.16.2:6800/193767 <== osd.57 10.141.16.16:6802/7067 3 
osd_op_reply(199 15ef976. [stat] v0'0 uv22557 ondisk = 0) v6
 187+0+16 (354652996 0 2244692791) 0x7ffa171ade40 con 0x7ffa189ca160
  -2> 2016-02-04 10:23:46.839301 7ff9ed8a3700  1 --
10.141.16.2:6800/193767 <== osd.107 10.141.16.21:6802/7468 3 
osd_op_reply(115 1625476. [stat] v0'0 uv22587 ondisk = 0) v6
 187+0+16 (664308076 0 998461731) 0x7ffa08363c80 con 0x7ffa014fdb20
  -1> 2016-02-04 10:23:46.839322 7ff9f0bcf700  1 --
10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 46 
osd_op_reply(128 200.9d96 [write 1462250~954] v943'4095 uv4095 ondisk =
0) v6  179+0+0 (1379768629 0 0) 0x7ffa01590100 con 0x7ffa014fab00
   0> 2016-02-04 10:23:46.839379 7ff9f30d8700 -1 *** Caught signal
(Floating point exception) **
   in thread 7ff9f30d8700

   ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
   1: (()+0x4b6fa2) [0x7ff9fd091fa2]
   2: (()+0xf100) [0x7ff9fbfd3100]
   3: (StrayManager::_calculate_ops_required(CInode*, bool)+0xa2)
[0x7ff9fcf0adc2]
   4: (StrayManager::enqueue(CDentry*, bool)+0x169) [0x7ff9fcf10459]
   5: (StrayManager::__eval_stray(CDentry*, bool)+0xa49) [0x7ff9fcf111c9]
   6: (StrayManager::eval_stray(CDentry*, bool)+0x1e) [0x7ff9fcf113ce]
   7: (MDCache::scan_stray_dir(dirfrag_t)+0x13d) [0x7ff9fce6741d]
   8: (MDSInternalContextBase::complete(int)+0x1e3) [0x7ff9fcff4993]
   9: (MDSRank::_advance_queues()+0x382) [0x7ff9fcdd4652]
   10: (MDSRank::ProgressThread::entry()+0x4a) [0x7ff9fcdd4aca]
   11: (()+0x7dc5) [0x7ff9fbfcbdc5]
   12: (clone()+0x6d) [0x7ff9faeb621d]

Does someone have an idea? We can't use our fs right now..

Hey, fun! Just looking for FPE opportunities in that function, it
looks like someone managed to set either the object size or stripe
count to 0 on some of your files. Is that possible?
I am the only user on th
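For anyone wanting to verify the layout question above: object size, stripe
unit and stripe count can be read from a client through the virtual xattrs,
for example (the paths below are just placeholders):

  getfattr -n ceph.file.layout /mnt/cephfs/some/file
  getfattr -n ceph.dir.layout /mnt/cephfs/some/dir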

[ceph-users] ceph 9.2.0 mds cluster went down and now constantly crashes with Floating point exception

2016-02-04 Thread Kenneth Waegeman

Hi,

Hi, we are running ceph 9.2.0.
Overnight, our ceph state went to 'mds mds03 is laggy'. When I checked 
the logs, I saw this mds had crashed with a stacktrace. I checked the other 
mdss, and I saw the same there.
When I try to start the mds again, I again get a stacktrace and it won't 
come up:


 -12> 2016-02-04 10:23:46.837131 7ff9ea570700  1 -- 
10.141.16.2:6800/193767 <== osd.146 10.141.16.25:6800/7036 1  
osd_op_reply(207 15ef982. [stat] v0'0 uv22184 ondisk = 0) v6 
 187+0+16 (113

2261152 0 506978568) 0x7ffa171ae940 con 0x7ffa189cc3c0
-11> 2016-02-04 10:23:46.837317 7ff9ed6a1700  1 -- 
10.141.16.2:6800/193767 <== osd.136 10.141.16.24:6800/6764 6  
osd_op_reply(209 148aaac. [delete] v0'0 uv23797 ondisk = -2 
((2) No such file o
r directory)) v6  187+0+0 (64699207 0 0) 0x7ffa171acb00 con 
0x7ffa014fd9c0
-10> 2016-02-04 10:23:46.837406 7ff9ec994700  1 -- 
10.141.16.2:6800/193767 <== osd.36 10.141.16.14:6800/5395 5  
osd_op_reply(175 15f631f. [stat] v0'0 uv22466 ondisk = 0) v6 
 187+0+16 (1037

61047 0 2527067705) 0x7ffa08363700 con 0x7ffa189ca580
 -9> 2016-02-04 10:23:46.837463 7ff9eba85700  1 -- 
10.141.16.2:6800/193767 <== osd.47 10.141.16.15:6802/7128 2  
osd_op_reply(211 148aac8. [delete] v0'0 uv22990 ondisk = -2 
((2) No such file or
  directory)) v6  187+0+0 (1138385695 0 0) 0x7ffa01cd0dc0 con 
0x7ffa189cadc0
 -8> 2016-02-04 10:23:46.837468 7ff9eb27d700  1 -- 
10.141.16.2:6800/193767 <== osd.16 10.141.16.12:6800/5739 2  
osd_op_reply(212 148aacd. [delete] v0'0 uv23991 ondisk = -2 
((2) No such file or
  directory)) v6  187+0+0 (1675093742 0 0) 0x7ffa171ac840 con 
0x7ffa189cb760
 -7> 2016-02-04 10:23:46.837477 7ff9eab76700  1 -- 
10.141.16.2:6800/193767 <== osd.66 10.141.16.17:6800/6353 2  
osd_op_reply(210 148aab9. [delete] v0'0 uv24583 ondisk = -2 
((2) No such file or
  directory)) v6  187+0+0 (603192739 0 0) 0x7ffa19054680 con 
0x7ffa189cbce0
 -6> 2016-02-04 10:23:46.838140 7ff9f0bcf700  1 -- 
10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 43  
osd_op_reply(121 200.9d96 [write 1459360~980] v943'4092 uv4092 
ondisk = 0) v6  179+0+0 (3939130488 0 0) 0x7ffa01590100 con 
0x7ffa014fab00
 -5> 2016-02-04 10:23:46.838342 7ff9f0bcf700  1 -- 
10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 44  
osd_op_reply(124 200.9d96 [write 1460340~956] v943'4093 uv4093 
ondisk = 0) v6  179+0+0 (1434265886 0 0) 0x7ffa01590100 con 
0x7ffa014fab00
 -4> 2016-02-04 10:23:46.838531 7ff9f0bcf700  1 -- 
10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 45  
osd_op_reply(126 200.9d96 [write 1461296~954] v943'4094 uv4094 
ondisk = 0) v6  179+0+0 (25292940 0 0) 0x7ffa01590100 con 0x7ffa014fab00
 -3> 2016-02-04 10:23:46.838700 7ff9ecd98700  1 -- 
10.141.16.2:6800/193767 <== osd.57 10.141.16.16:6802/7067 3  
osd_op_reply(199 15ef976. [stat] v0'0 uv22557 ondisk = 0) v6 
 187+0+16 (354652996 0 2244692791) 0x7ffa171ade40 con 0x7ffa189ca160
 -2> 2016-02-04 10:23:46.839301 7ff9ed8a3700  1 -- 
10.141.16.2:6800/193767 <== osd.107 10.141.16.21:6802/7468 3  
osd_op_reply(115 1625476. [stat] v0'0 uv22587 ondisk = 0) v6 
 187+0+16 (664308076 0 998461731) 0x7ffa08363c80 con 0x7ffa014fdb20
 -1> 2016-02-04 10:23:46.839322 7ff9f0bcf700  1 -- 
10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 46  
osd_op_reply(128 200.9d96 [write 1462250~954] v943'4095 uv4095 
ondisk = 0) v6  179+0+0 (1379768629 0 0) 0x7ffa01590100 con 
0x7ffa014fab00
  0> 2016-02-04 10:23:46.839379 7ff9f30d8700 -1 *** Caught signal 
(Floating point exception) **

  in thread 7ff9f30d8700

  ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
  1: (()+0x4b6fa2) [0x7ff9fd091fa2]
  2: (()+0xf100) [0x7ff9fbfd3100]
  3: (StrayManager::_calculate_ops_required(CInode*, bool)+0xa2) 
[0x7ff9fcf0adc2]

  4: (StrayManager::enqueue(CDentry*, bool)+0x169) [0x7ff9fcf10459]
  5: (StrayManager::__eval_stray(CDentry*, bool)+0xa49) [0x7ff9fcf111c9]
  6: (StrayManager::eval_stray(CDentry*, bool)+0x1e) [0x7ff9fcf113ce]
  7: (MDCache::scan_stray_dir(dirfrag_t)+0x13d) [0x7ff9fce6741d]
  8: (MDSInternalContextBase::complete(int)+0x1e3) [0x7ff9fcff4993]
  9: (MDSRank::_advance_queues()+0x382) [0x7ff9fcdd4652]
  10: (MDSRank::ProgressThread::entry()+0x4a) [0x7ff9fcdd4aca]
  11: (()+0x7dc5) [0x7ff9fbfcbdc5]
  12: (clone()+0x6d) [0x7ff9faeb621d]

Does someone have an idea? We can't use our fs right now..

I included the full log of an mds start as an attachment.

Thanks!!

K



mds02.tar.bz2
Description: application/bzip


[ceph-users] very high OSD RAM usage values

2016-01-06 Thread Kenneth Waegeman

Hi all,

We experienced some serious trouble with our cluster: a running cluster 
started failing and set off a chain reaction until the ceph cluster was 
down, with about half the OSDs down (in an EC pool).


Each host has 8 OSDs of 8 TB (i.e. RAID 0 of 2 x 4TB disks) for an EC pool 
(10+3, 14 hosts), plus 2 cache OSDs, and 32 GB of RAM.
The reason we have the RAID 0 of the disks is that we tried with 16 
disks before, but 32GB didn't seem enough to keep the cluster stable.


We don't know for sure what triggered the chain reaction, but what we 
certainly see is that while recovering, our OSDs are using a lot of 
memory. We've seen some OSDs using almost 8GB of RAM (resident; virtual 
11GB).
So right now we don't have enough memory to recover the cluster, because 
the OSDs get killed by the OOM killer before they can recover..

And I don't know whether doubling our memory will be enough..

A few questions:

* Has someone seen this before?
* 2GB was still normal, but 8GB seems a lot; is this expected behaviour?
* We didn't see this with a nearly empty cluster. Now it was filled 
about 1/4 (270TB). I guess it would become worse when filled half or more?
* How high can this memory usage become? Can we calculate the maximum 
memory of an OSD? Can we limit it? (see also the heap commands after this list)

* We can upgrade/reinstall to infernalis, will that solve anything?
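For what it's worth, on tcmalloc builds the per-OSD heap can at least be
inspected and asked to return freed pages to the OS -- a hedged sketch,
with osd.12 just an example id:

  ceph tell osd.12 heap stats
  ceph tell osd.12 heap release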

This is related to a previous post of me : 
http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/22259



Thank you very much !!

Kenneth



[ceph-users] python-flask not in repo's for infernalis

2015-12-14 Thread Kenneth Waegeman

Hi,

Is there a reason python-flask is no longer in the infernalis repo? 
On CentOS 7 it is still not in the standard repos or EPEL..


Thanks!
Kenneth


[ceph-users] ceph new installation of ceph 0.9.2 issue and crashing osds

2015-12-08 Thread Kenneth Waegeman

Hi,

I installed ceph 9.2.0 on a new cluster of 3 nodes, with 50 OSDs on each 
node (300GB disks, 96GB RAM).


While installing, I hit an issue where I could not even log in as the ceph 
user, so I increased some limits:

 security/limits.conf

ceph    -    nproc    1048576
ceph    -    nofile   1048576

I could then install the other OSDs.

After the cluster was installed, I added some extra pools. When creating 
the pgs of these pools, the osds of the cluster started to fail with 
stacktraces. If I try to restart them, they keep on failing. I don't 
know if this is an actual bug in Infernalis, or a limit that is still 
not high enough.. I've increased the nproc and nofile entries even 
more, but no luck. Does someone have a clue? Here are the stacktraces I see:


Mostly this one:

   -12> 2015-12-08 10:17:18.995243 7fa9063c5700  5 osd.12 pg_epoch: 904 
pg[3.3b(unlocked)] enter Initial
   -11> 2015-12-08 10:17:18.995279 7fa9063c5700  5 write_log with: 
dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, 
dirty_divergent_priors: false, divergent_priors: 0, writeout_from: 
4294967295'184467

44073709551615, trimmed:
   -10> 2015-12-08 10:17:18.995292 7fa9063c5700  5 osd.12 pg_epoch: 904 
pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) 
[12,80,111] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive] exit Initial 0.48

0 0.00
-9> 2015-12-08 10:17:18.995301 7fa9063c5700  5 osd.12 pg_epoch: 904 
pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) 
[12,80,111] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive] enter Reset
-8> 2015-12-08 10:17:18.995310 7fa9063c5700  5 osd.12 pg_epoch: 904 
pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) 
[12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 inactive] exit Reset 0.08

1 0.17
-7> 2015-12-08 10:17:18.995326 7fa9063c5700  5 osd.12 pg_epoch: 904 
pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) 
[12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 inactive] enter Started
-6> 2015-12-08 10:17:18.995332 7fa9063c5700  5 osd.12 pg_epoch: 904 
pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) 
[12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 inactive] enter Start
-5> 2015-12-08 10:17:18.995338 7fa9063c5700  1 osd.12 pg_epoch: 904 
pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) 
[12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 inactive] state: transi

tioning to Primary
-4> 2015-12-08 10:17:18.995345 7fa9063c5700  5 osd.12 pg_epoch: 904 
pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) 
[12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 inactive] exit Start 0.12

0 0.00
-3> 2015-12-08 10:17:18.995352 7fa9063c5700  5 osd.12 pg_epoch: 904 
pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) 
[12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 inactive] enter Started/Primar

y
-2> 2015-12-08 10:17:18.995358 7fa9063c5700  5 osd.12 pg_epoch: 904 
pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) 
[12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 creating] enter Started/Primar

y/Peering
-1> 2015-12-08 10:17:18.995365 7fa9063c5700  5 osd.12 pg_epoch: 904 
pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) 
[12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 creating+peering] enter Starte

d/Primary/Peering/GetInfo
 0> 2015-12-08 10:17:18.998472 7fa9063c5700 -1 common/Thread.cc: In 
function 'void Thread::create(size_t)' thread 7fa9063c5700 time 
2015-12-08 10:17:18.995438

common/Thread.cc: 154: FAILED assert(ret == 0)

 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x85) [0x7fa91924ebe5]

 2: (Thread::create(unsigned long)+0x8a) [0x7fa91923325a]
 3: (SimpleMessenger::connect_rank(entity_addr_t const&, int, 
PipeConnection*, Message*)+0x185) [0x7fa919229105]
 4: (SimpleMessenger::get_connection(entity_inst_t const&)+0x3ba) 
[0x7fa9192298ea]
 5: (OSDService::get_con_osd_cluster(int, unsigned int)+0x1ab) 
[0x7fa918c7318b]
 6: (OSD::do_queries(std::map >, 
std::less, std::allocator > > > > >&, std::shared_ptr)+0x1f1) 
[0x7fa918c9b061]
 7: (OSD::dispatch_context(PG::RecoveryCtx&, PG*, 
std::shared_ptr, ThreadPool::TPHandle*)+0x142) 
[0x7fa918cb5832]
 8: (OSD::handle_pg_create(std::shared_ptr)+0x133e) 
[0x7fa918cb820e]

 9: (OSD::dispatch_op(std::shared_ptr)+0x220) [0x7fa918cbc0c0]
 10: (OSD::do_waiters()+0x1c2) [0x7fa918cbc382]
 11: (OSD::ms_dispatch(Message*)+0x227) [0x7fa918cbd727]
 12: (DispatchQueue::entry()+0x649) [0x7fa91930a939]
 13: (DispatchQueue::DispatchThread::entry()+0xd) [0x7fa91922eb1d]
 14: (()+0x7df5) [0x7fa9172e3df5]
 15: (clone()+0x6d) [0x7fa915b8c1ad]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.


Also these:

--- begin dump 
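Since the failed assert above is in Thread::create(), a quick hedged way to
check whether a thread/process limit is the culprit (the pid lookup assumes
a single ceph-osd process on the box):

  # limits the running OSD actually got, not just what limits.conf says
  grep -i -e processes -e 'open files' /proc/$(pidof -s ceph-osd)/limits
  # kernel-wide ceilings on pids/threads
  sysctl kernel.pid_max kernel.threads-max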

[ceph-users] upgrading 0.94.5 to 9.2.0 notes

2015-11-20 Thread Kenneth Waegeman

Hi,

I recently started a test to upgrade ceph from 0.94.5 to 9.2.0 on 
Centos7. I had some issues not mentioned in the release notes. Hereby 
some notes:


* Upgrading instructions are only in the release notes, not updated on 
the upgrade page in the docs: 
http://docs.ceph.com/docs/master/install/upgrading-ceph/


* Once you've updated the packages, `service ceph stop` or `service ceph 
stop `  won't actually work anymore; it points to a 
non-existing target. This is a step in the upgrade procedure I couldn't 
do, so I manually killed the processes.

[root@ceph001 ~]# service ceph stop osd
Redirecting to /bin/systemctl stop  osd ceph.service
Failed to issue method call: Unit osd.service not loaded

* You also need to chown the journal partitions used for the osds; only 
chowning /var/lib/ceph is not enough (see the example commands after this list).


* Permissions on log files are not completely ok. The /var/log/ceph 
folder is owned by ceph, but existing files are still owned by root, so 
I had to manually chown these, otherwise I got messages like this:
2015-11-13 11:32:26.641870 7f55a4ffd700  1 mon.ceph003@2(peon).log v4672 
unable to write to '/var/log/ceph/ceph.log' for channel 'cluster': (13) 
Permission denied


* I still get messages like these in the log files; I'm not sure whether they 
are harmless or not:


2015-11-13 11:52:53.840414 7f610f376700 -1 lsb_release_parse - pclose 
failed: (13) Permission denied


* systemctl start ceph.target does not start my osds.. I have to start 
them all with systemctl start ceph-osd@...
* systemctl restart ceph.target restarts the running osds, but not the 
osds that are not yet running.

* systemctl stop ceph.target stops everything, as expected :)
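As an illustration of the chown steps above -- a minimal sketch, assuming
the default paths and journals that are symlinks from the OSD data dirs to
raw partitions:

  chown -R ceph:ceph /var/lib/ceph /var/log/ceph
  # also chown the real journal devices the symlinks point at
  for j in /var/lib/ceph/osd/ceph-*/journal; do
      chown ceph:ceph "$(readlink -f "$j")"
  done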

I haven't tested everything thoroughly yet, but has someone seen the 
same issues?


Thanks!

Kenneth


[ceph-users] all pgs of erasure coded pool stuck stale

2015-11-13 Thread Kenneth Waegeman

Hi all,

What could be the reason that all pgs of a whole erasure coded pool are 
stuck stale? All OSDs have been restarted and are up..


The details:
We have a setup with 14 OSD hosts with specific OSDs for an erasure 
coded pool and 2 SSDs for a cache pool, and 3 separate monitor/metadata 
nodes with SSDs for the metadata pool.


This afternoon I had to reboot some OSD nodes, because they weren't 
reachable anymore. After the cluster recovered, some pgs were stuck 
stale. I saw with `health detail` that they were all pgs of 2 specific 
EC-pool osds. I tried restarting those osds, but that didn't solve the 
problem. I restarted all osds on those nodes, but then all pgs on the 
EC osds of those nodes were stuck stale. I read in the docs that this 
state is reached when the pg's OSDs are not reporting to the monitors, so I 
restarted the monitors. Since that did not solve it, I tried to restart 
everything.


When the cluster was recovered again, all other PGs are back 
active+clean, except for the pgs in the EC pool, those are still 
stale+active+clean or even stale+active+clean+scrubbing+deep


When I try to query such a pg (e.g. `ceph pg 2.1b0 query`), it just hangs 
there.. That is not the case for the other pools.
If I interrupt it, I get: Error EINTR: problem getting command descriptions 
from pg.2.1b0
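For reference, the stuck pgs and their OSD mapping can be listed without
querying them directly -- 2.1b0 here is just the pg from the example above:

  ceph health detail | grep stale
  ceph pg dump_stuck stale
  ceph pg map 2.1b0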


I can't see anything strange in the logs of these pgs (attached).

Does someone have an idea?

Help very much appreciated!

Thanks!

Kenneth
2015-11-13 17:07:38.362392 7fe857b73900  0 ceph version 9.0.3 
(7295612d29f953f46e6e88812ef372b89a43b9da), process ceph-osd, pid 16956
2015-11-13 17:07:38.489267 7fe857b73900  0 filestore(/var/lib/ceph/osd/ceph-29) 
backend xfs (magic 0x58465342)
2015-11-13 17:07:38.494638 7fe857b73900  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-29) detect_features: FIEMAP 
ioctl is disabled via 'filestore fiemap' config option
2015-11-13 17:07:38.494646 7fe857b73900  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-29) detect_features: 
SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2015-11-13 17:07:38.494696 7fe857b73900  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-29) detect_features: splice is 
supported
2015-11-13 17:07:38.538539 7fe857b73900  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-29) detect_features: syncfs(2) 
syscall fully supported (by glibc and kernel)
2015-11-13 17:07:38.561220 7fe857b73900  0 
xfsfilestorebackend(/var/lib/ceph/osd/ceph-29) detect_features: extsize is 
supported and your kernel >= 3.5
2015-11-13 17:07:38.790119 7fe857b73900  0 filestore(/var/lib/ceph/osd/ceph-29) 
mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-11-13 17:07:39.038637 7fe857b73900  1 journal _open 
/var/lib/ceph/osd/ceph-29/journal fd 21: 10737418240 bytes, block size 4096 
bytes, directio = 1, aio = 1
2015-11-13 17:07:39.055782 7fe857b73900  1 journal _open 
/var/lib/ceph/osd/ceph-29/journal fd 21: 10737418240 bytes, block size 4096 
bytes, directio = 1, aio = 1
2015-11-13 17:07:39.059490 7fe857b73900  0  cls/cephfs/cls_cephfs.cc:136: 
loading cephfs_size_scan
2015-11-13 17:07:39.059702 7fe857b73900  0  cls/hello/cls_hello.cc:271: 
loading cls_hello
2015-11-13 17:07:39.066342 7fe857b73900  0 osd.29 10582 crush map has features 
104186773504, adjusting msgr requires for clients
2015-11-13 17:07:39.066349 7fe857b73900  0 osd.29 10582 crush map has features 
379064680448 was 8705, adjusting msgr requires for mons
2015-11-13 17:07:39.066354 7fe857b73900  0 osd.29 10582 crush map has features 
379064680448, adjusting msgr requires for osds
2015-11-13 17:08:00.020520 7fe857b73900  0 osd.29 10582 load_pgs
2015-11-13 17:08:04.948021 7fe857b73900  0 osd.29 10582 load_pgs opened 254 pgs
2015-11-13 17:08:04.959217 7fe857b73900 -1 osd.29 10582 log_to_monitors 
{default=true}
2015-11-13 17:08:04.963778 7fe83d9a2700  0 osd.29 10582 ignoring osdmap until 
we have initialized
2015-11-13 17:08:04.963814 7fe83d9a2700  0 osd.29 10582 ignoring osdmap until 
we have initialized
2015-11-13 17:08:04.996676 7fe857b73900  0 osd.29 10582 done with init, 
starting boot process
2015-11-13 17:08:11.360655 7fe826e4f700  0 -- 10.143.16.13:6812/16956 >> 
10.143.16.13:6816/2822 pipe(0x4c259000 sd=181 :6812 s=0 pgs=0 cs=0 l=0 
c=0x4c1a1e40).accept connect_seq 0 vs existing 0 state connecting
2015-11-13 17:08:11.360716 7fe826f50700  0 -- 10.143.16.13:6812/16956 >> 
10.143.16.13:6814/2729 pipe(0x4c254000 sd=180 :6812 s=0 pgs=0 cs=0 l=0 
c=0x4c1a1ce0).accept connect_seq 0 vs existing 0 state connecting
2015-11-13 17:08:11.360736 7fe826b4c700  0 -- 10.143.16.13:6812/16956 >> 
10.143.16.13:6800/1002914 pipe(0x4c26f000 sd=183 :6812 s=0 pgs=0 cs=0 l=0 
c=0x4c1a2260).accept connect_seq 0 vs existing 0 state connecting
2015-11-13 17:08:11.361034 7fe82694a700  0 -- 10.143.16.13:6812/16956 >> 
10.143.16.14:6808/13526 pipe(0x4c292000 sd=185 :6812 s=0 pgs=0 cs=0 l=0 
c=0x4c1a23c0).accept connect_seq 0 vs existing 0 state connecting


Re: [ceph-users] Problem with infernalis el7 package

2015-11-10 Thread Kenneth Waegeman



On 10/11/15 02:07, c...@dolphin-it.de wrote:


Hello,

I filed a new ticket:
http://tracker.ceph.com/issues/13739

Regards,
Kevin

[ceph-users] Problem with infernalis el7 package (10-Nov-2015 1:57)
From:   Bob R
To:ceph-users@lists.ceph.com


Hello,


We've got two problems trying to update our cluster to infernalis-


ceph-deploy install --release infernalis neb-kvm00



[neb-kvm00][INFO  ] Running command: sudo rpm --import 
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
[neb-kvm00][INFO  ] Running command: sudo rpm -Uvh --replacepkgs 
http://ceph.com/rpm-infernalis/el7/noarch/ceph-release-1-0.el7.noarch.rpm
[neb-kvm00][WARNIN] curl: (22) The requested URL returned error: 404 Not Found
[neb-kvm00][WARNIN] error: skipping 
http://ceph.com/rpm-infernalis/el7/noarch/ceph-release-1-0.el7.noarch.rpm - 
transfer failed
[neb-kvm00][DEBUG ] Retrieving 
http://ceph.com/rpm-infernalis/el7/noarch/ceph-release-1-0.el7.noarch.rpm
[neb-kvm00][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: rpm -Uvh 
--replacepkgs 
http://ceph.com/rpm-infernalis/el7/noarch/ceph-release-1-0.el7.noarch.rpm


^^ the ceph-release package is named "ceph-release-1-1.el7.noarch.rpm"


Trying to install manually on that (or any other) host we're seeing a 
dependency which makes us think the package is built improperly-


--> Running transaction check
---> Package ceph.x86_64 1:9.2.0-0.el7 will be an update
--> Processing Dependency: 
/home/jenkins-build/build/workspace/ceph-build-next/ARCH/x86_64/DIST/centos7/venv/bin/python
 for package: 1:ceph-9.2.0-0.el7.x86_64
---> Package selinux-policy.noarch 0:3.13.1-23.el7 will be updated
---> Package selinux-policy.noarch 0:3.13.1-23.el7_1.21 will be an update
--> Processing Dependency: 
/home/jenkins-build/build/workspace/ceph-build-next/ARCH/x86_64/DIST/centos7/venv/bin/python
 for package: 1:ceph-9.2.0-0.el7.x86_64
--> Finished Dependency Resolution
Error: Package: 1:ceph-9.2.0-0.el7.x86_64 (Ceph)
Requires: 
/home/jenkins-build/build/workspace/ceph-build-next/ARCH/x86_64/DIST/centos7/venv/bin/python
  You could try using --skip-broken to work around the problem
  You could try running: rpm -Va --nofiles --nodigest

Hi,

We also see this last problem with the 9.2.0 release. We tried to update 
from 0.94.5 to infernalis, and got this jenkins dependency thingy. Our 
packages are not installed with ceph-deploy. I'll update the ticket with 
our logs.


K


Thanks -Bob


Re: [ceph-users] Problem with infernalis el7 package

2015-11-10 Thread Kenneth Waegeman

Because our problem was not related to ceph-deploy, I created a new ticket:
http://tracker.ceph.com/issues/13746

On 10/11/15 16:53, Kenneth Waegeman wrote:



On 10/11/15 02:07, c...@dolphin-it.de wrote:


Hello,

I filed a new ticket:
http://tracker.ceph.com/issues/13739

Regards,
Kevin

[ceph-users] Problem with infernalis el7 package (10-Nov-2015 1:57)
From:   Bob R
To:ceph-users@lists.ceph.com


Hello,


We've got two problems trying to update our cluster to infernalis-


ceph-deploy install --release infernalis neb-kvm00



[neb-kvm00][INFO  ] Running command: sudo rpm --import 
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
[neb-kvm00][INFO  ] Running command: sudo rpm -Uvh --replacepkgs 
http://ceph.com/rpm-infernalis/el7/noarch/ceph-release-1-0.el7.noarch.rpm 

[neb-kvm00][WARNIN] curl: (22) The requested URL returned error: 404 
Not Found
[neb-kvm00][WARNIN] error: skipping 
http://ceph.com/rpm-infernalis/el7/noarch/ceph-release-1-0.el7.noarch.rpm 
- transfer failed
[neb-kvm00][DEBUG ] Retrieving 
http://ceph.com/rpm-infernalis/el7/noarch/ceph-release-1-0.el7.noarch.rpm
[neb-kvm00][ERROR ] RuntimeError: command returned non-zero exit 
status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: rpm 
-Uvh --replacepkgs 
http://ceph.com/rpm-infernalis/el7/noarch/ceph-release-1-0.el7.noarch.rpm



^^ the ceph-release package is named "ceph-release-1-1.el7.noarch.rpm"


Trying to install manually on that (or any other) host we're seeing a 
dependency which makes us think the package is built improperly-



--> Running transaction check
---> Package ceph.x86_64 1:9.2.0-0.el7 will be an update
--> Processing Dependency: 
/home/jenkins-build/build/workspace/ceph-build-next/ARCH/x86_64/DIST/centos7/venv/bin/python 
for package: 1:ceph-9.2.0-0.el7.x86_64

---> Package selinux-policy.noarch 0:3.13.1-23.el7 will be updated
---> Package selinux-policy.noarch 0:3.13.1-23.el7_1.21 will be an 
update
--> Processing Dependency: 
/home/jenkins-build/build/workspace/ceph-build-next/ARCH/x86_64/DIST/centos7/venv/bin/python 
for package: 1:ceph-9.2.0-0.el7.x86_64

--> Finished Dependency Resolution
Error: Package: 1:ceph-9.2.0-0.el7.x86_64 (Ceph)
Requires: 
/home/jenkins-build/build/workspace/ceph-build-next/ARCH/x86_64/DIST/centos7/venv/bin/python

  You could try using --skip-broken to work around the problem
  You could try running: rpm -Va --nofiles --nodigest

Hi,

We also see this last problem with the 9.2.0 release. We tried to 
update from 0.94.5 to infernalis, and got this jenkins dependency 
thingy. Our packages are not installed with ceph-deploy. I'll update 
the ticket with our logs.


K


Thanks -Bob


[ceph-users] ceph mds operations

2015-11-10 Thread Kenneth Waegeman

Hi all,

Is there a way to see what an MDS is actually doing? We are testing 
metadata operations, but in the ceph status output we only see about 50 
ops/s:  client io 90791 kB/s rd, 54 op/s
Our active ceph-mds is using a lot of cpu and 25GB of memory, so I guess 
it is doing a lot of operations from memory, which are not shown in the 
output? Is there a way to see this more transparently?
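A hedged sketch of where more detail is usually visible -- assuming the
admin socket is enabled on the host of the active MDS and the daemon name
is mds03 (just an example name here):

  ceph daemon mds.mds03 perf dump
  ceph daemon mds.mds03 dump_ops_in_flight
  ceph daemon mds.mds03 session ls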


Thanks!

Kenneth


[ceph-users] upgrading from 0.9.3 to 9.1.0 and systemd

2015-10-19 Thread Kenneth Waegeman

Hi all,

I tried upgrading ceph from 9.0.3 to 9.1.0, but ran into some trouble.
I chowned the /var/lib/ceph folder as described in the release notes, 
but my journal is on a separate partition, so I get:


Oct 19 11:58:59 ceph001.cubone.os systemd[1]: Started Ceph object 
storage daemon.
Oct 19 11:58:59 ceph001.cubone.os ceph-osd[6806]: starting osd.1 at :/0 
osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
Oct 19 11:58:59 ceph001.cubone.os ceph-osd[6806]: 2015-10-19 
11:58:59.530204 7f18aeba8900 -1 filestore(/var/lib/ceph/osd/ceph-1) 
mount failed to open journal /var/lib/ceph/osd/ceph-1/journal: (13) 
Permission den
Oct 19 11:58:59 ceph001.cubone.os ceph-osd[6806]: 2015-10-19 
11:58:59.540355 7f18aeba8900 -1 osd.1 0 OSD:init: unable to mount object 
store
Oct 19 11:58:59 ceph001.cubone.os ceph-osd[6806]: 2015-10-19 
11:58:59.540370 7f18aeba8900 -1  ** ERROR: osd init failed: (13) 
Permission denied
Oct 19 11:58:59 ceph001.cubone.os systemd[1]: ceph-osd@1.service: main 
process exited, code=exited, status=1/FAILURE


Is this a known issue?
I tried chowning the journal partition, without luck, then instead I get 
this:


Oct 19 12:10:34 ceph001.cubone.os ceph-osd[7763]: in thread 7fbb986fe900
Oct 19 12:10:34 ceph001.cubone.os ceph-osd[7763]: ceph version 9.1.0 
(3be81ae6cf17fcf689cd6f187c4615249fea4f61)
Oct 19 12:10:34 ceph001.cubone.os ceph-osd[7763]: 1: (()+0x7e1f22) 
[0x7fbb98ef1f22]
Oct 19 12:10:34 ceph001.cubone.os ceph-osd[7763]: 2: (()+0xf130) 
[0x7fbb97067130]
Oct 19 12:10:34 ceph001.cubone.os ceph-osd[7763]: 3: (gsignal()+0x37) 
[0x7fbb958255d7]
Oct 19 12:10:34 ceph001.cubone.os ceph-osd[7763]: 4: (abort()+0x148) 
[0x7fbb95826cc8]
Oct 19 12:10:34 ceph001.cubone.os ceph-osd[7763]: 5: 
(__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fbb961389b5]
Oct 19 12:10:34 ceph001.cubone.os ceph-osd[7763]: 6: (()+0x5e926) 
[0x7fbb96136926]
Oct 19 12:10:34 ceph001.cubone.os ceph-osd[7763]: 7: (()+0x5e953) 
[0x7fbb96136953]
Oct 19 12:10:34 ceph001.cubone.os ceph-osd[7763]: 8: (()+0x5eb73) 
[0x7fbb96136b73]
Oct 19 12:10:34 ceph001.cubone.os ceph-osd[7763]: 9: 
(ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x27a) [0x7fbb98fe766a]
Oct 19 12:10:34 ceph001.cubone.os ceph-osd[7763]: 10: 
(OSDService::get_map(unsigned int)+0x3d) [0x7fbb98a97e2d]
Oct 19 12:10:34 ceph001.cubone.os ceph-osd[7763]: 11: 
(OSD::init()+0xb0b) [0x7fbb98a4bf7b]
Oct 19 12:10:34 ceph001.cubone.os ceph-osd[7763]: 12: (main()+0x2998) 
[0x7fbb989cf3b8]
Oct 19 12:10:34 ceph001.cubone.os ceph-osd[7763]: 13: 
(__libc_start_main()+0xf5) [0x7fbb95811af5]
Oct 19 12:10:34 ceph001.cubone.os ceph-osd[7763]: 14: (()+0x2efb49) 
[0x7fbb989ffb49]
Oct 19 12:10:34 ceph001.cubone.os ceph-osd[7763]: NOTE: a copy of the 
executable, or `objdump -rdS ` is needed to interpret this.
Oct 19 12:10:34 ceph001.cubone.os ceph-osd[7763]: 0> 2015-10-19 
12:10:34.710385 7fbb986fe900 -1 *** Caught signal (Aborted) **


So the OSDs do not start..

By the way, is there an easy way to restart only the osds, not the mons or 
other daemons, as ceph.target does?

Could there be separate targets for the osd/mon/.. types? (see the note below)
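For what it's worth, the systemd packaging in recent versions does include
per-type targets; if your packages ship them, something like this restarts
only the OSDs on a host (a hedged sketch, check which units your build
actually installs):

  systemctl restart ceph-osd.target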

Thanks!

Kenneth


[ceph-users] mds0: Client client008 failing to respond to capability release

2015-09-21 Thread Kenneth Waegeman

Hi all!

A quick question:
We are syncing data over cephfs, and we are seeing messages in our 
output like:


mds0: Client client008 failing to respond to capability release

What does this mean? I can't find information about this anywhere else.

We are running ceph 9.0.3

On earlier versions, we often saw messages like 'failing to respond to 
cache pressure'; is this related?
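A quick hedged pointer for tracking down which client is involved and how
many caps it holds -- assuming the admin socket is reachable on the active
MDS host (the daemon name here is just an example):

  ceph daemon mds.mds01 session ls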


Thanks!

Kenneth


Re: [ceph-users] mds0: Client client008 failing to respond to capability release

2015-09-21 Thread Kenneth Waegeman



On 21/09/15 16:32, John Spray wrote:

On Mon, Sep 21, 2015 at 2:33 PM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:

Hi all!

A quick question:
We are syncing data over cephfs , and we are seeing messages in our output
like:

mds0: Client client008 failing to respond to capability release

What does this mean? I don't find information about this somewhere else.

It means the MDS thinks that client is exhibiting a buggy behaviour in
failing to respond to requests to release resources.


We are running ceph 9.0.3

On earlier versions, we often saw messages like 'failing to respond to cache
pressure',  is this related?

They're two different health checks that both indicate potential
problems with the clients.

What version of ceph-fuse or kernel client is in use?

ceph-fuse is used, also 9.0.3


John




[ceph-users] Questions about erasure code pools

2015-08-03 Thread Kenneth Waegeman

Hi,

I read here in the documentation: 
http://docs.ceph.com/docs/master/architecture/#erasure-coding


In an erasure coded pool, the primary OSD in the up set receives all 
write operations. 
I can't find what happens with read operations. Does the client contact 
the primary, which then reassembles the object, or does the client 
decode the data of the ec pool itself?


Another question: when using cephfs with it, we have to use a cache pool 
on top of it. But this forms a huge bottleneck for read-only operations. 
Is it possible to use the ec pool directly and bypass the cache for 
read-only mounts/scenarios?

One such use case would be to read out cephfs snapshots.

Another question about using caches with erasure code: is it possible 
to configure some kind of affinity between the primary OSD of 
the ec pool and the cache OSD? This would limit network traffic when 
flushing data from the cache.


Many thanks !!

Kenneth


[ceph-users] rados bench multiple clients error

2015-07-31 Thread Kenneth Waegeman

Hi,

I was trying rados bench, and first wrote 250 objects from 14 hosts with 
 --no-cleanup. Then I ran the read tests from the same 14 hosts and ran 
into this:


[root@osd007 test]# /usr/bin/rados -p ectest bench 100 seq
2015-07-31 17:52:51.027872 7f6c40de17c0 -1 WARNING: the following 
dangerous and experimental features are enabled: keyvaluestore


   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
 0   0 0 0 0 0 - 0
read got -2
error during benchmark: -5
error 5: (5) Input/output error

The objects are there:
...
benchmark_data_osd011.gigalith.os_39338_object2820
benchmark_data_osd004.gigalith.os_142795_object3059
benchmark_data_osd001.gigalith.os_98375_object1182
benchmark_data_osd007.gigalith.os_20502_object2226
benchmark_data_osd008.gigalith.os_3059_object2183
benchmark_data_osd001.gigalith.os_94812_object1390
benchmark_data_osd010.gigalith.os_37614_object253
benchmark_data_osd011.gigalith.os_41998_object1093
benchmark_data_osd009.gigalith.os_90933_object1270
benchmark_data_osd010.gigalith.os_35614_object393
benchmark_data_osd009.gigalith.os_90933_object2611
benchmark_data_osd010.gigalith.os_35614_object2114
benchmark_data_osd013.gigalith.os_29915_object976
benchmark_data_osd014.gigalith.os_45604_object2497
benchmark_data_osd003.gigalith.os_147071_object1775
...


This works when using only 1 host..
Is there a way to run the benchmarks with multiple instances?
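One thing that may help -- hedged, only if the rados binary in use supports
the --run-name option: give each host a fixed run name for both the write
and the seq phase, so the read pass can find the objects that host wrote:

  # write phase, one label per host
  rados -p ectest bench 100 write --no-cleanup --run-name "bench_$(hostname -s)"
  # read phase on the same host, same label
  rados -p ectest bench 100 seq --run-name "bench_$(hostname -s)"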

I'm trying to find out what our performance problem is, and what the 
difference is between reading objects directly from the erasure coded 
pool and through the cache layer.


I tested reading large files that weren't in cache from 14 hosts through 
cephfs (cached files perform well enough) and got only 8MB/s per stream, 
while our disks were hardly working (as seen in iostat).
So my next steps would be to run these tests through rados: first 
directly on the ec pool, and then on the cache pool.. Does someone have an idea?



Thank you!

Kenneth


Re: [ceph-users] A cache tier issue with rate only at 20MB/s when data move from cold pool to hot pool

2015-07-30 Thread Kenneth Waegeman



On 06/16/2015 01:17 PM, Kenneth Waegeman wrote:

Hi!

We also see this at our site:  When we cat a large file from cephfs to
/dev/null, we get about 10MB/s data transfer.  I also do not see a
system resource bottleneck.
Our cluster consists of 14 servers with each 16 disks, together forming
a EC coded pool. We also have 2SSDs per server for the cache. Running
0.94.1

Hi,

Does someone have an idea about this? Is there some debugging or testing 
we can do to find the problem here?


Thank you!

Kenneth




So we have the same question.

Our cache pool is in writeback mode; would it help to set it to readonly
for this?

Kenneth

On 06/16/2015 12:58 PM, liukai wrote:

Hi all,
   A cache tier: 2 hot nodes with 8 SSD OSDs, and 2 cold nodes with 24 SATA
OSDs.

   The public network rate is 1Mb/s and the cluster network rate is
1000Mb/s.

   We use the fuse client to access the files.
The issue is:

   When the files are in the hot pool, the copy rate is very fast.

   But when the files are only in the cold pool, the rate only reaches 20MB/s.

   I know that when files are not in the hot pool, they have to be copied
from the cold pool to the hot pool first, and then from the hot pool to the
client.

   But CPU, RAM and network do not seem to be the bottleneck; could this be
caused by the system design?

   Are there some params to adjust to improve the rate from the cold pool to
the hot pool?

Thanks
2015-06-16

liukai




Re: [ceph-users] OSD RAM usage values

2015-07-29 Thread Kenneth Waegeman



On 07/28/2015 04:04 PM, Dan van der Ster wrote:

On Tue, Jul 28, 2015 at 12:07 PM, Gregory Farnum g...@gregs42.com wrote:

On Tue, Jul 28, 2015 at 11:00 AM, Kenneth Waegeman
kenneth.waege...@ugent.be wrote:



On 07/17/2015 02:50 PM, Gregory Farnum wrote:


On Fri, Jul 17, 2015 at 1:13 PM, Kenneth Waegeman
kenneth.waege...@ugent.be wrote:


Hi all,

I've read in the documentation that OSDs use around 512MB on a healthy
cluster.(http://ceph.com/docs/master/start/hardware-recommendations/#ram)
Now, our OSD's are all using around 2GB of RAM memory while the cluster
is
healthy.


PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
COMMAND
29784 root  20   0 6081276 2.535g   4740 S   0.7  8.1   1346:55
ceph-osd
32818 root  20   0 5417212 2.164g  24780 S  16.2  6.9   1238:55
ceph-osd
25053 root  20   0 5386604 2.159g  27864 S   0.7  6.9   1192:08
ceph-osd
33875 root  20   0 5345288 2.092g   3544 S   0.7  6.7   1188:53
ceph-osd
30779 root  20   0 5474832 2.090g  28892 S   1.0  6.7   1142:29
ceph-osd
22068 root  20   0 5191516 2.000g  28664 S   0.7  6.4  31:56.72
ceph-osd
34932 root  20   0 5242656 1.994g   4536 S   0.3  6.4   1144:48
ceph-osd
26883 root  20   0 5178164 1.938g   6164 S   0.3  6.2   1173:01
ceph-osd
31796 root  20   0 5193308 1.916g  27000 S  16.2  6.1 923:14.87
ceph-osd
25958 root  20   0 5193436 1.901g   2900 S   0.7  6.1   1039:53
ceph-osd
27826 root  20   0 5225764 1.845g   5576 S   1.0  5.9   1031:15
ceph-osd
36011 root  20   0 5111660 1.823g  20512 S  15.9  5.8   1093:01
ceph-osd
19736 root  20   0 2134680 0.994g  0 S   0.3  3.2  46:13.47
ceph-osd



[root@osd003 ~]# ceph status
2015-07-17 14:03:13.865063 7f1fde5f0700 -1 WARNING: the following
dangerous
and experimental features are enabled: keyvaluestore
2015-07-17 14:03:13.887087 7f1fde5f0700 -1 WARNING: the following
dangerous
and experimental features are enabled: keyvaluestore
  cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
   health HEALTH_OK
   monmap e1: 3 mons at

{mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
  election epoch 58, quorum 0,1,2 mds01,mds02,mds03
   mdsmap e17218: 1/1/1 up {0=mds03=up:active}, 1 up:standby
   osdmap e25542: 258 osds: 258 up, 258 in
pgmap v2460163: 4160 pgs, 4 pools, 228 TB data, 154 Mobjects
  270 TB used, 549 TB / 819 TB avail
  4152 active+clean
 8 active+clean+scrubbing+deep


We are using erasure code on most of our OSDs, so maybe that is a reason.
But also the cache-pool filestore OSDS on 200GB SSDs are using 2GB of
RAM.
Our erasure code pool (16*14 osds) have a pg_num of 2048; our cache pool
(2*14 OSDS) has a pg_num of 1024.

Are these normal values for this configuration, and is the documentation
a
bit outdated, or should we look into something else?



2GB of RSS is larger than I would have expected, but not unreasonable.
In particular I don't think we've gathered numbers on either EC pools
or on the effects of the caching processes.



Which data is actually in memory of the OSDS?
Is this mostly cached data?
We are short on memory on these servers, can we have influence on this?


Mmm, we've discussed this a few times on the mailing list. The CERN
guys published a document on experimenting with a very large cluster
and not enough RAM, but there's nothing I would really recommend
changing for a production system, especially an EC one, if you aren't
intimately familiar with what's going on.


In that CERN test the obvious large memory consumer was the osdmap
cache, which was so large because (a) the maps were getting quite
large (7200 OSDs creates a 4MB map, IIRC) and (b) so much osdmap churn
was leading each OSD to cache 500 of the maps. Once the cluster was
fully deployed and healthy, we could restart an OSD and it would then
only use ~300MB (because now the osdmap cache was ~empty).

Kenneth: does the memory usage shrink if you restart an osd? If so, it
could be a similar issue.


Thanks!
I tried restarting some OSDs when the cluster was healthy. Sometimes 
OSDs immediately grow back to the memory level they had before. 
When trying again, they take about 1GB of memory, so about half. We do 
not see it going below that level, but that may be because of EC..


Kenneth


Cheers, Dan




Re: [ceph-users] OSD RAM usage values

2015-07-29 Thread Kenneth Waegeman



On 07/28/2015 04:21 PM, Mark Nelson wrote:



On 07/17/2015 07:50 AM, Gregory Farnum wrote:

On Fri, Jul 17, 2015 at 1:13 PM, Kenneth Waegeman
kenneth.waege...@ugent.be wrote:

Hi all,

I've read in the documentation that OSDs use around 512MB on a healthy
cluster.(http://ceph.com/docs/master/start/hardware-recommendations/#ram)

Now, our OSDs are all using around 2GB of RAM while the cluster is
healthy.


  PID USER     PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
29784 root  20   0 6081276 2.535g   4740 S   0.7  8.1   1346:55 ceph-osd
32818 root  20   0 5417212 2.164g  24780 S  16.2  6.9   1238:55 ceph-osd
25053 root  20   0 5386604 2.159g  27864 S   0.7  6.9   1192:08 ceph-osd
33875 root  20   0 5345288 2.092g   3544 S   0.7  6.7   1188:53 ceph-osd
30779 root  20   0 5474832 2.090g  28892 S   1.0  6.7   1142:29 ceph-osd
22068 root  20   0 5191516 2.000g  28664 S   0.7  6.4  31:56.72 ceph-osd
34932 root  20   0 5242656 1.994g   4536 S   0.3  6.4   1144:48 ceph-osd
26883 root  20   0 5178164 1.938g   6164 S   0.3  6.2   1173:01 ceph-osd
31796 root  20   0 5193308 1.916g  27000 S  16.2  6.1 923:14.87 ceph-osd
25958 root  20   0 5193436 1.901g   2900 S   0.7  6.1   1039:53 ceph-osd
27826 root  20   0 5225764 1.845g   5576 S   1.0  5.9   1031:15 ceph-osd
36011 root  20   0 5111660 1.823g  20512 S  15.9  5.8   1093:01 ceph-osd
19736 root  20   0 2134680 0.994g  0 S   0.3  3.2  46:13.47 ceph-osd



[root@osd003 ~]# ceph status
2015-07-17 14:03:13.865063 7f1fde5f0700 -1 WARNING: the following
dangerous
and experimental features are enabled: keyvaluestore
2015-07-17 14:03:13.887087 7f1fde5f0700 -1 WARNING: the following
dangerous
and experimental features are enabled: keyvaluestore
 cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
  health HEALTH_OK
  monmap e1: 3 mons at
{mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}

 election epoch 58, quorum 0,1,2 mds01,mds02,mds03
  mdsmap e17218: 1/1/1 up {0=mds03=up:active}, 1 up:standby
  osdmap e25542: 258 osds: 258 up, 258 in
   pgmap v2460163: 4160 pgs, 4 pools, 228 TB data, 154 Mobjects
 270 TB used, 549 TB / 819 TB avail
 4152 active+clean
8 active+clean+scrubbing+deep


We are using erasure code on most of our OSDs, so maybe that is a
reason.
But also the cache-pool filestore OSDs on 200GB SSDs are using 2GB of
RAM.
Our erasure code pool (16*14 OSDs) has a pg_num of 2048; our cache pool
(2*14 OSDs) has a pg_num of 1024.

Are these normal values for this configuration, and is the
documentation a
bit outdated, or should we look into something else?


2GB of RSS is larger than I would have expected, but not unreasonable.
In particular I don't think we've gathered numbers on either EC pools
or on the effects of the caching processes.


FWIW, here are statistics for ~36 ceph-osds on the wip-promote-prob branch
after several hours of cache tiering tests (30 OSD base, 6 OSD cache
tier) using an EC6+2 pool.  At the time of this test, 4K random
read/writes were being performed.  The cache tier OSDs specifically use
quite a bit more memory than the base tier.  Interestingly, in this test
major pagefaults are showing up for the cache tier OSDs, which is
annoying. I may need to tweak kernel VM settings on this box.


Ah, we see the same here with our cache OSDs: those small OSDs are 
taking the most memory; on some servers they are taking 3G of RAM.

Even if I restart these, they take up the same amount again.



# PROCESS SUMMARY (counters are /sec)
#Time  PID  User PR  PPID THRD S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
09:58:48   715  root 20 1  424 S1G  271M  8  0.19  0.43 6  30:12.64000 2502 /usr/local/bin/ceph-osd
09:58:48  1363  root 20 1  424 S1G  325M  8  0.14  0.33 4  26:50.54000   68 /usr/local/bin/ceph-osd
09:58:48  2080  root 20 1  420 S1G  276M  1  0.21  0.49 7  23:49.36000 2848 /usr/local/bin/ceph-osd
09:58:48  2747  root 20 1  424 S1G  283M  8  0.25  0.68 9  25:16.63000 1391 /usr/local/bin/ceph-osd
09:58:48  3451  root 20 1  424 S1G  331M  6  0.13  0.14 2  27:36.71000  148 /usr/local/bin/ceph-osd
09:58:48  4172  root 20 1  424 S1G  301M  6  0.19  0.43 6  29:44.56000 2165 /usr/local/bin/ceph-osd
09:58:48  4935  root 20 1  420 S1G  310M  9  0.18  0.28 4  29:09.78000 2042 /usr/local/bin/ceph-osd
09:58:48  5750  root 20 1  420 S1G  267M  2  0.11  0.14 2  26:55.31000  866 /usr/local/bin/ceph-osd
09:58:48  6544  root 20 1  424 S1G  299M  7  0.22  0.62 8  26:46.35000 3468 /usr/local/bin/ceph-osd
09:58:48  7379  root 20 1  424 S1G  283M  8  0.16  0.47 6  25:47.86000  538 /usr/local/bin/ceph-osd
09:58:48  8183

[ceph-users] Migrate OSDs to different backend

2015-07-29 Thread Kenneth Waegeman

Hi all,

We are considering migrating all the OSDs of our EC pool from KeyValue 
to Filestore. Does someone have experience with this? What would be a 
good procedure?


We have Erasure Code using k+m: 10+3, with host-level failure domain on 
14 servers. Our pool is 30% filled.


I was thinking:
We set the weight of 1/2 of the OSDs on each host to 0 and let the 
cluster migrate the data

We then remove these OSDs and re-add them
We then do the same with the other OSDs
(Or we do it in 3 steps with 1/3 of the OSDs)

Another option:
We repeatedly:
Remove all KV OSDs of 2 servers (m=3)
Re-add all those OSDs with filestore
Wait for data to rebalance

Does someone know what would be the best way? Are there things we should 
not forget or be careful with?
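As a rough sketch of the first option for a single OSD (osd.N/N are placeholders; this is the usual drain/remove/re-add cycle, not a tested procedure, so double-check each step against your version's docs):

ceph osd crush reweight osd.N 0     # drain: data migrates off this OSD
ceph -s                             # wait until all PGs are active+clean again
ceph osd out N
service ceph stop osd.N
ceph osd crush remove osd.N
ceph auth del osd.N
ceph osd rm N
# then re-create the OSD with a filestore backend (ceph-disk prepare / ceph-deploy osd create)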


Thank you very much!

Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD RAM usage values

2015-07-28 Thread Kenneth Waegeman



On 07/17/2015 02:50 PM, Gregory Farnum wrote:

On Fri, Jul 17, 2015 at 1:13 PM, Kenneth Waegeman
kenneth.waege...@ugent.be wrote:

Hi all,

I've read in the documentation that OSDs use around 512MB on a healthy
cluster.(http://ceph.com/docs/master/start/hardware-recommendations/#ram)
Now, our OSDs are all using around 2GB of RAM while the cluster is
healthy.


   PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
29784 root  20   0 6081276 2.535g   4740 S   0.7  8.1   1346:55 ceph-osd
32818 root  20   0 5417212 2.164g  24780 S  16.2  6.9   1238:55 ceph-osd
25053 root  20   0 5386604 2.159g  27864 S   0.7  6.9   1192:08 ceph-osd
33875 root  20   0 5345288 2.092g   3544 S   0.7  6.7   1188:53 ceph-osd
30779 root  20   0 5474832 2.090g  28892 S   1.0  6.7   1142:29 ceph-osd
22068 root  20   0 5191516 2.000g  28664 S   0.7  6.4  31:56.72 ceph-osd
34932 root  20   0 5242656 1.994g   4536 S   0.3  6.4   1144:48 ceph-osd
26883 root  20   0 5178164 1.938g   6164 S   0.3  6.2   1173:01 ceph-osd
31796 root  20   0 5193308 1.916g  27000 S  16.2  6.1 923:14.87 ceph-osd
25958 root  20   0 5193436 1.901g   2900 S   0.7  6.1   1039:53 ceph-osd
27826 root  20   0 5225764 1.845g   5576 S   1.0  5.9   1031:15 ceph-osd
36011 root  20   0 5111660 1.823g  20512 S  15.9  5.8   1093:01 ceph-osd
19736 root  20   0 2134680 0.994g  0 S   0.3  3.2  46:13.47 ceph-osd



[root@osd003 ~]# ceph status
2015-07-17 14:03:13.865063 7f1fde5f0700 -1 WARNING: the following dangerous
and experimental features are enabled: keyvaluestore
2015-07-17 14:03:13.887087 7f1fde5f0700 -1 WARNING: the following dangerous
and experimental features are enabled: keyvaluestore
 cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
  health HEALTH_OK
  monmap e1: 3 mons at
{mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
 election epoch 58, quorum 0,1,2 mds01,mds02,mds03
  mdsmap e17218: 1/1/1 up {0=mds03=up:active}, 1 up:standby
  osdmap e25542: 258 osds: 258 up, 258 in
   pgmap v2460163: 4160 pgs, 4 pools, 228 TB data, 154 Mobjects
 270 TB used, 549 TB / 819 TB avail
 4152 active+clean
8 active+clean+scrubbing+deep


We are using erasure code on most of our OSDs, so maybe that is a reason.
But also the cache-pool filestore OSDs on 200GB SSDs are using 2GB of RAM.
Our erasure code pool (16*14 OSDs) has a pg_num of 2048; our cache pool
(2*14 OSDs) has a pg_num of 1024.

Are these normal values for this configuration, and is the documentation a
bit outdated, or should we look into something else?


2GB of RSS is larger than I would have expected, but not unreasonable.
In particular I don't think we've gathered numbers on either EC pools
or on the effects of the caching processes.


What data is actually held in memory by the OSDs?
Is this mostly cached data?
We are short on memory on these servers; can we influence this?

Thanks again!
Kenneth


-Greg


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD RAM usage values

2015-07-17 Thread Kenneth Waegeman

Hi all,

I've read in the documentation that OSDs use around 512MB on a healthy 
cluster.(http://ceph.com/docs/master/start/hardware-recommendations/#ram)
Now, our OSDs are all using around 2GB of RAM while the cluster 
is healthy.



  PID USER     PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
29784 root  20   0 6081276 2.535g   4740 S   0.7  8.1   1346:55 ceph-osd
32818 root  20   0 5417212 2.164g  24780 S  16.2  6.9   1238:55 ceph-osd
25053 root  20   0 5386604 2.159g  27864 S   0.7  6.9   1192:08 ceph-osd
33875 root  20   0 5345288 2.092g   3544 S   0.7  6.7   1188:53 ceph-osd
30779 root  20   0 5474832 2.090g  28892 S   1.0  6.7   1142:29 ceph-osd
22068 root  20   0 5191516 2.000g  28664 S   0.7  6.4  31:56.72 ceph-osd
34932 root  20   0 5242656 1.994g   4536 S   0.3  6.4   1144:48 ceph-osd
26883 root  20   0 5178164 1.938g   6164 S   0.3  6.2   1173:01 ceph-osd
31796 root  20   0 5193308 1.916g  27000 S  16.2  6.1 923:14.87 ceph-osd
25958 root  20   0 5193436 1.901g   2900 S   0.7  6.1   1039:53 ceph-osd
27826 root  20   0 5225764 1.845g   5576 S   1.0  5.9   1031:15 ceph-osd
36011 root  20   0 5111660 1.823g  20512 S  15.9  5.8   1093:01 ceph-osd
19736 root  20   0 2134680 0.994g  0 S   0.3  3.2  46:13.47 ceph-osd




[root@osd003 ~]# ceph status
2015-07-17 14:03:13.865063 7f1fde5f0700 -1 WARNING: the following 
dangerous and experimental features are enabled: keyvaluestore
2015-07-17 14:03:13.887087 7f1fde5f0700 -1 WARNING: the following 
dangerous and experimental features are enabled: keyvaluestore

cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
 health HEALTH_OK
 monmap e1: 3 mons at 
{mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}

election epoch 58, quorum 0,1,2 mds01,mds02,mds03
 mdsmap e17218: 1/1/1 up {0=mds03=up:active}, 1 up:standby
 osdmap e25542: 258 osds: 258 up, 258 in
  pgmap v2460163: 4160 pgs, 4 pools, 228 TB data, 154 Mobjects
270 TB used, 549 TB / 819 TB avail
4152 active+clean
   8 active+clean+scrubbing+deep


We are using erasure code on most of our OSDs, so maybe that is a 
reason. But also the cache-pool filestore OSDs on 200GB SSDs are using 
2GB of RAM.
Our erasure code pool (16*14 OSDs) has a pg_num of 2048; our cache pool 
(2*14 OSDs) has a pg_num of 1024.


Are these normal values for this configuration, and is the documentation 
a bit outdated, or should we look into something else?


Thank you!

Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] fuse mount in fstab

2015-07-09 Thread Kenneth Waegeman

Hi all,

we are trying to mount ceph-fuse in fstab, following this: 
http://ceph.com/docs/master/cephfs/fstab/


When we add this:

id=cephfs,conf=/etc/ceph/ceph.conf  /mnt/ceph  fuse.ceph  defaults  0 0


to fstab, we get an error message running mount:

mount: can't find id=cephfs,conf=/etc/ceph/ceph.conf

same happens when only using id=cephfs

I've found an old thread also mentioning this, but without solution..
(http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-January/037049.html)
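One thing that may be worth testing is the 'ceph.'-prefixed option style, which newer mount helpers are supposed to handle better than the bare id= form (untested here, so treat it as a sketch):

none  /mnt/ceph  fuse.ceph  ceph.id=cephfs,ceph.conf=/etc/ceph/ceph.conf,_netdev,defaults  0 0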

Thanks!
Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] fuse mount in fstab

2015-07-09 Thread Kenneth Waegeman

Hmm, it looks like a version issue..

I am testing with these versions on centos7:
 ~]# mount -V
mount from util-linux 2.23.2 (libmount 2.23.0: selinux, debug, assert)
 ~]# ceph-fuse -v
ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)

This does not work..


On my fedora box, with these versions from repo:
# mount -V
mount from util-linux 2.24.2 (libmount 2.24.0: selinux, debug, assert)
# ceph-fuse -v
ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)

this works..


Which versions are you running?
And does someone know from which versions, or which version 
combinations, this does work?


Thanks a lot!
K

On 07/09/2015 11:53 AM, Thomas Lemarchand wrote:

Hello Kenneth,

I have a working ceph fuse in fstab. The only difference I see is that I
don't use conf; your configuration file is at the default path
anyway.

I tried it with and without conf, but it always complains about id


id=recette-files-rw,client_mountpoint=/recette-files/files
  /mnt/wimi/ceph-files  fuse.ceph noatime,_netdev 0 0



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] A cache tier issue with rate only at 20MB/s when data move from cold pool to hot pool

2015-06-16 Thread Kenneth Waegeman

Hi!

We also see this at our site: when we cat a large file from cephfs to 
/dev/null, we get about 10MB/s data transfer.  I also do not see a 
system resource bottleneck.
Our cluster consists of 14 servers with 16 disks each, together forming 
an EC coded pool. We also have 2 SSDs per server for the cache. Running 0.94.1.


So we have the same question.

Our cache pool is in writeback mode; can it help to set it to readonly 
for this?
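For reference, switching the mode would just be the command below; note that readonly is generally only sensible for read-mostly data, and newer releases may ask for an extra confirmation flag:

ceph osd tier cache-mode cache readonly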


Kenneth

On 06/16/2015 12:58 PM, liukai wrote:

Hi all,
   A cache tier: 2 hot nodes with 8 SSD OSDs, and 2 cold nodes with 24 SATA
OSDs.

   The public network rate is 1Mb/s and the cluster network rate is
1000Mb/s.

   We use the fuse client to access the files.
The issue is:

   When the files are in the hot pool, the copy rate is very fast.

   But when the files are only in the cold pool, the rate only reaches 20MB/s.

   I know that when files are not in the hot pool, they must first be
copied from the cold pool to the hot pool, and then from the hot pool to
the client.

   But CPU, RAM and network do not seem to be the bottleneck. Could this be
caused by the system design?

   Are there some params to adjust to improve the rate from cold pool to
hot pool?

Thanks
2015-06-16

liukai


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bursty IO, ceph cache pool can not follow evictions

2015-06-03 Thread Kenneth Waegeman



On 06/02/2015 07:08 PM, Nick Fisk wrote:

Hi Kenneth,

I suggested an idea which may help with this; it is currently being
developed.

https://github.com/ceph/ceph/pull/4792

In short there is a high and low threshold with different flushing
priorities. Hopefully this will help with bursty workloads.


Thanks! Will this also increase the absolute flushing speed? Because I 
think the problem is more the absolute speed: it is not my workload 
that is bursty, but the actual processing by the ceph cluster, because 
the cache flushes slower than new data enters.
Now I see my cold storage disks aren't showing much usage (see the iostat 
output in my other email), so is there a way to increase the flushing speed by 
tuning the cache agent, e.g. for parallelism?
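If your build already has that work (or the related options), the knobs to experiment with would look roughly like this; the option names, their availability and the values are assumptions, so check your release first:

ceph osd pool set cache cache_target_dirty_ratio 0.3        # start flushing earlier
ceph osd pool set cache cache_target_dirty_high_ratio 0.6   # high-speed flush threshold, if supported
ceph tell osd.* injectargs '--osd_agent_max_ops 8'          # tiering-agent parallelism, if supported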




Nick


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Kenneth Waegeman
Sent: 02 June 2015 17:54
To: ceph-users@lists.ceph.com
Subject: [ceph-users] bursty IO, ceph cache pool can not follow evictions

Hi,

we were rsync-streaming with 4 cephfs clients to a ceph cluster with a
cache layer upon an erasure coded pool.
This was going on for some time, and didn't have real problems.

Today we added 2 more streams, and very soon we saw some strange
behaviour:
- We are getting blocked requests on our cache pool osds
- our cache pool is often near/ at max ratio
- Our data streams have very bursty IO (streaming a few hundred
MB for a minute and then nothing)

Our OSDs are not overloaded (nor the ECs nor cache, checked with iostat),
though it seems like the cache pool can not evict objects in time, and get
blocked until that is ok, each time again.
If I raise the target_max_bytes limit, it starts streaming again until it
is full again.

cache parameters we have are these:
ceph osd pool set cache hit_set_type bloom ceph osd pool set cache
hit_set_count 1 ceph osd pool set cache hit_set_period 3600 ceph osd pool
set cache target_max_bytes $((14*75*1024*1024*1024)) ceph osd pool set
cache cache_target_dirty_ratio 0.4 ceph osd pool set cache
cache_target_full_ratio 0.8


What can be the issue here ? I tried to find some information about the
'cache agent' , but can only find some old references..

Thank you!

Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bursty IO, ceph cache pool can not follow evictions

2015-06-03 Thread Kenneth Waegeman



On 06/02/2015 07:21 PM, Paul Evans wrote:

Kenneth,
   My guess is that you’re hitting the cache_target_full_ratio on an
individual OSD, which is easy to do since most of us tend to think of
the cache_target_full_ratio as an aggregate of the OSDs (which it is not
according to Greg Farnum).   This posting may shed more light on the
issue, if it is indeed what you are bumping up against.
https://www.mail-archive.com/ceph-users%40lists.ceph.com/msg20207.html

It looks like this indeed; then the question is why it is not flushing more.


   BTW: how are you determining that your OSDs are ‘not overloaded?’
  Are you judging that by iostat utilization, or by capacity consumed?
iostat is showing low utilisation on all disks; some disks are doing 
'nothing':



Device:         rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdn               0.00     0.00   813.50   415.00    16.90    15.49    53.99     0.42    0.35    0.15    0.72   0.10  12.00
sdm               0.00     0.00   820.50   490.50    13.06    21.99    54.76     0.70    0.54    0.18    1.13   0.12  15.50
sdq               0.00     1.50    14.00    47.00     0.98     0.33    43.99     0.55    8.93   18.93    5.96   6.31  38.50
sdr               0.00     0.00     0.00     0.50     0.00     0.00    14.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     9.50     4.00    21.50     0.27     1.47   140.00     0.12    4.71    2.50    5.12   4.31  11.00
sda               0.00     8.50     2.50    14.50     0.26     0.71   116.91     0.08    4.41    4.00    4.48   4.71   8.00
sdh               0.00     6.00     2.00    15.00     0.25     1.10   162.59     0.07    3.82    7.50    3.33   3.53   6.00
sdf               0.00    17.50     3.00    25.00     0.32     1.01    97.48     0.23    8.21    5.00    8.60   8.21  23.00
sdi               0.00    11.00     1.00    31.50     0.07     2.23   144.60     0.14    4.46    0.00    4.60   3.85  12.50
sdo               0.00     0.00     0.00     1.00     0.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
sdk               0.00     0.00    22.50     0.00     1.58     0.00   143.82     0.13    5.78    5.78    0.00   4.00   9.00
sdg               0.00     2.50     0.00    30.00     0.00     3.35   228.52     0.14    4.50    0.00    4.50   1.33   4.00
sdc               0.00    12.50     1.50    23.50     0.01     1.36   111.68     0.17    6.80    0.00    7.23   6.20  15.50
sdj               0.00    18.50    27.50    30.50     2.28     1.65   138.82     0.43    7.33    7.82    6.89   5.86  34.00
sde               0.00     4.00     0.50    15.00     0.04     0.10    18.10     0.07    4.84   10.00    4.67   2.58   4.00
sdl               0.00    23.00     6.00    33.00     0.58     1.31    99.22     0.28    7.05   17.50    5.15   6.79  26.50
sdb               0.00     5.00     3.00     9.00     0.12     0.47   100.29     0.05    4.58    1.67    5.56   3.75   4.50


In my opinion there should be enough resources to do the flushing, and 
therefore the cache should not be filling up..



--
Paul

On Jun 2, 2015, at 9:53 AM, Kenneth Waegeman
kenneth.waege...@ugent.be mailto:kenneth.waege...@ugent.be wrote:

Hi,

we were rsync-streaming with 4 cephfs clients to a ceph cluster with a
cache layer upon an erasure coded pool.
This was going on for some time, and didn't have real problems.

Today we added 2 more streams, and very soon we saw some strange
behaviour:
- We are getting blocked requests on our cache pool osds
- our cache pool is often near/ at max ratio
- Our data streams have very bursty IO (streaming a few
hundred MB for a minute and then nothing)

Our OSDs are not overloaded (nor the ECs nor cache, checked with
iostat), though it seems like the cache pool can not evict objects in
time, and get blocked until that is ok, each time again.
If I raise the target_max_bytes limit, it starts streaming again until
it is full again.

cache parameters we have are these:
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache target_max_bytes $((14*75*1024*1024*1024))
ceph osd pool set cache cache_target_dirty_ratio 0.4
ceph osd pool set cache cache_target_full_ratio 0.8


What can be the issue here ? I tried to find some information about
the 'cache agent' , but can only find some old references..

Thank you!

Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] bursty IO, ceph cache pool can not follow evictions

2015-06-02 Thread Kenneth Waegeman

Hi,

we were rsync-streaming with 4 cephfs clients to a ceph cluster with a 
cache layer upon an erasure coded pool.

This was going on for some time, and didn't have real problems.

Today we added 2 more streams, and very soon we saw some strange behaviour:
- We are getting blocked requests on our cache pool osds
- our cache pool is often near/ at max ratio
- Our data streams have very bursty IO (streaming a few 
hundred MB for a minute and then nothing)


Our OSDs are not overloaded (nor the ECs nor cache, checked with 
iostat), though it seems like the cache pool can not evict objects in 
time, and get blocked until that is ok, each time again.
If I raise the target_max_bytes limit, it starts streaming again until it 
is full again.


cache parameters we have are these:
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache target_max_bytes $((14*75*1024*1024*1024))
ceph osd pool set cache cache_target_dirty_ratio 0.4
ceph osd pool set cache cache_target_full_ratio 0.8


What can be the issue here ? I tried to find some information about the 
'cache agent' , but can only find some old references..


Thank you!

Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-28 Thread Kenneth Waegeman



On 05/27/2015 10:30 PM, Gregory Farnum wrote:

On Wed, May 27, 2015 at 6:49 AM, Kenneth Waegeman
kenneth.waege...@ugent.be wrote:

We are also running a full backup sync to cephfs, using multiple distributed
rsync streams (with zkrsync), and also ran into this issue today on Hammer
0.94.1.
After setting the beacon higher, and eventually clearing the journal, it
stabilized again.

We were using ceph-fuse to mount the cephfs, not the ceph kernel client.


What's your MDS cache size set to?
I did set it to 100 before (we have 64G of RAM for the MDS), trying 
to get rid of the 'Client .. failing to respond to cache pressure' messages.

 Did you have any warnings in the

ceph log about clients not releasing caps?
Unfortunately I lost the logs from before it happened. But there is nothing in the 
new logs about that; I will follow this up.


I think you could hit this in ceph-fuse as well on hammer, although we
just merged in a fix: https://github.com/ceph/ceph/pull/4653
-Greg


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-27 Thread Kenneth Waegeman
We are also running a full backup sync to cephfs, using multiple 
distributed rsync streams (with zkrsync), and also ran into this issue 
today on Hammer 0.94.1.
After setting the beacon higher, and eventually clearing the journal, it 
stabilized again.


We were using ceph-fuse to mount the cephfs, not the ceph kernel client.


On 05/25/2015 11:55 AM, Yan, Zheng wrote:

the kernel client bug should be fixed by
https://github.com/ceph/ceph-client/commit/72f22efb658e6f9e126b2b0fcb065f66ffd02239
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph.conf boolean value for mon_cluster_log_to_syslog

2015-05-27 Thread Kenneth Waegeman



On 05/23/2015 08:26 AM, Abhishek L wrote:


Gregory Farnum writes:


On Thu, May 21, 2015 at 8:24 AM, Kenneth Waegeman
kenneth.waege...@ugent.be wrote:

Hi,

Some strange issue wrt boolean values in the config:

this works:

osd_crush_update_on_start = 0 - osd not updated
osd_crush_update_on_start = 1 - osd updated

In a previous version we could set boolean values in the ceph.conf file with
the integers 1 (true) and 0 (false), also for mon_cluster_log_to_syslog, but
this does not work anymore:

mon_cluster_log_to_syslog = true
works, but
mon_cluster_log_to_syslog = 1
does not.

Is mon_cluster_log_to_syslog not a real boolean anymore? Or what could this
be?


Looking at src/common/config_opts.h, mon_cluster_log_to_syslog is a
string type now. I presume the code is interpreting it and there are
different options but I don't know when or why it got changed. :)


Git blame tells me that it was introduced in
https://github.com/ceph/ceph/pull/2118 (b97b06), where it was changed
from bool to string. Though I can't answer the why :)


Should I log an issue for this, or is it intended this way? :-)
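In the meantime, a quick way to see how a running monitor actually parsed the value is its admin socket (mon.mds01 is just one of the mons in this cluster):

ceph daemon mon.mds01 config show | grep mon_cluster_log_to_syslog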




-Greg




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph.conf boolean value for mon_cluster_log_to_syslog

2015-05-21 Thread Kenneth Waegeman

Hi,

Some strange issue wrt boolean values in the config:

this works:

osd_crush_update_on_start = 0 - osd not updated
osd_crush_update_on_start = 1 - osd updated

In a previous version we could set boolean values in the ceph.conf file 
with the integers 1 (true) and 0 (false), also for 
mon_cluster_log_to_syslog, but this does not work anymore:


mon_cluster_log_to_syslog = true
works, but
mon_cluster_log_to_syslog = 1
does not.

Is mon_cluster_log_to_syslog not a real boolean anymore? Or what could 
this be?


Many thanks!

Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph tell changed?

2015-05-21 Thread Kenneth Waegeman
/usr/bin/ceph -f json --cluster ceph tell *.mds01 injectargs -- 
--mon_osd_min_down_reports=26
2015-05-21 17:52:14.476099 7f03375e7700 -1 WARNING: the following 
dangerous and experimental features are enabled: keyvaluestore
2015-05-21 17:52:14.497399 7f03375e7700 -1 WARNING: the following 
dangerous and experimental features are enabled: keyvaluestore

error handling command target: unknown type *

Same with other config options, e.g. mds_cache_size.
Those warnings I always get ;-)

Running on 0.94.1
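As a workaround for per-host injection, looping over the local admin sockets avoids 'tell' entirely (a sketch; socket paths assume the default /var/run/ceph location):

for sock in /var/run/ceph/ceph-*.asok; do
    ceph --admin-daemon "$sock" config set mon_osd_min_down_reports 26
done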

On 05/21/2015 05:36 PM, Loic Dachary wrote:

Hi,

It should work. Could you copy/paste the command you run and its output ?

Cheers

On 21/05/2015 17:34, Kenneth Waegeman wrote:

Hi,

We've been using ceph tell in our configuration system since emperor, and before we 
could run 'ceph tell *.$host injectargs -- ...', and while I'm honestly not 
completely sure anymore that it did all I think it did, it exited cleanly and 
I *suppose* it injected the config into all the daemons of the local host.
Anyway, running the recent version (0.94), I saw it is no longer possible 
to use it this way.
Is there a way to inject config only into local daemons? (tell osd.* takes all 
OSDs of the cluster)

Thanks again!

Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cache pool parameters and pressure

2015-05-12 Thread Kenneth Waegeman



On 04/30/2015 07:50 PM, Gregory Farnum wrote:

On Thu, Apr 30, 2015 at 2:03 AM, Kenneth Waegeman
kenneth.waege...@ugent.be wrote:

So the cache is empty, but I get warning when I check the health:
  health HEALTH_WARN
 mds0: Client cephtst.cubone.os failing to respond to cache
pressure

Someone an idea what is happening here?


This means that the MDS sent the client a request to drop some cached
inodes/dentries and it isn't. This could mean that the client is too
old to respond correctly, or that you've actually got so many files
open that the client can't drop any of the requested items.


Can we increase the maximum allowed number of inodes for the client(s)? 
I thought there was a setting for that, but I don't seem to find it 
anymore..
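If it is simply a too-small client cache, the knobs usually pointed at are client_cache_size (the ceph-fuse inode/dentry cache) and mds_cache_size on the MDS side; a hedged ceph.conf sketch, the values are examples only:

[client]
    client cache size = 65536        # default is 16384 inodes

[mds]
    mds cache size = 1000000         # default is 100000 inodes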


Thanks!

Kenneth


-Greg


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-fuse options: writeback cache

2015-05-11 Thread Kenneth Waegeman

Hi all,

I have a few questions about ceph-fuse options:
- Is the fuse writeback cache being used? How can we see this? Can it be 
turned on with allow_wbcache somehow?


- What is the default of the big_writes option (as seen in 
/usr/bin/ceph-fuse --help)? Where can we see this?
If we run ceph-fuse like this: ceph-fuse /mnt/ceph -o 
max_write=$((1024*1024*64)),big_writes

we don't see any of this in the output of mount:
ceph-fuse on /mnt/ceph type fuse.ceph-fuse 
(rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)


Can we see this somewhere else?
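One place that does show the options the running ceph-fuse ended up with is its admin socket; the ceph-fuse data writeback cache is the ObjectCacher, controlled by the client_oc_* options (the socket name below is an example, check /var/run/ceph for the real one):

ceph --admin-daemon /var/run/ceph/ceph-client.cephfs.asok config show | egrep 'client_oc|fuse'
# relevant options include client_oc (enable), client_oc_size and client_oc_max_dirty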

Many thanks!!

Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cache pool parameters and pressure

2015-04-30 Thread Kenneth Waegeman

Hi all,

I have some question related to the caching layer. I am using the latest 
version of ceph: 0.94.1. I created the ceph pool with this options:


ceph osd tier add ecdata cache
ceph osd tier cache-mode cache writeback
ceph osd tier set-overlay ecdata cache
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache target_max_bytes $((300*1024*1024*1024))

I checked these values were set correctly

I also checked some other parameters:

ceph osd pool get cache cache_target_dirty_ratio
cache_target_dirty_ratio: 0
ceph osd pool get cache cache_target_full_ratio
cache_target_full_ratio: 0
get cache cache_min_flush_age
cache_min_flush_age: 0
ceph osd pool get cache cache_min_evict_age
cache_min_evict_age: 0

My first question: what do these zero values mean? Is this really zero, 
or some default value? I don't find the defaults in the documentation.
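For what it's worth, one way to take the guesswork out of the zeros is to set the ratios and ages on the pool explicitly; the values below are common starting points, not recommendations:

ceph osd pool set cache cache_target_dirty_ratio 0.4
ceph osd pool set cache cache_target_full_ratio 0.8
ceph osd pool set cache cache_min_flush_age 600
ceph osd pool set cache cache_min_evict_age 1800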


Now a strange thing: at first the cache pool was being filled until 
it was near the target_max_bytes value of 300G. But now (about 20 hours 
later) I check again and there is constantly only about 2G of data in 
the cache pool, so it flushes everything immediately.


Is this the result of a parameter? Can this strange behaviour be explained, 
where the cache fills up initially but, after it starts flushing the first 
time, keeps on flushing everything?


So the cache is empty, but I get a warning when I check the health:
 health HEALTH_WARN
mds0: Client cephtst.cubone.os failing to respond to cache 
pressure


Does someone have an idea what is happening here?

Thank you very much!

Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cluster not coming up after reboot

2015-04-27 Thread Kenneth Waegeman



On 04/23/2015 06:58 PM, Craig Lewis wrote:

Yes, unless you've adjusted:
[global]
   mon osd min down reporters = 9
   mon osd min down reports = 12

OSDs talk to the MONs on the public network.  The cluster network is
only used for OSD to OSD communication.

If one OSD node can't talk on that network, the other nodes will tell
the MONs that it's OSDs are down.  And that node will also tell the MONs
that all the other OSDs are down.  Then the OSDs marked down will tell
the MONs that they're not down, and the cycle will repeat.


Thanks for the explanation, that makes sense now! Good to know I should 
set those values :)
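Those settings can also be pushed to running monitors without a restart, roughly like this (repeat per mon; the value has to match your own per-node OSD count):

ceph tell mon.mds01 injectargs '--mon-osd-min-down-reporters 9 --mon-osd-min-down-reports 12'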


I'm somewhat surprised that your cluster eventually stabilized.
The OSDs of that one node were eventually set 'out' of the cluster. I 
guess the OSDs were down long enough to get marked out? (Or the 
monitors took some action after too many failures?) And then the other 
OSDs could stay up, I guess :)



I have 8 OSDs per node.  I set my min down reporters high enough that no
single node can mark another node's OSDs down.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] removing a ceph fs

2015-04-23 Thread Kenneth Waegeman



On 04/22/2015 06:51 PM, Gregory Farnum wrote:

If you look at the ceph --help output you'll find some commands for
removing MDSes from the system.


Yes, this works for all but the last mds..

[root@mds01 ~]# ceph mds rm 35632 mds.mds03
Error EBUSY: cannot remove active mds.mds03 rank 0

I stopped the daemon, checked the process was stopped, even did a 
shutdown of that MDS server, but I keep getting this message and am unable 
to remove the fs..


log file has this:

2015-04-23 16:14:05.171450 7fa9fe799700 -1 mds.0.4 *** got signal 
Terminated ***
2015-04-23 16:14:05.171490 7fa9fe799700  1 mds.0.4 suicide.  wanted 
down:dne, now up:active
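For the record, the sequence that is usually suggested on hammer when the last rank refuses to go away is to fail the rank explicitly before removing the filesystem (double-check first, fs rm is destructive):

ceph mds fail 0
ceph fs rm ceph_fs --yes-i-really-mean-it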





-Greg
On Wed, Apr 22, 2015 at 6:46 AM Kenneth Waegeman
kenneth.waege...@ugent.be mailto:kenneth.waege...@ugent.be wrote:

forgot to mention I'm running 0.94.1

On 04/22/2015 03:02 PM, Kenneth Waegeman wrote:
  Hi,
 
  I tried to recreate a ceph fs ( well actually an underlying pool, but
  for that I need to first remove the fs) , but this seems not that
easy
  to achieve.
 
  When I run
  `ceph fs rm ceph_fs`
  I get:
  `Error EINVAL: all MDS daemons must be inactive before removing
filesystem`
 
  I stopped the 3 MDSs, but this doesn't change anything, as ceph
health
  still thinks there is an mds running laggy:
 
health HEALTH_WARN
   mds cluster is degraded
   mds mds03 is laggy
monmap e1: 3 mons at ...
   election epoch 12, quorum 0,1,2 mds01,mds02,mds03
mdsmap e12: 1/1/1 up {0=mds03=up:replay(laggy or crashed)}
 
  I checked the mds processes are gone..
 
  Someone knows a solution for this?
 
  Thanks!
  Kenneth
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cluster not coming up after reboot

2015-04-23 Thread Kenneth Waegeman



On 04/22/2015 07:35 PM, Gregory Farnum wrote:

On Wed, Apr 22, 2015 at 8:17 AM, Kenneth Waegeman
kenneth.waege...@ugent.be wrote:

Hi,

I changed the cluster network parameter in the config files, restarted the
monitors , and then restarted all the OSDs (shouldn't have done that).


Do you mean that you changed the IP addresses of the monitors in the
config files everywhere, and then tried to restart things? Or
something else?
I only changed the value of the cluster network to a different one than 
the public network.



Now
the OSDS keep on crashing, and the cluster is not able to restore.. I
eventually rebooted the whole cluster, but the problem remains: For a moment
all 280 OSDs are up, and then they start crashing rapidly until there are
only less than 100 left (and eventually 30 or so).


Are the OSDs actually crashing, or are they getting shut down? If
they're crashing, can you please provide the actual backtrace? The
logs you're including below are all fairly low level and generally
don't even mean something has to be wrong.


It seems I did not test the network thoroughly enough; there was one 
host that was unable to connect to the cluster network, only the public 
network. I found this out after all OSDs but those of that host came up 
after a few hours. I fixed the network issue and all was fine (only a 
few peering problems, but a restart of the blocking OSDs was sufficient).
There were no backtraces, and indeed I found some 
shutdown messages in the logs.


So it is all fixed now, but is it explainable that at first about 90% of 
the OSDs kept going into shutdown over and over, and only reached 
a stable situation after some time, because of one host's network failure?


Thanks again!




In the log files I see different kind of messages: Some OSDs have:
snip
I tested the network, the hosts can reach one another on both networks..


What configurations did you test?
14 hosts with 16 keyvalue OSDs each, plus 2 replicated cache partitions and 
metadata partitions on 2 SSDs for CephFS.



-Greg


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] removing a ceph fs

2015-04-22 Thread Kenneth Waegeman

Hi,

I tried to recreate a ceph fs (well, actually an underlying pool, but 
for that I need to first remove the fs), but this seems not that easy 
to achieve.


When I run
`ceph fs rm ceph_fs`
I get:
`Error EINVAL: all MDS daemons must be inactive before removing filesystem`

I stopped the 3 MDSs, but this doesn't change anything, as ceph health 
still thinks there is an mds running laggy:


 health HEALTH_WARN
mds cluster is degraded
mds mds03 is laggy
 monmap e1: 3 mons at ...
election epoch 12, quorum 0,1,2 mds01,mds02,mds03
 mdsmap e12: 1/1/1 up {0=mds03=up:replay(laggy or crashed)}

I checked the mds processes are gone..

Someone knows a solution for this?

Thanks!
Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] removing a ceph fs

2015-04-22 Thread Kenneth Waegeman

forgot to mention I'm running 0.94.1

On 04/22/2015 03:02 PM, Kenneth Waegeman wrote:

Hi,

I tried to recreate a ceph fs (well, actually an underlying pool, but
for that I need to first remove the fs), but this seems not that easy
to achieve.

When I run
`ceph fs rm ceph_fs`
I get:
`Error EINVAL: all MDS daemons must be inactive before removing filesystem`

I stopped the 3 MDSs, but this doesn't change anything, as ceph health
still thinks there is an mds running laggy:

  health HEALTH_WARN
 mds cluster is degraded
 mds mds03 is laggy
  monmap e1: 3 mons at ...
 election epoch 12, quorum 0,1,2 mds01,mds02,mds03
  mdsmap e12: 1/1/1 up {0=mds03=up:replay(laggy or crashed)}

I checked the mds processes are gone..

Someone knows a solution for this?

Thanks!
Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cluster not coming up after reboot

2015-04-22 Thread Kenneth Waegeman

Hi,

I changed the cluster network parameter in the config files, restarted 
the monitors, and then restarted all the OSDs (shouldn't have done 
that). Now the OSDs keep on crashing, and the cluster is not able to 
recover. I eventually rebooted the whole cluster, but the problem 
remains: for a moment all 280 OSDs are up, and then they start crashing 
rapidly until there are fewer than 100 left (and eventually 30 or so).


In the log files I see different kind of messages: Some OSDs have:

2015-04-22 17:09:40.019825 7f74a8f70700  0 -- 10.143.16.11:0/4255  
10.141.16.12:6807/2426 pipe(0x54f2000 sd=68 :44692 s=1 pgs=0 cs=0 l=1 
c=0x55c8dc0).connect claims to be 10.141.16.12:6807/1004858 not 
10.141.16.12:6807/2426 - wrong node!
2015-04-22 17:09:40.019827 7f74a694a700  0 -- 10.143.16.11:0/4255  
10.143.16.12:6801/2146 pipe(0x5719000 sd=57 :56935 s=1 pgs=0 cs=0 l=1 
c=0x55ce9e0).connect claims to be 10.143.16.12:6801/1005047 not 
10.143.16.12:6801/2146 - wrong node!
2015-04-22 17:09:40.019867 7f74a9f80700  0 -- 10.143.16.11:0/4255  
10.143.16.12:6803/2228 pipe(0x5722000 sd=60 :36208 s=1 pgs=0 cs=0 l=1 
c=0x55cf640).connect claims to be 10.143.16.12:6803/1005739 not 
10.143.16.12:6803/2228 - wrong node!


Others have:

2015-04-22 17:04:52.125096 7fe99e84e700  0 -- 10.143.16.11:6824/3871  
10.143.16.11:6828/4255 pipe(0x60
4c800 sd=30 :6824 s=2 pgs=14 cs=1 l=0 c=0x5ae27e0).fault with nothing to 
send, going to standby
2015-04-22 17:04:52.126353 7fe98c9ed700  0 -- 10.143.16.11:0/3871  
10.141.16.11:6829/4255 pipe(0x653d8

00 sd=28 :0 s=1 pgs=0 cs=0 l=1 c=0x65c2d60).fault
2015-04-22 17:04:52.126363 7fe990225700  0 -- 10.143.16.11:0/3871  
10.143.16.11:6829/4255 pipe(0x63258

00 sd=21 :0 s=1 pgs=0 cs=0 l=1 c=0x65c3440).fault
2015-04-22 17:04:52.128847 7fe98fb1e700  0 -- 10.143.16.11:6824/3871  
10.143.16.17:6840/1004452 pipe(0
x6518000 sd=62 :6824 s=2 pgs=67 cs=1 l=0 c=0x65da7e0).fault with nothing 
to send, going to standby
2015-04-22 17:05:01.610056 7fe98c8ec700  0 -- 10.143.16.11:0/3871  
10.141.16.11:6823/3641 pipe(0x65420

00 sd=61 :0 s=1 pgs=0 cs=0 l=1 c=0x65c7380).fault
2015-04-22 17:05:01.616051 7fe990a2d700  0 -- 10.143.16.11:0/3871  
10.143.16.11:6823/3641 pipe(0x579c8

00 sd=63 :0 s=1 pgs=0 cs=0 l=1 c=0x65c1a20).fault
2015-04-22 17:05:01.646500 7fe9a515c700  0 log_channel(cluster) log 
[WRN] : map e1993 wrongly marked me

down

I tested the network, the hosts can reach one another on both networks..

Is this somehow fixable?

Many thanks!
Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] where to download 0.87 RPMS?

2014-10-31 Thread Kenneth Waegeman



Thanks. It would be nice though to have a repo where all the packages 
are. We lock our packages ourselves, so we would just need to bump the 
version instead of adding a repo for each major version :)



- Message from Irek Fasikhov malm...@gmail.com -
   Date: Thu, 30 Oct 2014 13:37:34 +0400
   From: Irek Fasikhov malm...@gmail.com
Subject: Re: [ceph-users] where to download 0.87 RPMS?
 To: Kenneth Waegeman kenneth.waege...@ugent.be
 Cc: Patrick McGarry patr...@inktank.com, ceph-users  
ceph-users@lists.ceph.com




Hi.

Use http://ceph.com/rpm-giant/

2014-10-30 12:34 GMT+03:00 Kenneth Waegeman kenneth.waege...@ugent.be:


Hi,

Will http://ceph.com/rpm/ also be updated to have the giant packages?

Thanks

Kenneth




- Message from Patrick McGarry patr...@inktank.com -
   Date: Wed, 29 Oct 2014 22:13:50 -0400
   From: Patrick McGarry patr...@inktank.com
Subject: Re: [ceph-users] where to download 0.87 RPMS?
 To: 廖建锋 de...@f-club.cn
 Cc: ceph-users ceph-users@lists.ceph.com



 I have updated the http://ceph.com/get page to reflect a more generic

approach to linking.  It's also worth noting that the new
http://download.ceph.com/ infrastructure is available now.

To get to the rpms specifically you can either crawl the
download.ceph.com tree or use the symlink at
http://ceph.com/rpm-giant/

Hope that (and the updated linkage on ceph.com/get) helps.  Thanks!


Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph


On Wed, Oct 29, 2014 at 9:15 PM, 廖建锋 de...@f-club.cn wrote:





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 ___

ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




- End message from Patrick McGarry patr...@inktank.com -

--

Met vriendelijke groeten,
Kenneth Waegeman


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





--
Best regards, Irek Fasikhov
Mob.: +79229045757



- End message from Irek Fasikhov malm...@gmail.com -

--

Met vriendelijke groeten,
Kenneth Waegeman


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] where to download 0.87 RPMS?

2014-10-30 Thread Kenneth Waegeman

Hi,

Will http://ceph.com/rpm/ also be updated to have the giant packages?

Thanks

Kenneth




- Message from Patrick McGarry patr...@inktank.com -
   Date: Wed, 29 Oct 2014 22:13:50 -0400
   From: Patrick McGarry patr...@inktank.com
Subject: Re: [ceph-users] where to download 0.87 RPMS?
 To: 廖建锋 de...@f-club.cn
 Cc: ceph-users ceph-users@lists.ceph.com



I have updated the http://ceph.com/get page to reflect a more generic
approach to linking.  It's also worth noting that the new
http://download.ceph.com/ infrastructure is available now.

To get to the rpms specifically you can either crawl the
download.ceph.com tree or use the symlink at
http://ceph.com/rpm-giant/

Hope that (and the updated linkage on ceph.com/get) helps.  Thanks!


Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph


On Wed, Oct 29, 2014 at 9:15 PM, 廖建锋 de...@f-club.cn wrote:




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



- End message from Patrick McGarry patr...@inktank.com -

--

Met vriendelijke groeten,
Kenneth Waegeman

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] use ZFS for OSDs

2014-10-29 Thread Kenneth Waegeman

Hi,

We are looking to use ZFS for our OSD backend, but I have some questions.

My main question is: does Ceph already support the writeparallel mode 
for ZFS? (as described here: 
http://www.sebastien-han.fr/blog/2013/12/02/ceph-performance-interesting-things-going-on/)
I've found this, but I suppose it is outdated: 
https://wiki.ceph.com/Planning/Blueprints/Emperor/osd%3A_ceph_on_zfs


Should Ceph be built with ZFS support? I found a --with-zfslib option 
somewhere, but can someone verify this, or better, provide instructions for 
it? :-)


What parameters should be tuned to use this?
I found these:
filestore zfs_snap = 1
journal_aio = 0
journal_dio = 0

Are there other things we need for it?
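Putting the options above together, the [osd] section would presumably look like this (only the settings already mentioned, untested):

[osd]
    filestore zfs_snap = 1
    journal_aio = 0
    journal_dio = 0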

Many thanks!!
Kenneth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD troubles on FS+Tiering

2014-09-16 Thread Kenneth Waegeman


- Message from Gregory Farnum g...@inktank.com -
   Date: Mon, 15 Sep 2014 10:37:07 -0700
   From: Gregory Farnum g...@inktank.com
Subject: Re: [ceph-users] OSD troubles on FS+Tiering
 To: Kenneth Waegeman kenneth.waege...@ugent.be
 Cc: ceph-users ceph-users@lists.ceph.com



The pidfile bug is already fixed in master/giant branches.

As for the crashing, I'd try killing all the osd processes and turning
them back on again. It might just be some daemon restart failed, or
your cluster could be sufficiently overloaded that the node disks are
going unresponsive and they're suiciding, or...


I restarted them that way, and they eventually got clean again.
'ceph status' printed that the 'ecdata' pool had too few PGs, so I changed 
the number of PGs from 128 to 256 (with EC k+m=11).
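Note that bumping pg_num alone leaves pgp_num behind, which is what the 'pg_num 256 > pgp_num 128' warning in the status below is about; placement only changes after the second step:

ceph osd pool set ecdata pg_num 256
ceph osd pool set ecdata pgp_num 256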

After a few minutes I checked the cluster state again:

[root@ceph001 ~]# ceph status
cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
 health HEALTH_WARN 100 pgs down; 155 pgs peering; 81 pgs stale;  
240 pgs stuck inactive; 81 pgs stuck stale; 240 pgs stuck unclean; 746  
requests are blocked > 32 sec; 'cache' at/near target max; pool ecdata 
pg_num 256 > pgp_num 128
 monmap e1: 3 mons at  
{ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, election epoch 8, quorum 0,1,2  
ceph001,ceph002,ceph003

 mdsmap e6993: 1/1/1 up {0=ceph003=up:active}, 2 up:standby
 osdmap e11023: 48 osds: 14 up, 14 in
  pgmap v160466: 1472 pgs, 4 pools, 3899 GB data, 2374 kobjects
624 GB used, 7615 GB / 8240 GB avail
  75 creating
1215 active+clean
 100 down+peering
   1 active+clean+scrubbing
  10 stale
  16 stale+active+clean

Again 34 OSDs are down. This time I have the error logs; I checked a 
few OSD logs:


I checked the first host that was marked down:

   -17 2014-09-16 13:27:49.962938 7f5dfe6a3700  5 osd.7 pg_epoch:  
8912 pg[2.b0s3(unlocked)] enter Initial
   -16 2014-09-16 13:27:50.008842 7f5e02eac700  1 --  
10.143.8.180:6833/53810 == osd.30 10.141.8.181:0/37396 2524   
osd_ping(ping e8912 stamp 2014-09-16 13:27:50.008514) v2  47+0+0  
(386299 0 0) 0x18ef7080 con 0x6961600
   -15 2014-09-16 13:27:50.008892 7f5e02eac700  1 --  
10.143.8.180:6833/53810 -- 10.141.8.181:0/37396 --  
osd_ping(ping_reply e8912 stamp 2014-09-16 13:27:50.008514) v2 -- ?+0  
0x7326900 con 0x6961600
   -14 2014-09-16 13:27:50.009159 7f5e046af700  1 --  
10.141.8.180:6847/53810 == osd.30 10.141.8.181:0/37396 2524   
osd_ping(ping e8912 stamp 2014-09-16 13:27:50.008514) v2  47+0+0  
(386299 0 0) 0x2210a760 con 0xadd0420
   -13 2014-09-16 13:27:50.009202 7f5e046af700  1 --  
10.141.8.180:6847/53810 -- 10.141.8.181:0/37396 --  
osd_ping(ping_reply e8912 stamp 2014-09-16 13:27:50.008514) v2 -- ?+0  
0x14e35a00 con 0xadd0420
   -12 2014-09-16 13:27:50.034378 7f5dfeea4700  5 osd.7 pg_epoch:  
8912 pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813  
n=16075 ec=104 les/c 813/815 805/8912/791)  
[24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 pi=104-8911/54  
crt=8864'33359 inactive NOTIFY] exit Reset 0.127612 1 0.000123
   -11 2014-09-16 13:27:50.034432 7f5dfeea4700  5 osd.7 pg_epoch:  
8912 pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813  
n=16075 ec=104 les/c 813/815 805/8912/791)  
[24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 pi=104-8911/54  
crt=8864'33359 inactive NOTIFY] enter Started
   -10 2014-09-16 13:27:50.034452 7f5dfeea4700  5 osd.7 pg_epoch:  
8912 pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813  
n=16075 ec=104 les/c 813/815 805/8912/791)  
[24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 pi=104-8911/54  
crt=8864'33359 inactive NOTIFY] enter Start
-9 2014-09-16 13:27:50.034469 7f5dfeea4700  1 osd.7 pg_epoch:  
8912 pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813  
n=16075 ec=104 les/c 813/815 805/8912/791)  
[24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 pi=104-8911/54  
crt=8864'33359 inactive NOTIFY] stateStart: transitioning to Stray
-8 2014-09-16 13:27:50.034491 7f5dfeea4700  5 osd.7 pg_epoch:  
8912 pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813  
n=16075 ec=104 les/c 813/815 805/8912/791)  
[24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 pi=104-8911/54  
crt=8864'33359 inactive NOTIFY] exit Start 0.38 0 0.00
-7 2014-09-16 13:27:50.034521 7f5dfeea4700  5 osd.7 pg_epoch:  
8912 pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813  
n=16075 ec=104 les/c 813/815 805/8912/791)  
[24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 pi=104-8911/54  
crt=8864'33359 inactive NOTIFY] enter Started/Stray
-6 2014-09-16 13:27:50.034664 7f5dfeea4700  5 osd.7 pg_epoch:  
8912 pg[2.7s10( v 8890'35265 (374'32264,8890'35265] local-les=816  
n=32002 ec=104 les/c 816/818 805/814/730)  
[6,30,22,13,39,15,12,5,11,42,7] r=10 lpr=814 pi=104-813/36 luod=0'0  
crt=8885'35261 active] exit Started

[ceph-users] OSDs crashing on CephFS and Tiering

2014-09-15 Thread Kenneth Waegeman


Hi,

I have some strange OSD problems. Before the weekend I started some  
rsync tests over CephFS, on a cache pool with underlying EC KV pool.  
Today the cluster is completely degraded:


[root@ceph003 ~]# ceph status
 cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
  health HEALTH_WARN 19 pgs backfill_toofull; 403 pgs degraded;  
168 pgs down; 8 pgs incomplete; 168 pgs peering; 61 pgs stale; 403 pgs  
stuck degraded; 176 pgs stuck inactive; 61 pgs stuck stale; 589 pgs  
stuck unclean; 403 pgs stuck undersized; 403 pgs undersized; 300  
requests are blocked  32 sec; recovery 15170/27902361 objects  
degraded (0.054%); 1922/27902361 objects misplaced (0.007%); 1 near  
full osd(s)
  monmap e1: 3 mons at  
{ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, election epoch 8, quorum 0,1,2  
ceph001,ceph002,ceph003

  mdsmap e5: 1/1/1 up {0=ceph003=up:active}, 2 up:standby
  osdmap e719: 48 osds: 18 up, 18 in
   pgmap v144887: 1344 pgs, 4 pools, 4139 GB data, 2624 kobjects
 2282 GB used, 31397 GB / 33680 GB avail
 15170/27902361 objects degraded (0.054%); 1922/27902361  
objects misplaced (0.007%)

   68 down+remapped+peering
1 active
  754 active+clean
1 stale+incomplete
1 stale+active+clean+scrubbing
   14 active+undersized+degraded+remapped
7 incomplete
  100 down+peering
9 active+remapped
   59 stale+active+undersized+degraded
   19 active+undersized+degraded+remapped+backfill_toofull
  311 active+undersized+degraded

I tried to figure out what happened in the global logs:

2014-09-13 08:01:19.433313 mon.0 10.141.8.180:6789/0 66076 : [INF]  
pgmap v65892: 1344 pgs: 1344 active+clean; 2606 GB data, 3116 GB used,  
126 TB / 129 TB avail; 4159 kB/s wr, 45 op/s
2014-09-13 08:01:20.443019 mon.0 10.141.8.180:6789/0 66078 : [INF]  
pgmap v65893: 1344 pgs: 1344
2014-09-13 08:01:20.443019 mon.0 10.141.8.180:6789/0 66078 : [INF]  
pgmap v65893: 1344 pgs: 1344 active+clean; 2606 GB data, 3116 GB used,  
126 TB / 129 TB avail; 561 kB/s wr, 11 op/s
2014-09-13 08:01:20.777988 mon.0 10.141.8.180:6789/0 66081 : [INF]  
osd.19 10.141.8.181:6809/29664 failed (3 reports from 3 peers after  
20.79 = grace 20.00)
2014-09-13 08:01:21.455887 mon.0 10.141.8.180:6789/0 66083 : [INF]  
osdmap e117: 48 osds: 47 up, 48 in
2014-09-13 08:01:21.462084 mon.0 10.141.8.180:6789/0 66084 : [INF]  
pgmap v65894: 1344 pgs: 1344 active+clean; 2606 GB data, 3116 GB used,  
126 TB / 129 TB avail; 1353 kB/s wr, 13 op/s
2014-09-13 08:01:21.477007 mon.0 10.141.8.180:6789/0 66085 : [INF]  
pgmap v65895: 1344 pgs: 187 stale+active+clean, 1157 active+clean;  
2606 GB data, 3116 GB used, 126 TB / 129 TB avail; 2300 kB/s wr, 21 op/s
2014-09-13 08:01:22.456055 mon.0 10.141.8.180:6789/0 66086 : [INF]  
osdmap e118: 48 osds: 47 up, 48 in
2014-09-13 08:01:22.462590 mon.0 10.141.8.180:6789/0 66087 : [INF]  
pgmap v65896: 1344 pgs: 187 stale+active+clean, 1157 active+clean;  
2606 GB data, 3116 GB used, 126 TB / 129 TB avail; 13686 kB/s wr, 5 op/s
2014-09-13 08:01:23.464302 mon.0 10.141.8.180:6789/0 66088 : [INF]  
pgmap v65897: 1344 pgs: 187 stale+active+clean, 1157 active+clean;  
2606 GB data, 3116 GB used, 126 TB / 129 TB avail; 11075 kB/s wr, 4 op/s
2014-09-13 08:01:24.477467 mon.0 10.141.8.180:6789/0 66089 : [INF]  
pgmap v65898: 1344 pgs: 187 stale+active+clean, 1157 active+clean;  
2606 GB data, 3116 GB used, 126 TB / 129 TB avail; 4932 kB/s wr, 38 op/s
2014-09-13 08:01:25.481027 mon.0 10.141.8.180:6789/0 66090 : [INF]  
pgmap v65899: 1344 pgs: 187 stale+active+clean, 1157 active+clean;  
2606 GB data, 3116 GB used, 126 TB / 129 TB avail; 5726 kB/s wr, 64 op/s
2014-09-13 08:01:19.336173 osd.1 10.141.8.180:6803/26712 54442 : [WRN]  
1 slow requests, 1 included below; oldest blocked for  30.000137 secs
2014-09-13 08:01:19.336341 osd.1 10.141.8.180:6803/26712 54443 : [WRN]  
slow request 30.000137 seconds old, received at 2014-09-13  
08:00:49.335339: osd_op(client.7448.1:17751783 1203eac.000e  
[write 0~319488 [1@-1],startsync 0~0] 1.b

6c3a3a9 snapc 1=[] ondisk+write e116) currently reached pg
2014-09-13 08:01:20.337602 osd.1 10.141.8.180:6803/26712 5 : [WRN]  
7 slow requests, 6 included below; oldest blocked for  31.001947 secs
2014-09-13 08:01:20.337688 osd.1 10.141.8.180:6803/26712 54445 : [WRN]  
slow request 30.998110 seconds old, received at 2014-09-13  
08:00:49.339176: osd_op(client.7448.1:17751787 1203eac.000e  
[write 319488~65536 [1@-1],startsync 0~0]



This is happening OSD after OSD..

I tried to check the individual logs of the OSDs, but all the 
individual logs stop abruptly (also from the OSDs that are still 
running):


2014-09-12 14:25:51.205276 7f3517209700  0 log [WRN] : 41 slow  
requests, 1 included below; 

[ceph-users] OSD troubles on FS+Tiering

2014-09-15 Thread Kenneth Waegeman

Hi,

I have some strange OSD problems. Before the weekend I started some  
rsync tests over CephFS, on a cache pool with underlying EC KV pool.  
Today the cluster is completely degraded:


[root@ceph003 ~]# ceph status
cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
 health HEALTH_WARN 19 pgs backfill_toofull; 403 pgs degraded;  
168 pgs down; 8 pgs incomplete; 168 pgs peering; 61 pgs stale; 403 pgs  
stuck degraded; 176 pgs stuck inactive; 61 pgs stuck stale; 589 pgs  
stuck unclean; 403 pgs stuck undersized; 403 pgs undersized; 300  
requests are blocked > 32 sec; recovery 15170/27902361 objects
degraded (0.054%); 1922/27902361 objects misplaced (0.007%); 1 near  
full osd(s)
 monmap e1: 3 mons at  
{ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, election epoch 8, quorum 0,1,2  
ceph001,ceph002,ceph003

 mdsmap e5: 1/1/1 up {0=ceph003=up:active}, 2 up:standby
 osdmap e719: 48 osds: 18 up, 18 in
  pgmap v144887: 1344 pgs, 4 pools, 4139 GB data, 2624 kobjects
2282 GB used, 31397 GB / 33680 GB avail
15170/27902361 objects degraded (0.054%); 1922/27902361  
objects misplaced (0.007%)

  68 down+remapped+peering
   1 active
 754 active+clean
   1 stale+incomplete
   1 stale+active+clean+scrubbing
  14 active+undersized+degraded+remapped
   7 incomplete
 100 down+peering
   9 active+remapped
  59 stale+active+undersized+degraded
  19 active+undersized+degraded+remapped+backfill_toofull
 311 active+undersized+degraded

I tried to figure out what happened in the global logs:

2014-09-13 08:01:19.433313 mon.0 10.141.8.180:6789/0 66076 : [INF]  
pgmap v65892: 1344 pgs: 1344 active+clean; 2606 GB data, 3116 GB used,  
126 TB / 129 TB avail; 4159 kB/s wr, 45 op/s
2014-09-13 08:01:20.443019 mon.0 10.141.8.180:6789/0 66078 : [INF]  
pgmap v65893: 1344 pgs: 1344 active+clean; 2606 GB data, 3116 GB used,  
126 TB / 129 TB avail; 561 kB/s wr, 11 op/s
2014-09-13 08:01:20.777988 mon.0 10.141.8.180:6789/0 66081 : [INF]  
osd.19 10.141.8.181:6809/29664 failed (3 reports from 3 peers after  
20.79 = grace 20.00)
2014-09-13 08:01:21.455887 mon.0 10.141.8.180:6789/0 66083 : [INF]  
osdmap e117: 48 osds: 47 up, 48 in
2014-09-13 08:01:21.462084 mon.0 10.141.8.180:6789/0 66084 : [INF]  
pgmap v65894: 1344 pgs: 1344 active+clean; 2606 GB data, 3116 GB used,  
126 TB / 129 TB avail; 1353 kB/s wr, 13 op/s
2014-09-13 08:01:21.477007 mon.0 10.141.8.180:6789/0 66085 : [INF]  
pgmap v65895: 1344 pgs: 187 stale+active+clean, 1157 active+clean;  
2606 GB data, 3116 GB used, 126 TB / 129 TB avail; 2300 kB/s wr, 21 op/s
2014-09-13 08:01:22.456055 mon.0 10.141.8.180:6789/0 66086 : [INF]  
osdmap e118: 48 osds: 47 up, 48 in
2014-09-13 08:01:22.462590 mon.0 10.141.8.180:6789/0 66087 : [INF]  
pgmap v65896: 1344 pgs: 187 stale+active+clean, 1157 active+clean;  
2606 GB data, 3116 GB used, 126 TB / 129 TB avail; 13686 kB/s wr, 5 op/s
2014-09-13 08:01:23.464302 mon.0 10.141.8.180:6789/0 66088 : [INF]  
pgmap v65897: 1344 pgs: 187 stale+active+clean, 1157 active+clean;  
2606 GB data, 3116 GB used, 126 TB / 129 TB avail; 11075 kB/s wr, 4 op/s
2014-09-13 08:01:24.477467 mon.0 10.141.8.180:6789/0 66089 : [INF]  
pgmap v65898: 1344 pgs: 187 stale+active+clean, 1157 active+clean;  
2606 GB data, 3116 GB used, 126 TB / 129 TB avail; 4932 kB/s wr, 38 op/s
2014-09-13 08:01:25.481027 mon.0 10.141.8.180:6789/0 66090 : [INF]  
pgmap v65899: 1344 pgs: 187 stale+active+clean, 1157 active+clean;  
2606 GB data, 3116 GB used, 126 TB / 129 TB avail; 5726 kB/s wr, 64 op/s
2014-09-13 08:01:19.336173 osd.1 10.141.8.180:6803/26712 54442 : [WRN]  
1 slow requests, 1 included below; oldest blocked for > 30.000137 secs
2014-09-13 08:01:19.336341 osd.1 10.141.8.180:6803/26712 54443 : [WRN]  
slow request 30.000137 seconds old, received at 2014-09-13  
08:00:49.335339: osd_op(client.7448.1:17751783 1203eac.000e  
[write 0~319488 [1@-1],startsync 0~0] 1.b6c3a3a9 snapc 1=[] ondisk+write e116) currently reached pg
2014-09-13 08:01:20.337602 osd.1 10.141.8.180:6803/26712 5 : [WRN]  
7 slow requests, 6 included below; oldest blocked for > 31.001947 secs
2014-09-13 08:01:20.337688 osd.1 10.141.8.180:6803/26712 54445 : [WRN]  
slow request 30.998110 seconds old, received at 2014-09-13  
08:00:49.339176: osd_op(client.7448.1:17751787 1203eac.000e  
[write 319488~65536 [1@-1],startsync 0~0]



This is happening OSD after OSD..

I tried to check the individual logs of the osds, but they all stop
abruptly (also those of the osds that are still running):


2014-09-12 14:25:51.205276 7f3517209700  0 log [WRN] : 41 slow  
requests, 1 included below; oldest blocked for  

Re: [ceph-users] Cephfs upon Tiering

2014-09-12 Thread Kenneth Waegeman


- Message from Sage Weil sw...@redhat.com -
   Date: Thu, 11 Sep 2014 14:10:46 -0700 (PDT)
   From: Sage Weil sw...@redhat.com
Subject: Re: [ceph-users] Cephfs upon Tiering
 To: Gregory Farnum g...@inktank.com
 Cc: Kenneth Waegeman kenneth.waege...@ugent.be, ceph-users  
ceph-users@lists.ceph.com




On Thu, 11 Sep 2014, Gregory Farnum wrote:

On Thu, Sep 11, 2014 at 11:39 AM, Sage Weil sw...@redhat.com wrote:
 On Thu, 11 Sep 2014, Gregory Farnum wrote:
 On Thu, Sep 11, 2014 at 4:13 AM, Kenneth Waegeman
 kenneth.waege...@ugent.be wrote:
  Hi all,
 
  I am testing the tiering functionality with cephfs. I used a replicated
  cache with an EC data pool, and a replicated metadata pool like this:
 
 
  ceph osd pool create cache 1024 1024
  ceph osd pool set cache size 2
  ceph osd pool set cache min_size 1
  ceph osd erasure-code-profile set profile11 k=8 m=3
  ruleset-failure-domain=osd
  ceph osd pool create ecdata 128 128 erasure profile11
  ceph osd tier add ecdata cache
  ceph osd tier cache-mode cache writeback
  ceph osd tier set-overlay ecdata cache
  ceph osd pool set cache hit_set_type bloom
  ceph osd pool set cache hit_set_count 1
  ceph osd pool set cache hit_set_period 3600
  ceph osd pool set cache target_max_bytes $((280*1024*1024*1024))
  ceph osd pool create metadata 128 128
  ceph osd pool set metadata crush_ruleset 1 # SSD root in crushmap
  ceph fs new ceph_fs metadata cache   <-- wrong?
 
  I started testing with this, and this worked, I could write to it with
  cephfs and the cache was flushing to the ecdata pool as expected.
  But now I notice I made the fs right upon the cache, instead of the
  underlying data pool. I suppose I should have done this:
 
  ceph fs new ceph_fs metadata ecdata
 
  So my question is: Was this wrong and not doing the things I  
thought it did,
  or was this somehow handled by ceph and didn't it matter I  
specified the

  cache instead of the data pool?

 Well, it's sort of doing what you want it to. You've told the
 filesystem to use the cache pool as the location for all of its
 data. But RADOS is pushing everything in the cache pool down to the
 ecdata pool.
 So it'll work for now as you want. But if in future you wanted to stop
 using the caching pool, or switch it out for a different pool
 entirely, that wouldn't work (whereas it would if the fs was using
 ecdata).
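
One way to observe this behaviour on a running cluster (a sketch using
the pool names from the setup above; output will vary with flush/evict
activity):

# List a few object names in each pool and compare per-pool usage;
# objects written through the cache tier should also show up in the
# backing ecdata pool once they have been flushed down.
rados -p cache ls | head
rados -p ecdata ls | head
rados df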


After this I tried with the 'ecdata' pool, which does not work because
it is itself an EC pool.
So I guess specifying the cache pool is indeed the only way, but that's
OK if that works.

It is just a bit confusing to specify the cache pool rather than the data pool :)



 We should perhaps look at preventing use of cache pools like this...hrm...
 http://tracker.ceph.com/issues/9435

 Should we?  I was planning on doing exactly this for my home cluster.

Not cache pools under CephFS, but specifying the cache pool as the
data pool (rather than some underlying pool). Or is there some reason
we might want the cache pool to be the one the filesystem is using for
indexing?


Oh, right.  Yeah that's fine.  :)

sage




- End message from Sage Weil sw...@redhat.com -

--

Met vriendelijke groeten,
Kenneth Waegeman



[ceph-users] Cephfs upon Tiering

2014-09-11 Thread Kenneth Waegeman

Hi all,

I am testing the tiering functionality with cephfs. I used a  
replicated cache with an EC data pool, and a replicated metadata pool  
like this:



ceph osd pool create cache 1024 1024
ceph osd pool set cache size 2
ceph osd pool set cache min_size 1
ceph osd erasure-code-profile set profile11 k=8 m=3 ruleset-failure-domain=osd
ceph osd pool create ecdata 128 128 erasure profile11
ceph osd tier add ecdata cache
ceph osd tier cache-mode cache writeback
ceph osd tier set-overlay ecdata cache
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache target_max_bytes $((280*1024*1024*1024))
ceph osd pool create metadata 128 128
ceph osd pool set metadata crush_ruleset 1 # SSD root in crushmap
ceph fs new ceph_fs metadata cache   <-- wrong?
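
For what it's worth, a quick way to double-check what the filesystem and
the tier ended up pointing at (a sketch; pool names as above, and
'ceph fs ls' assumed to be available in this release):

ceph fs ls                  # which metadata/data pools the fs was created with
ceph osd dump | grep pool   # per-pool details, including tier_of / read_tier / write_tier
ceph df                     # where the data actually lives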

I started testing with this, and this worked, I could write to it with  
cephfs and the cache was flushing to the ecdata pool as expected.
But now I notice I made the fs right upon the cache, instead of the  
underlying data pool. I suppose I should have done this:


ceph fs new ceph_fs metadata ecdata

So my question is: was this wrong and not doing what I thought it did,
or was this somehow handled by ceph so that it didn't matter that I
specified the cache instead of the data pool?



Thank you!

Kenneth



Re: [ceph-users] ceph cluster inconsistency keyvaluestore

2014-09-08 Thread Kenneth Waegeman


Thank you very much !

Is this problem then related to the weird sizes I see:
  pgmap v55220: 1216 pgs, 3 pools, 3406 GB data, 852 kobjects
418 GB used, 88130 GB / 88549 GB avail

A calculation with df shows indeed that there is only about 400 GB used
on the disks, but the tests I ran should have generated 3.5 TB, as also
seen in rados df:


pool name  category  KB  objects  clones  degraded  unfound  rd  rd KB  wr  wr KB
cache   -   59150443154660   0   0   1388365   5686734850   3665984   4709621763
ecdata  -   3512807425   8576200   0   0   1109938312332288   857621   3512807426


I thought it was related to the inconsistency? Or can this be a
sparse-objects thing? (But I can't seem to find anything in the docs
about that.)
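
For comparison, this is how sparse allocation looks at the plain
filesystem level (a generic illustration, not Ceph-specific; the file
name is arbitrary):

# Create a 10 GB file without allocating blocks, then compare
# apparent size against actual disk usage.
truncate -s 10G sparse.img
ls -lh sparse.img                  # apparent size: 10G
du -h sparse.img                   # allocated blocks: close to 0
du -h --apparent-size sparse.img   # back to 10G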


Thanks again!

Kenneth



- Message from Haomai Wang haomaiw...@gmail.com -
   Date: Sun, 7 Sep 2014 20:34:39 +0800
   From: Haomai Wang haomaiw...@gmail.com
Subject: Re: ceph cluster inconsistency keyvaluestore
 To: Kenneth Waegeman kenneth.waege...@ugent.be
 Cc: ceph-users@lists.ceph.com



I have found the root cause. It's a bug.

When a chunky scrub happens, it iterates over the whole PG's objects,
and each iteration scans only a few objects.

osd/PG.cc:3758
ret = get_pgbackend()->objects_list_partial(
  start,
  cct->_conf->osd_scrub_chunk_min,
  cct->_conf->osd_scrub_chunk_max,
  0,
  &objects,
  &candidate_end);

candidate_end is the end of the object set, and it is used as the start
position of the next scrub iteration. But it will be truncated:

osd/PG.cc:3777
while (!boundary_found && objects.size() > 1) {
  hobject_t end = objects.back().get_boundary();
  objects.pop_back();

  if (objects.back().get_filestore_key() !=
      end.get_filestore_key()) {
    candidate_end = end;
    boundary_found = true;
  }
}
end, an hobject_t that only contains the hash field, will be assigned to
candidate_end. So on the next scrub iteration an hobject_t containing
only the hash field will be passed in to
get_pgbackend()->objects_list_partial.

This causes incorrect results for the KeyValueStore backend, because it
uses strict key ordering for the collection_list_partial method. An
hobject_t that contains only the hash field will be:

1%e79s0_head!972F1B5D!!none!!!!0!0

and the actual object is
1%e79s0_head!972F1B5D!!1!!!object-name!head

In other words, a key that contains only the hash field cannot be used
to search for the actual object that has the same hash field.
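
A minimal way to see the ordering problem outside Ceph (a sketch,
assuming the backend compares keys byte-wise, as LevelDB's default
comparator does):

# The hash-only key sorts *after* the real object key with the same
# hash, so a range scan that starts from the hash-only key walks past
# the object it was meant to find.
printf '%s\n' \
  '1%e79s0_head!972F1B5D!!1!!!object-name!head' \
  '1%e79s0_head!972F1B5D!!none!!!!0!0' \
  | LC_ALL=C sort
# 1%e79s0_head!972F1B5D!!1!!!object-name!head
# 1%e79s0_head!972F1B5D!!none!!!!0!0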

@sage The simple way is to modify the object-to-key mapping function,
which will change the storage format. Because it's an experimental
backend, I would like to provide an external format-change program to
help users do the conversion. Is that OK?


On Wed, Sep 3, 2014 at 9:16 PM, Kenneth Waegeman
kenneth.waege...@ugent.be wrote:

I can also reproduce it on a new, slightly different setup (also EC on KV
and cache) by running ceph pg scrub on a KV pg: this pg will then get the
'inconsistent' status.



- Message from Kenneth Waegeman kenneth.waege...@ugent.be -
   Date: Mon, 01 Sep 2014 16:28:31 +0200
   From: Kenneth Waegeman kenneth.waege...@ugent.be
Subject: Re: ceph cluster inconsistency keyvaluestore
 To: Haomai Wang haomaiw...@gmail.com
 Cc: ceph-users@lists.ceph.com




Hi,


The cluster got installed with quattor, which uses ceph-deploy for the
installation of the daemons, writes the config file and installs the crushmap.
I have 3 hosts with 12 disks each; each disk has a large KV partition (3.6T)
for the ECdata pool and a small cache partition (50G) for the cache.
I manually did this:

ceph osd pool create cache 1024 1024
ceph osd pool set cache size 2
ceph osd pool set cache min_size 1
ceph osd erasure-code-profile set profile11 k=8 m=3
ruleset-failure-domain=osd
ceph osd pool create ecdata 128 128 erasure profile11
ceph osd tier add ecdata cache
ceph osd tier cache-mode cache writeback
ceph osd tier set-overlay ecdata cache
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache target_max_bytes $((280*1024*1024*1024))

(But the previous time I had the problem already without the cache part)
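
To confirm what the cache tier actually ended up with, something like
this can be used (a sketch; 'ceph osd pool get' should report the values
set above):

ceph osd pool get cache hit_set_type
ceph osd pool get cache hit_set_count
ceph osd pool get cache hit_set_period
ceph osd pool get cache target_max_bytes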



Cluster live since 2014-08-29 15:34:16

Config file on host ceph001:

[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.143.8.0/24
filestore_xattr_use_omap = 1
fsid = 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
mon_cluster_log_to_syslog = 1
mon_host = ceph001.cubone.os, ceph002.cubone.os, ceph003.cubone.os
mon_initial_members = ceph001, ceph002, ceph003
osd_crush_update_on_start = 0
osd_journal_size = 10240
osd_pool_default_min_size = 2
osd_pool_default_pg_num = 512

[ceph-users] ceph cluster inconsistency keyvaluestore

2014-09-01 Thread Kenneth Waegeman

Hi,

I reinstalled the cluster with 0.84, and tried running rados bench again
on an EC-coded pool on keyvaluestore.

Nothing crashed this time, but when I check the status:

 health HEALTH_ERR 128 pgs inconsistent; 128 scrub errors; too  
few pgs per osd (15 < min 20)
 monmap e1: 3 mons at  
{ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, election epoch 8, quorum 0,1,2  
ceph001,ceph002,ceph003

 osdmap e174: 78 osds: 78 up, 78 in
  pgmap v147680: 1216 pgs, 3 pools, 14758 GB data, 3690 kobjects
1753 GB used, 129 TB / 131 TB avail
1088 active+clean
 128 active+clean+inconsistent

the 128 inconsistent pgs are ALL the pgs of the EC KV store (the
others are on Filestore)
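
The mapping from those pgs to the pool is easy to check, since the pg id
prefix is the pool id (a sketch):

# List the inconsistent pgs and match their numeric prefix against the pool ids;
# a pg named <poolid>.<seed> belongs to the pool with that id.
ceph health detail | grep inconsistent
ceph osd lspools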


The only thing I can see in the logs is that after the rados tests, it
starts scrubbing, and for each KV pg I get something like this:


2014-08-31 11:14:09.050747 osd.11 10.141.8.180:6833/61098 4 : [ERR]  
2.3s0 scrub stat mismatch, got 28164/29291 objects, 0/0 clones,  
28164/29291 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts,  
118128377856/122855358464 bytes.


What could here be the problem?
Thanks again!!

Kenneth


- Message from Haomai Wang haomaiw...@gmail.com -
   Date: Tue, 26 Aug 2014 17:11:43 +0800
   From: Haomai Wang haomaiw...@gmail.com
Subject: Re: [ceph-users] ceph cluster inconsistency?
 To: Kenneth Waegeman kenneth.waege...@ugent.be
 Cc: ceph-users@lists.ceph.com



Hmm, it looks like you hit this bug (http://tracker.ceph.com/issues/9223).

Sorry for the late message, I forgot that this fix is merged into 0.84.

Thanks for your patience :-)

On Tue, Aug 26, 2014 at 4:39 PM, Kenneth Waegeman
kenneth.waege...@ugent.be wrote:


Hi,

In the meantime I already tried upgrading the cluster to 0.84, to see
if that made a difference, and it seems it did.
I can't reproduce the crashing osds by doing a 'rados -p ecdata ls' anymore.

But now the cluster detect it is inconsistent:

  cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
   health HEALTH_ERR 40 pgs inconsistent; 40 scrub errors; too few pgs
per osd (4 < min 20); mon.ceph002 low disk space
   monmap e3: 3 mons at
{ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0},
election epoch 30, quorum 0,1,2 ceph001,ceph002,ceph003
   mdsmap e78951: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 up:standby
   osdmap e145384: 78 osds: 78 up, 78 in
pgmap v247095: 320 pgs, 4 pools, 15366 GB data, 3841 kobjects
  1502 GB used, 129 TB / 131 TB avail
   279 active+clean
40 active+clean+inconsistent
 1 active+clean+scrubbing+deep


I tried to do ceph pg repair for all the inconsistent pgs:

  cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
   health HEALTH_ERR 40 pgs inconsistent; 1 pgs repair; 40 scrub errors;
too few pgs per osd (4 < min 20); mon.ceph002 low disk space
   monmap e3: 3 mons at
{ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0},
election epoch 30, quorum 0,1,2 ceph001,ceph002,ceph003
   mdsmap e79486: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 up:standby
   osdmap e146452: 78 osds: 78 up, 78 in
pgmap v248520: 320 pgs, 4 pools, 15366 GB data, 3841 kobjects
  1503 GB used, 129 TB / 131 TB avail
   279 active+clean
39 active+clean+inconsistent
 1 active+clean+scrubbing+deep
 1 active+clean+scrubbing+deep+inconsistent+repair
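
One way to script that repair step for all inconsistent pgs (a sketch;
the awk pattern depends on the exact 'ceph health detail' output format):

for pg in $(ceph health detail | awk '/is active.*inconsistent/ {print $2}'); do
    ceph pg repair "$pg"
done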

I let it recover through the night, but this morning the mons were all
gone, with nothing to see in the log files. The osds were all still up!

cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
 health HEALTH_ERR 36 pgs inconsistent; 1 pgs repair; 36 scrub errors;
too few pgs per osd (4 < min 20)
 monmap e7: 3 mons at
{ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0},
election epoch 44, quorum 0,1,2 ceph001,ceph002,ceph003
 mdsmap e109481: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 up:standby
 osdmap e203410: 78 osds: 78 up, 78 in
  pgmap v331747: 320 pgs, 4 pools, 15251 GB data, 3812 kobjects
1547 GB used, 129 TB / 131 TB avail
   1 active+clean+scrubbing+deep+inconsistent+repair
 284 active+clean
  35 active+clean+inconsistent

I restarted the monitors now, I will let you know when I see something
more..




- Message from Haomai Wang haomaiw...@gmail.com -
 Date: Sun, 24 Aug 2014 12:51:41 +0800

 From: Haomai Wang haomaiw...@gmail.com
Subject: Re: [ceph-users] ceph cluster inconsistency?
   To: Kenneth Waegeman kenneth.waege...@ugent.be,
ceph-users@lists.ceph.com



It's really strange! I wrote a test program according to the key ordering
you provided and parsed the corresponding value. It's correct!

I have no idea now. If free

Re: [ceph-users] ceph cluster inconsistency keyvaluestore

2014-09-01 Thread Kenneth Waegeman
 3.906
item osd.13 weight 3.906
item osd.15 weight 3.906
item osd.17 weight 3.906
item osd.19 weight 3.906
item osd.21 weight 3.906
item osd.23 weight 3.906
item osd.25 weight 3.906
}
host ceph002-ec {
id -6   # do not change unnecessarily
# weight 46.872
alg straw
hash 0  # rjenkins1
item osd.29 weight 3.906
item osd.31 weight 3.906
item osd.33 weight 3.906
item osd.35 weight 3.906
item osd.37 weight 3.906
item osd.39 weight 3.906
item osd.41 weight 3.906
item osd.43 weight 3.906
item osd.45 weight 3.906
item osd.47 weight 3.906
item osd.49 weight 3.906
item osd.51 weight 3.906
}
host ceph003-ec {
id -7   # do not change unnecessarily
# weight 46.872
alg straw
hash 0  # rjenkins1
item osd.55 weight 3.906
item osd.57 weight 3.906
item osd.59 weight 3.906
item osd.61 weight 3.906
item osd.63 weight 3.906
item osd.65 weight 3.906
item osd.67 weight 3.906
item osd.69 weight 3.906
item osd.71 weight 3.906
item osd.73 weight 3.906
item osd.75 weight 3.906
item osd.77 weight 3.906
}
root default-ec {
id -8   # do not change unnecessarily
# weight 140.616
alg straw
hash 0  # rjenkins1
item ceph001-ec weight 46.872
item ceph002-ec weight 46.872
item ceph003-ec weight 46.872
}
host ceph001-cache {
id -9   # do not change unnecessarily
# weight 46.872
alg straw
hash 0  # rjenkins1
item osd.2 weight 3.906
item osd.4 weight 3.906
item osd.6 weight 3.906
item osd.8 weight 3.906
item osd.10 weight 3.906
item osd.12 weight 3.906
item osd.14 weight 3.906
item osd.16 weight 3.906
item osd.18 weight 3.906
item osd.20 weight 3.906
item osd.22 weight 3.906
item osd.24 weight 3.906
}
host ceph002-cache {
id -10  # do not change unnecessarily
# weight 46.872
alg straw
hash 0  # rjenkins1
item osd.28 weight 3.906
item osd.30 weight 3.906
item osd.32 weight 3.906
item osd.34 weight 3.906
item osd.36 weight 3.906
item osd.38 weight 3.906
item osd.40 weight 3.906
item osd.42 weight 3.906
item osd.44 weight 3.906
item osd.46 weight 3.906
item osd.48 weight 3.906
item osd.50 weight 3.906
}
host ceph003-cache {
id -11  # do not change unnecessarily
# weight 46.872
alg straw
hash 0  # rjenkins1
item osd.54 weight 3.906
item osd.56 weight 3.906
item osd.58 weight 3.906
item osd.60 weight 3.906
item osd.62 weight 3.906
item osd.64 weight 3.906
item osd.66 weight 3.906
item osd.68 weight 3.906
item osd.70 weight 3.906
item osd.72 weight 3.906
item osd.74 weight 3.906
item osd.76 weight 3.906
}
root default-cache {
id -12  # do not change unnecessarily
# weight 140.616
alg straw
hash 0  # rjenkins1
item ceph001-cache weight 46.872
item ceph002-cache weight 46.872
item ceph003-cache weight 46.872
}

# rules
rule cache {
ruleset 0
type replicated
min_size 1
max_size 10
step take default-cache
step chooseleaf firstn 0 type host
step emit
}
rule metadata {
ruleset 1
type replicated
min_size 1
max_size 10
step take default-ssd
step chooseleaf firstn 0 type host
step emit
}
rule ecdata {
ruleset 2
type erasure
min_size 3
max_size 20
step set_chooseleaf_tries 5
step take default-ec
step choose indep 0 type osd
step emit
}

# end crush map
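
For reference, this is roughly how such a decompiled map gets edited and
pushed back into the cluster (a sketch; file names are arbitrary):

crushtool -c crushmap.txt -o crushmap.bin       # compile the text map
# dry-run the ecdata rule (ruleset 2) with k+m = 11 copies
crushtool -i crushmap.bin --test --rule 2 --num-rep 11 --show-mappings | head
ceph osd setcrushmap -i crushmap.bin            # inject the compiled map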

The benchmarks I then did:

./benchrw 5

benchrw:
/usr/bin/rados -p ecdata bench $1 write --no-cleanup
/usr/bin/rados -p ecdata bench $1 seq
/usr/bin/rados -p ecdata bench $1 seq 
/usr/bin/rados -p ecdata bench $1 write --no-cleanup


Scrubbing errors started soon after that: 2014-08-31 10:59:14


Please let me know if you need more information, and thanks !

Kenneth

- Message from Haomai Wang haomaiw...@gmail.com -
   Date: Mon, 1 Sep 2014 21:30:16 +0800
   From: Haomai Wang haomaiw...@gmail.com
Subject: Re: ceph cluster inconsistency keyvaluestore
 To: Kenneth Waegeman kenneth.waege...@ugent.be
 Cc: ceph-users@lists.ceph.com



Hmm, could you please list your steps, including how long the cluster has
existed and all relevant ops? I want to reproduce it.


On Mon, Sep 1, 2014 at 4:45 PM, Kenneth Waegeman kenneth.waege...@ugent.be
wrote:


Hi,

I reinstalled the cluster
