Re: [ceph-users] RGW won't start after upgrade to 12.2.5

2018-05-22 Thread Marc Spencer
This is now filed as bug #24228.

Marc D. Spencer
Chief Technology Officer
T: 866.808.4937 × 202
E: mspen...@liquidpixels.com
www.liquidpixels.com


> On May 21, 2018, at 10:58 PM, Marc Spencer wrote:
> 
> [...]

Re: [ceph-users] RGW won't start after upgrade to 12.2.5

2018-05-21 Thread Konstantin Shalygin

> The default configuration for rgw_ldap_secret seems to be set to
> /etc/openldap/secret, which on my system is empty:



Please create an issue on the tracker (tracker.ceph.com).

Thanks.



k

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW won't start after upgrade to 12.2.5

2018-05-21 Thread Marc Spencer
I found the issue, for the curious.

The default configuration for rgw_ldap_secret seems to be set to 
/etc/openldap/secret, which on my system is empty:

# ceph-conf -D | grep ldap
rgw_ldap_binddn = uid=admin,cn=users,dc=example,dc=com
rgw_ldap_dnattr = uid
rgw_ldap_searchdn = cn=users,cn=accounts,dc=example,dc=com
rgw_ldap_searchfilter = 
rgw_ldap_secret = /etc/openldap/secret
rgw_ldap_uri = ldaps://
rgw_s3_auth_use_ldap = false

# cat /etc/openldap/secret
cat: /etc/openldap/secret: No such file or directory

But the code assumes that if it is set, the named file exists and can be read. 
Since /etc/openldap/secret doesn't exist on my system, safe_read_file() asserts.

I set it to nothing (rgw_ldap_secret = ) in my configuration, and everything 
seems happy.
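
For the record, the workaround is just an explicitly empty value in ceph.conf. 
A minimal sketch (the [client.rgw.gateway1] section name is hypothetical, not 
taken from my actual config):

[client.rgw.gateway1]
# Explicitly blank so radosgw never tries to read the missing
# /etc/openldap/secret default.
rgw_ldap_secret =
rgw_s3_auth_use_ldap = false

Here is the function in question: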

std::string parse_rgw_ldap_bindpw(CephContext* ctx)
{
  string ldap_bindpw;
  string ldap_secret = ctx->_conf->rgw_ldap_secret;

  if (ldap_secret.empty()) {
    ldout(ctx, 10)
      << __func__ << " LDAP auth no rgw_ldap_secret file found in conf"
      << dendl;
  } else {
    char bindpw[1024];
    memset(bindpw, 0, 1024);
    int pwlen = safe_read_file("" /* base */, ldap_secret.c_str(),
                               bindpw, 1023);
    if (pwlen) {
      ldap_bindpw = bindpw;
      boost::algorithm::trim(ldap_bindpw);
      if (ldap_bindpw.back() == '\n')
        ldap_bindpw.pop_back();
    }
  }

  return ldap_bindpw;
}
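
A more forgiving version of this would treat a missing or unreadable secret 
file as "no password" instead of aborting. The following is only a sketch of 
that idea, using plain std::ifstream rather than Ceph's safe_read_file(); it 
is not the upstream fix:

#include <fstream>
#include <string>
#include <boost/algorithm/string.hpp>

// Sketch only: return an empty bind password when the configured
// secret file is absent or unreadable, instead of asserting.
static std::string read_ldap_bindpw_tolerant(const std::string& ldap_secret)
{
  std::string ldap_bindpw;
  if (ldap_secret.empty()) {
    return ldap_bindpw;           // LDAP auth not configured at all
  }
  std::ifstream in(ldap_secret);
  if (!in) {
    return ldap_bindpw;           // file missing/unreadable: no LDAP, no abort
  }
  std::getline(in, ldap_bindpw);  // the secret is a single line
  boost::algorithm::trim(ldap_bindpw);
  return ldap_bindpw;
}

With something along those lines, a dangling default like /etc/openldap/secret 
would merely disable LDAP auth rather than crash-loop radosgw.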


> On May 21, 2018, at 5:27 PM, Marc Spencer wrote:
> 
> [...]

[ceph-users] RGW won't start after upgrade to 12.2.5

2018-05-21 Thread Marc Spencer
Hi,

  I have a test cluster of 4 servers running Luminous. We were running 12.2.2 
under Fedora 17 and have just completed upgrading to 12.2.5 under Fedora 18.

  All seems well: all MONs are up, OSDs are up, I can see objects stored as 
expected with rados -p default.rgw.buckets.data ls. 

  But when I start RGW, my load goes through the roof as radosgw continuously 
rapid-fire core dumps. 

Log Excerpt:


… 

   -16> 2018-05-21 15:52:48.244579 7fc70eeda700  5 -- 10.19.33.13:0/3446208184 
>> 10.19.33.14:6800/1417 conn(0x55e78a610800 :-1 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=14567 cs=1 l=1). rx osd.6 seq 
7 0x55e78a67b500 osd_op_reply(47 notify.6 [watch watch cookie 94452947886080] 
v1092'43446 uv43445 ondisk = 0) v8
   -15> 2018-05-21 15:52:48.244619 7fc70eeda700  1 -- 10.19.33.13:0/3446208184 
<== osd.6 10.19.33.14:6800/1417 7 ==== osd_op_reply(47 notify.6 [watch watch 
cookie 94452947886080] v1092'43446 uv43445 ondisk = 0) v8 ==== 152+0+0 
(1199963694 0 0) 0x55e78a67b500 con 0x55e78a610800
   -14> 2018-05-21 15:52:48.244777 7fc723656000  1 -- 10.19.33.13:0/3446208184 
--> 10.19.33.15:6800/1433 -- osd_op(unknown.0.0:48 16.1 
16:93e5b521:::notify.7:head [create] snapc 0=[] 
ondisk+write+known_if_redirected e1092) v8 -- 0x55e78a67bc00 con 0
   -13> 2018-05-21 15:52:48.275650 7fc70eeda700  5 -- 10.19.33.13:0/3446208184 
>> 10.19.33.15:6800/1433 conn(0x55e78a65e000 :-1 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=14572 cs=1 l=1). rx osd.2 seq 
7 0x55e78a678380 osd_op_reply(48 notify.7 [create] v1092'43453 uv43453 ondisk = 
0) v8
   -12> 2018-05-21 15:52:48.275675 7fc70eeda700  1 -- 10.19.33.13:0/3446208184 
<== osd.2 10.19.33.15:6800/1433 7 ==== osd_op_reply(48 notify.7 [create] 
v1092'43453 uv43453 ondisk = 0) v8 ==== 152+0+0 (2720997170 0 0) 0x55e78a678380 
con 0x55e78a65e000
   -11> 2018-05-21 15:52:48.275849 7fc723656000  1 -- 10.19.33.13:0/3446208184 
--> 10.19.33.15:6800/1433 -- osd_op(unknown.0.0:49 16.1 
16:93e5b521:::notify.7:head [watch watch cookie 94452947887232] snapc 0=[] 
ondisk+write+known_if_redirected e1092) v8 -- 0x55e78a688000 con 0
   -10> 2018-05-21 15:52:48.296799 7fc70eeda700  5 -- 10.19.33.13:0/3446208184 
>> 10.19.33.15:6800/1433 conn(0x55e78a65e000 :-1 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=14572 cs=1 l=1). rx osd.2 seq 
8 0x55e78a688000 osd_op_reply(49 notify.7 [watch watch cookie 94452947887232] 
v1092'43454 uv43453 ondisk = 0) v8
    -9> 2018-05-21 15:52:48.296824 7fc70eeda700  1 -- 10.19.33.13:0/3446208184 
<== osd.2 10.19.33.15:6800/1433 8 ==== osd_op_reply(49 notify.7 [watch watch 
cookie 94452947887232] v1092'43454 uv43453 ondisk = 0) v8 ==== 152+0+0 
(3812136207 0 0) 0x55e78a688000 con 0x55e78a65e000
-8> 2018-05-21 15:52:48.296924 7fc723656000  2 all 8 watchers are set, 
enabling cache
-7> 2018-05-21 15:52:48.297135 7fc57cbb6700  2 garbage collection: start
-6> 2018-05-21 15:52:48.297185 7fc57c3b5700  2 object expiration: start
-5> 2018-05-21 15:52:48.297321 7fc57cbb6700  1 -- 10.19.33.13:0/3446208184 
--> 10.19.33.16:6804/1596 -- osd_op(unknown.0.0:50 18.3 
18:d242335b:gc::gc.2:head [call lock.lock] snapc 0=[] 
ondisk+write+known_if_redirected e1092) v8 -- 0x55e78a692000 con 0
-4> 2018-05-21 15:52:48.297395 7fc57c3b5700  1 -- 10.19.33.13:0/3446208184 
--> 10.19.33.16:6804/1596 -- osd_op(unknown.0.0:51 18.0 
18:1a734c59:::obj_delete_at_hint.00:head [call lock.lock] snapc 0=[] 
ondisk+write+known_if_redirected e1092) v8 -- 0x55e78a692380 con 0
-3> 2018-05-21 15:52:48.299463 7fc568b8e700  5 schedule life cycle next 
start time: Tue May 22 04:00:00 2018
-2> 2018-05-21 15:52:48.299528 7fc567b8c700  5 ERROR: sync_all_users() 
returned ret=-2
-1> 2018-05-21 15:52:48.299698 7fc56738b700  1 -- 10.19.33.13:0/3446208184 
--> 10.19.33.14:6800/1417 -- osd_op(unknown.0.0:52 18.7 
18:e9187ab8:reshard::reshard.00:head [call lock.lock] snapc 0=[] 
ondisk+write+known_if_redirected e1092) v8 -- 0x55e78a54fc00 con 0
 0> 2018-05-21 15:52:48.301978 7fc723656000 -1 *** Caught signal (Aborted) 
**
 in thread 7fc723656000 thread_name:radosgw

 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous 
(stable)
 1: (()+0x22d82c) [0x55e7861a882c]
 2: (()+0x11fb0) [0x7fc719270fb0]
 3: (gsignal()+0x10b) [0x7fc716603f4b]
 4: (abort()+0x12b) [0x7fc7165ee591]
 5: (parse_rgw_ldap_bindpw[abi:cxx11](CephContext*)+0x68b) [0x55e78647409b]
 6: (rgw::auth::s3::LDAPEngine::init(CephContext*)+0xb9) [0x55e7863a38f9]
 7: (rgw::auth::s3::ExternalAuthStrategy::ExternalAuthStrategy(CephContext*, 
RGWRados*, rgw::auth::s3::AWSEngine::VersionAbstractor*)+0x74) [0x55e786154bc4]
 8: (std::__shared_ptr::__shared_ptr(std::_Sp_make_shared_tag, 
std::allocator const&, CephContext* const&, 
RGWRados* const&)+0xf8) [0x55e786158f78]
 9: (main()+0x196b) [0x55e78614463b]
 10: (__libc_start_main()+0xeb) [0x7fc7165f01bb]