Re: [ceph-users] radosgw fails with "ERROR: failed to initialize watch: (34) Numerical result out of range"

2018-01-16 Thread Brad Hubbard
See http://tracker.ceph.com/issues/22351#note-11

On Wed, Jan 17, 2018 at 10:09 AM, Brad Hubbard  wrote:
> On Wed, Jan 17, 2018 at 5:41 AM, Brad Hubbard  wrote:
>> On Wed, Jan 17, 2018 at 2:20 AM, Nikos Kormpakis  wrote:
>>> On 01/16/2018 12:53 AM, Brad Hubbard wrote:
 On Tue, Jan 16, 2018 at 1:35 AM, Alexander Peters  
 wrote:
> i created the dump output but it looks very cryptic to me so i can't 
> really make much sense of it. is there anything to look for in particular?

 Yes, basically we are looking for any line that ends in "= 34". You
 might also find piping it through c++filt helps.

 Something like...

 $ c++filt 
>>> Hello,
>>> we're facing the exact same issue. I added some more info about
>>> our cluster and output from ltrace in [1].
>>
>> Unfortunately, the strlen lines in that output are expected.
>>
>> Is it possible for me to access the ltrace output file somehow
>> (you could email it directly or use  ceph-post-file perhaps)?
>
> Ah, nm, my bad.
>
> It turns out what we need is the hexadecimal int representation of '-34'.
>
> $ c++filt 
> I'll update the tracker accordingly.
>
>>
>>>
>>> Best regards,
>>> Nikos.
>>>
>>> [1] http://tracker.ceph.com/issues/22351
>>>
>
> i think i am going to read up on how interpret ltrace output...
>
> BR
> Alex
>
> - Ursprüngliche Mail -
> Von: "Brad Hubbard" 
> An: "Alexander Peters" 
> CC: "Ceph Users" 
> Gesendet: Montag, 15. Januar 2018 03:09:53
> Betreff: Re: [ceph-users] radosgw fails with "ERROR: failed to initialize 
> watch: (34) Numerical result out of range"
>
> On Mon, Jan 15, 2018 at 11:38 AM, Brad Hubbard  
> wrote:
>> On Mon, Jan 15, 2018 at 10:38 AM, Alexander Peters
>>  wrote:
>>> Thanks for the reply - unfortunatly the link you send is behind a 
>>> paywall so
>>> at least for now i can’t read it.
>>
>> That's why I provided the cause as laid out in that article (pgp num > 
>> pg num).
>>
>> Do you have any settings in ceph.conf related to pg_num or pgp_num?
>>
>> If not, please add your details to http://tracker.ceph.com/issues/22351
>
> Rados can return ERANGE (34) in multiple places so identifying where
> might be a big step towards working this out.
>
> $ ltrace -fo /tmp/ltrace.out /usr/bin/radosgw --cluster ceph --name
> client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d
>
> The objective is to find which function(s) return 34.
>
>>
>>>
>>> output of ceph osd dump shows that pgp num == pg num:
>>>
>>> [root@ctrl01 ~]# ceph osd dump
>>> epoch 142
>>> fsid 0e2d841f-68fd-4629-9813-ab083e8c0f10
>>> created 2017-12-20 23:04:59.781525
>>> modified 2018-01-14 21:30:57.528682
>>> flags sortbitwise,recovery_deletes,purged_snapdirs
>>> crush_version 6
>>> full_ratio 0.95
>>> backfillfull_ratio 0.9
>>> nearfull_ratio 0.85
>>> require_min_compat_client jewel
>>> min_compat_client jewel
>>> require_osd_release luminous
>>> pool 1 'glance' replicated size 3 min_size 2 crush_rule 0 object_hash
>>> rjenkins pg_num 64 pgp_num 64 last_change 119 flags hashpspool 
>>> stripe_width
>>> 0 application rbd
>>> removed_snaps [1~3]
>>> pool 2 'cinder-2' replicated size 3 min_size 2 crush_rule 0 object_hash
>>> rjenkins pg_num 64 pgp_num 64 last_change 120 flags hashpspool 
>>> stripe_width
>>> 0 application rbd
>>> removed_snaps [1~3]
>>> pool 3 'cinder-3' replicated size 3 min_size 2 crush_rule 0 object_hash
>>> rjenkins pg_num 64 pgp_num 64 last_change 121 flags hashpspool 
>>> stripe_width
>>> 0 application rbd
>>> removed_snaps [1~3]
>>> pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash
>>> rjenkins pg_num 8 pgp_num 8 last_change 94 owner 18446744073709551615 
>>> flags
>>> hashpspool stripe_width 0 application rgw
>>> max_osd 3
>>> osd.0 up   in  weight 1 up_from 82 up_thru 140 down_at 79
>>> last_clean_interval [23,78) 10.16.0.11:6800/1795 10.16.0.11:6801/1795
>>> 10.16.0.11:6802/1795 10.16.0.11:6803/1795 exists,up
>>> abe33844-6d98-4ede-81a8-a8bdc92dada8
>>> osd.1 up   in  weight 1 up_from 73 up_thru 140 down_at 71
>>> last_clean_interval [55,72) 10.16.0.13:6800/1756 10.16.0.13:6804/1001756
>>> 10.16.0.13:6805/1001756 10.16.0.13:6806/1001756 exists,up
>>> 0dab9372-6ffe-4a23-a8b7-4edca3745a2a
>>> osd.2 up   in  weight 1 up_from 140 up_thru 140 down_at 133
>>> last_clean_interval [31,132) 10.16.0.12:6800/1749 10.16.0.12:6801/1749
>>> 10.16.0.12:6802/1749 10.16.0.12:6803/1749 exists,up
>>> 220bba17-8119-4035-9e43-5b8eaa27562f

Re: [ceph-users] radosgw fails with "ERROR: failed to initialize watch: (34) Numerical result out of range"

2018-01-16 Thread Brad Hubbard
On Wed, Jan 17, 2018 at 5:41 AM, Brad Hubbard  wrote:
> On Wed, Jan 17, 2018 at 2:20 AM, Nikos Kormpakis  wrote:
>> On 01/16/2018 12:53 AM, Brad Hubbard wrote:
>>> On Tue, Jan 16, 2018 at 1:35 AM, Alexander Peters  wrote:
 i created the dump output but it looks very cryptic to me so i can't 
 really make much sense of it. is there anything to look for in particular?
>>>
>>> Yes, basically we are looking for any line that ends in "= 34". You
>>> might also find piping it through c++filt helps.
>>>
>>> Something like...
>>>
>>> $ c++filt 
>> Hello,
>> we're facing the exact same issue. I added some more info about
>> our cluster and output from ltrace in [1].
>
> Unfortunately, the strlen lines in that output are expected.
>
> Is it possible for me to access the ltrace output file somehow
> (you could email it directly or use  ceph-post-file perhaps)?

Ah, nm, my bad.

It turns out what we need is the hexadecimal int representation of '-34'.

$ c++filt 
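For reference, -34 as a 32-bit two's-complement integer is 0xffffffde (the 64-bit
form, 0xffffffffffffffde, ends in the same eight hex digits), so a filter roughly
along these lines, assuming the trace was written to /tmp/ltrace.out as in the
ltrace command quoted below, should surface the calls in question:

$ grep ffffffde /tmp/ltrace.out | c++filt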
>>
>> Best regards,
>> Nikos.
>>
>> [1] http://tracker.ceph.com/issues/22351
>>

 i think i am going to read up on how interpret ltrace output...

 BR
 Alex

 - Ursprüngliche Mail -
 Von: "Brad Hubbard" 
 An: "Alexander Peters" 
 CC: "Ceph Users" 
 Gesendet: Montag, 15. Januar 2018 03:09:53
 Betreff: Re: [ceph-users] radosgw fails with "ERROR: failed to initialize 
 watch: (34) Numerical result out of range"

 On Mon, Jan 15, 2018 at 11:38 AM, Brad Hubbard  wrote:
> On Mon, Jan 15, 2018 at 10:38 AM, Alexander Peters
>  wrote:
>> Thanks for the reply - unfortunatly the link you send is behind a 
>> paywall so
>> at least for now i can’t read it.
>
> That's why I provided the cause as laid out in that article (pgp num > pg 
> num).
>
> Do you have any settings in ceph.conf related to pg_num or pgp_num?
>
> If not, please add your details to http://tracker.ceph.com/issues/22351

 Rados can return ERANGE (34) in multiple places so identifying where
 might be a big step towards working this out.

 $ ltrace -fo /tmp/ltrace.out /usr/bin/radosgw --cluster ceph --name
 client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d

 The objective is to find which function(s) return 34.

>
>>
>> output of ceph osd dump shows that pgp num == pg num:
>>
>> [root@ctrl01 ~]# ceph osd dump
>> epoch 142
>> fsid 0e2d841f-68fd-4629-9813-ab083e8c0f10
>> created 2017-12-20 23:04:59.781525
>> modified 2018-01-14 21:30:57.528682
>> flags sortbitwise,recovery_deletes,purged_snapdirs
>> crush_version 6
>> full_ratio 0.95
>> backfillfull_ratio 0.9
>> nearfull_ratio 0.85
>> require_min_compat_client jewel
>> min_compat_client jewel
>> require_osd_release luminous
>> pool 1 'glance' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 119 flags hashpspool 
>> stripe_width
>> 0 application rbd
>> removed_snaps [1~3]
>> pool 2 'cinder-2' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 120 flags hashpspool 
>> stripe_width
>> 0 application rbd
>> removed_snaps [1~3]
>> pool 3 'cinder-3' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 121 flags hashpspool 
>> stripe_width
>> 0 application rbd
>> removed_snaps [1~3]
>> pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 8 pgp_num 8 last_change 94 owner 18446744073709551615 
>> flags
>> hashpspool stripe_width 0 application rgw
>> max_osd 3
>> osd.0 up   in  weight 1 up_from 82 up_thru 140 down_at 79
>> last_clean_interval [23,78) 10.16.0.11:6800/1795 10.16.0.11:6801/1795
>> 10.16.0.11:6802/1795 10.16.0.11:6803/1795 exists,up
>> abe33844-6d98-4ede-81a8-a8bdc92dada8
>> osd.1 up   in  weight 1 up_from 73 up_thru 140 down_at 71
>> last_clean_interval [55,72) 10.16.0.13:6800/1756 10.16.0.13:6804/1001756
>> 10.16.0.13:6805/1001756 10.16.0.13:6806/1001756 exists,up
>> 0dab9372-6ffe-4a23-a8b7-4edca3745a2a
>> osd.2 up   in  weight 1 up_from 140 up_thru 140 down_at 133
>> last_clean_interval [31,132) 10.16.0.12:6800/1749 10.16.0.12:6801/1749
>> 10.16.0.12:6802/1749 10.16.0.12:6803/1749 exists,up
>> 220bba17-8119-4035-9e43-5b8eaa27562f
>>
>>
>> Am 15.01.2018 um 01:33 schrieb Brad Hubbard :
>>
>> On Mon, Jan 15, 2018 at 8:34 AM, Alexander Peters
>>  wrote:
>>
>> Hello
>>
>> I am currently experiencing a strange issue with my radosgw. It Fails to
>> 

Re: [ceph-users] radosgw fails with "ERROR: failed to initialize watch: (34) Numerical result out of range"

2018-01-16 Thread Brad Hubbard
On Wed, Jan 17, 2018 at 2:20 AM, Nikos Kormpakis  wrote:
> On 01/16/2018 12:53 AM, Brad Hubbard wrote:
>> On Tue, Jan 16, 2018 at 1:35 AM, Alexander Peters  wrote:
>>> i created the dump output but it looks very cryptic to me so i can't really 
>>> make much sense of it. is there anything to look for in particular?
>>
>> Yes, basically we are looking for any line that ends in "= 34". You
>> might also find piping it through c++filt helps.
>>
>> Something like...
>>
>> $ c++filt 
> Hello,
> we're facing the exact same issue. I added some more info about
> our cluster and output from ltrace in [1].

Unfortunately, the strlen lines in that output are expected.

Is it possible for me to access the ltrace output file somehow
(you could email it directly or use  ceph-post-file perhaps)?
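If you go the ceph-post-file route, something like this should do it (the
description text is just an example) and will print an id you can paste back
here or into the tracker so the file can be pulled down:

$ ceph-post-file -d 'rgw ERANGE ltrace output' /tmp/ltrace.out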

>
> Best regards,
> Nikos.
>
> [1] http://tracker.ceph.com/issues/22351
>
>>>
>>> i think i am going to read up on how interpret ltrace output...
>>>
>>> BR
>>> Alex
>>>
>>> - Ursprüngliche Mail -
>>> Von: "Brad Hubbard" 
>>> An: "Alexander Peters" 
>>> CC: "Ceph Users" 
>>> Gesendet: Montag, 15. Januar 2018 03:09:53
>>> Betreff: Re: [ceph-users] radosgw fails with "ERROR: failed to initialize 
>>> watch: (34) Numerical result out of range"
>>>
>>> On Mon, Jan 15, 2018 at 11:38 AM, Brad Hubbard  wrote:
 On Mon, Jan 15, 2018 at 10:38 AM, Alexander Peters
  wrote:
> Thanks for the reply - unfortunatly the link you send is behind a paywall 
> so
> at least for now i can’t read it.

 That's why I provided the cause as laid out in that article (pgp num > pg 
 num).

 Do you have any settings in ceph.conf related to pg_num or pgp_num?

 If not, please add your details to http://tracker.ceph.com/issues/22351
>>>
>>> Rados can return ERANGE (34) in multiple places so identifying where
>>> might be a big step towards working this out.
>>>
>>> $ ltrace -fo /tmp/ltrace.out /usr/bin/radosgw --cluster ceph --name
>>> client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d
>>>
>>> The objective is to find which function(s) return 34.
>>>

>
> output of ceph osd dump shows that pgp num == pg num:
>
> [root@ctrl01 ~]# ceph osd dump
> epoch 142
> fsid 0e2d841f-68fd-4629-9813-ab083e8c0f10
> created 2017-12-20 23:04:59.781525
> modified 2018-01-14 21:30:57.528682
> flags sortbitwise,recovery_deletes,purged_snapdirs
> crush_version 6
> full_ratio 0.95
> backfillfull_ratio 0.9
> nearfull_ratio 0.85
> require_min_compat_client jewel
> min_compat_client jewel
> require_osd_release luminous
> pool 1 'glance' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 119 flags hashpspool 
> stripe_width
> 0 application rbd
> removed_snaps [1~3]
> pool 2 'cinder-2' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 120 flags hashpspool 
> stripe_width
> 0 application rbd
> removed_snaps [1~3]
> pool 3 'cinder-3' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 121 flags hashpspool 
> stripe_width
> 0 application rbd
> removed_snaps [1~3]
> pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 94 owner 18446744073709551615 
> flags
> hashpspool stripe_width 0 application rgw
> max_osd 3
> osd.0 up   in  weight 1 up_from 82 up_thru 140 down_at 79
> last_clean_interval [23,78) 10.16.0.11:6800/1795 10.16.0.11:6801/1795
> 10.16.0.11:6802/1795 10.16.0.11:6803/1795 exists,up
> abe33844-6d98-4ede-81a8-a8bdc92dada8
> osd.1 up   in  weight 1 up_from 73 up_thru 140 down_at 71
> last_clean_interval [55,72) 10.16.0.13:6800/1756 10.16.0.13:6804/1001756
> 10.16.0.13:6805/1001756 10.16.0.13:6806/1001756 exists,up
> 0dab9372-6ffe-4a23-a8b7-4edca3745a2a
> osd.2 up   in  weight 1 up_from 140 up_thru 140 down_at 133
> last_clean_interval [31,132) 10.16.0.12:6800/1749 10.16.0.12:6801/1749
> 10.16.0.12:6802/1749 10.16.0.12:6803/1749 exists,up
> 220bba17-8119-4035-9e43-5b8eaa27562f
>
>
> Am 15.01.2018 um 01:33 schrieb Brad Hubbard :
>
> On Mon, Jan 15, 2018 at 8:34 AM, Alexander Peters
>  wrote:
>
> Hello
>
> I am currently experiencing a strange issue with my radosgw. It Fails to
> start and all it says is:
> [root@ctrl02 ~]# /usr/bin/radosgw --cluster ceph --name
> client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d
> 2018-01-14 21:30:57.132007 7f44ddd18e00  0 deferred set uid:gid to 167:167
> (ceph:ceph)
> 2018-01-14 21:30:57.132161 7f44ddd18e00  0 

Re: [ceph-users] radosgw fails with "ERROR: failed to initialize watch: (34) Numerical result out of range"

2018-01-16 Thread Nikos Kormpakis
On 01/16/2018 12:53 AM, Brad Hubbard wrote:
> On Tue, Jan 16, 2018 at 1:35 AM, Alexander Peters  wrote:
>> i created the dump output but it looks very cryptic to me so i can't really 
>> make much sense of it. is there anything to look for in particular?
> 
> Yes, basically we are looking for any line that ends in "= 34". You
> might also find piping it through c++filt helps.
> 
> Something like...
> 
> $ c++filt 

Hello,
we're facing the exact same issue. I added some more info about
our cluster and output from ltrace in [1].

Best regards,
Nikos.

[1] http://tracker.ceph.com/issues/22351

>>
>> i think i am going to read up on how interpret ltrace output...
>>
>> BR
>> Alex
>>
>> - Ursprüngliche Mail -
>> Von: "Brad Hubbard" 
>> An: "Alexander Peters" 
>> CC: "Ceph Users" 
>> Gesendet: Montag, 15. Januar 2018 03:09:53
>> Betreff: Re: [ceph-users] radosgw fails with "ERROR: failed to initialize 
>> watch: (34) Numerical result out of range"
>>
>> On Mon, Jan 15, 2018 at 11:38 AM, Brad Hubbard  wrote:
>>> On Mon, Jan 15, 2018 at 10:38 AM, Alexander Peters
>>>  wrote:
 Thanks for the reply - unfortunatly the link you send is behind a paywall 
 so
 at least for now i can’t read it.
>>>
>>> That's why I provided the cause as laid out in that article (pgp num > pg 
>>> num).
>>>
>>> Do you have any settings in ceph.conf related to pg_num or pgp_num?
>>>
>>> If not, please add your details to http://tracker.ceph.com/issues/22351
>>
>> Rados can return ERANGE (34) in multiple places so identifying where
>> might be a big step towards working this out.
>>
>> $ ltrace -fo /tmp/ltrace.out /usr/bin/radosgw --cluster ceph --name
>> client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d
>>
>> The objective is to find which function(s) return 34.
>>
>>>

 output of ceph osd dump shows that pgp num == pg num:

 [root@ctrl01 ~]# ceph osd dump
 epoch 142
 fsid 0e2d841f-68fd-4629-9813-ab083e8c0f10
 created 2017-12-20 23:04:59.781525
 modified 2018-01-14 21:30:57.528682
 flags sortbitwise,recovery_deletes,purged_snapdirs
 crush_version 6
 full_ratio 0.95
 backfillfull_ratio 0.9
 nearfull_ratio 0.85
 require_min_compat_client jewel
 min_compat_client jewel
 require_osd_release luminous
 pool 1 'glance' replicated size 3 min_size 2 crush_rule 0 object_hash
 rjenkins pg_num 64 pgp_num 64 last_change 119 flags hashpspool stripe_width
 0 application rbd
 removed_snaps [1~3]
 pool 2 'cinder-2' replicated size 3 min_size 2 crush_rule 0 object_hash
 rjenkins pg_num 64 pgp_num 64 last_change 120 flags hashpspool stripe_width
 0 application rbd
 removed_snaps [1~3]
 pool 3 'cinder-3' replicated size 3 min_size 2 crush_rule 0 object_hash
 rjenkins pg_num 64 pgp_num 64 last_change 121 flags hashpspool stripe_width
 0 application rbd
 removed_snaps [1~3]
 pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash
 rjenkins pg_num 8 pgp_num 8 last_change 94 owner 18446744073709551615 flags
 hashpspool stripe_width 0 application rgw
 max_osd 3
 osd.0 up   in  weight 1 up_from 82 up_thru 140 down_at 79
 last_clean_interval [23,78) 10.16.0.11:6800/1795 10.16.0.11:6801/1795
 10.16.0.11:6802/1795 10.16.0.11:6803/1795 exists,up
 abe33844-6d98-4ede-81a8-a8bdc92dada8
 osd.1 up   in  weight 1 up_from 73 up_thru 140 down_at 71
 last_clean_interval [55,72) 10.16.0.13:6800/1756 10.16.0.13:6804/1001756
 10.16.0.13:6805/1001756 10.16.0.13:6806/1001756 exists,up
 0dab9372-6ffe-4a23-a8b7-4edca3745a2a
 osd.2 up   in  weight 1 up_from 140 up_thru 140 down_at 133
 last_clean_interval [31,132) 10.16.0.12:6800/1749 10.16.0.12:6801/1749
 10.16.0.12:6802/1749 10.16.0.12:6803/1749 exists,up
 220bba17-8119-4035-9e43-5b8eaa27562f


 Am 15.01.2018 um 01:33 schrieb Brad Hubbard :

 On Mon, Jan 15, 2018 at 8:34 AM, Alexander Peters
  wrote:

 Hello

 I am currently experiencing a strange issue with my radosgw. It Fails to
 start and all it says is:
 [root@ctrl02 ~]# /usr/bin/radosgw --cluster ceph --name
 client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d
 2018-01-14 21:30:57.132007 7f44ddd18e00  0 deferred set uid:gid to 167:167
 (ceph:ceph)
 2018-01-14 21:30:57.132161 7f44ddd18e00  0 ceph version 12.2.2
 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process
 (unknown), pid 13928
 2018-01-14 21:30:57.556672 7f44ddd18e00 -1 ERROR: failed to initialize
 watch: (34) Numerical result out of range
 2018-01-14 21:30:57.558752 7f44ddd18e00 -1 Couldn't init storage provider
 (RADOS)

 (when started via systemctl it writes the same lines to the logfile)

 strange thing is that it is working on an other env that was installed with
 the same set of ansible playbooks.
 OS is CentOS 

Re: [ceph-users] radosgw fails with "ERROR: failed to initialize watch: (34) Numerical result out of range"

2018-01-15 Thread Brad Hubbard
On Tue, Jan 16, 2018 at 1:35 AM, Alexander Peters  wrote:
> i created the dump output but it looks very cryptic to me so i can't really 
> make much sense of it. is there anything to look for in particular?

Yes, basically we are looking for any line that ends in "= 34". You
might also find piping it through c++filt helps.

Something like...

$ c++filt 
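i.e. demangle the whole trace and then pull out the lines of interest, for
example (assuming the output file from the ltrace command quoted below):

$ c++filt < /tmp/ltrace.out | grep '= 34$'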
> i think i am going to read up on how interpret ltrace output...
>
> BR
> Alex
>
> - Ursprüngliche Mail -
> Von: "Brad Hubbard" 
> An: "Alexander Peters" 
> CC: "Ceph Users" 
> Gesendet: Montag, 15. Januar 2018 03:09:53
> Betreff: Re: [ceph-users] radosgw fails with "ERROR: failed to initialize 
> watch: (34) Numerical result out of range"
>
> On Mon, Jan 15, 2018 at 11:38 AM, Brad Hubbard  wrote:
>> On Mon, Jan 15, 2018 at 10:38 AM, Alexander Peters
>>  wrote:
>>> Thanks for the reply - unfortunatly the link you send is behind a paywall so
>>> at least for now i can’t read it.
>>
>> That's why I provided the cause as laid out in that article (pgp num > pg 
>> num).
>>
>> Do you have any settings in ceph.conf related to pg_num or pgp_num?
>>
>> If not, please add your details to http://tracker.ceph.com/issues/22351
>
> Rados can return ERANGE (34) in multiple places so identifying where
> might be a big step towards working this out.
>
> $ ltrace -fo /tmp/ltrace.out /usr/bin/radosgw --cluster ceph --name
> client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d
>
> The objective is to find which function(s) return 34.
>
>>
>>>
>>> output of ceph osd dump shows that pgp num == pg num:
>>>
>>> [root@ctrl01 ~]# ceph osd dump
>>> epoch 142
>>> fsid 0e2d841f-68fd-4629-9813-ab083e8c0f10
>>> created 2017-12-20 23:04:59.781525
>>> modified 2018-01-14 21:30:57.528682
>>> flags sortbitwise,recovery_deletes,purged_snapdirs
>>> crush_version 6
>>> full_ratio 0.95
>>> backfillfull_ratio 0.9
>>> nearfull_ratio 0.85
>>> require_min_compat_client jewel
>>> min_compat_client jewel
>>> require_osd_release luminous
>>> pool 1 'glance' replicated size 3 min_size 2 crush_rule 0 object_hash
>>> rjenkins pg_num 64 pgp_num 64 last_change 119 flags hashpspool stripe_width
>>> 0 application rbd
>>> removed_snaps [1~3]
>>> pool 2 'cinder-2' replicated size 3 min_size 2 crush_rule 0 object_hash
>>> rjenkins pg_num 64 pgp_num 64 last_change 120 flags hashpspool stripe_width
>>> 0 application rbd
>>> removed_snaps [1~3]
>>> pool 3 'cinder-3' replicated size 3 min_size 2 crush_rule 0 object_hash
>>> rjenkins pg_num 64 pgp_num 64 last_change 121 flags hashpspool stripe_width
>>> 0 application rbd
>>> removed_snaps [1~3]
>>> pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash
>>> rjenkins pg_num 8 pgp_num 8 last_change 94 owner 18446744073709551615 flags
>>> hashpspool stripe_width 0 application rgw
>>> max_osd 3
>>> osd.0 up   in  weight 1 up_from 82 up_thru 140 down_at 79
>>> last_clean_interval [23,78) 10.16.0.11:6800/1795 10.16.0.11:6801/1795
>>> 10.16.0.11:6802/1795 10.16.0.11:6803/1795 exists,up
>>> abe33844-6d98-4ede-81a8-a8bdc92dada8
>>> osd.1 up   in  weight 1 up_from 73 up_thru 140 down_at 71
>>> last_clean_interval [55,72) 10.16.0.13:6800/1756 10.16.0.13:6804/1001756
>>> 10.16.0.13:6805/1001756 10.16.0.13:6806/1001756 exists,up
>>> 0dab9372-6ffe-4a23-a8b7-4edca3745a2a
>>> osd.2 up   in  weight 1 up_from 140 up_thru 140 down_at 133
>>> last_clean_interval [31,132) 10.16.0.12:6800/1749 10.16.0.12:6801/1749
>>> 10.16.0.12:6802/1749 10.16.0.12:6803/1749 exists,up
>>> 220bba17-8119-4035-9e43-5b8eaa27562f
>>>
>>>
>>> Am 15.01.2018 um 01:33 schrieb Brad Hubbard :
>>>
>>> On Mon, Jan 15, 2018 at 8:34 AM, Alexander Peters
>>>  wrote:
>>>
>>> Hello
>>>
>>> I am currently experiencing a strange issue with my radosgw. It Fails to
>>> start and all it says is:
>>> [root@ctrl02 ~]# /usr/bin/radosgw --cluster ceph --name
>>> client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d
>>> 2018-01-14 21:30:57.132007 7f44ddd18e00  0 deferred set uid:gid to 167:167
>>> (ceph:ceph)
>>> 2018-01-14 21:30:57.132161 7f44ddd18e00  0 ceph version 12.2.2
>>> (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process
>>> (unknown), pid 13928
>>> 2018-01-14 21:30:57.556672 7f44ddd18e00 -1 ERROR: failed to initialize
>>> watch: (34) Numerical result out of range
>>> 2018-01-14 21:30:57.558752 7f44ddd18e00 -1 Couldn't init storage provider
>>> (RADOS)
>>>
>>> (when started via systemctl it writes the same lines to the logfile)
>>>
>>> strange thing is that it is working on an other env that was installed with
>>> the same set of ansible playbooks.
>>> OS is CentOS Linux release 7.4.1708 (Core)
>>>
>>> Ceph is up and running ( I am currently using it for storing volumes and
>>> images form Openstack )
>>>
>>> Does anyone have an idea how to debug this?
>>>
>>>
>>> According to 

Re: [ceph-users] radosgw fails with "ERROR: failed to initialize watch: (34) Numerical result out of range"

2018-01-15 Thread Alexander Peters
I created the dump output, but it looks very cryptic to me so I can't really
make much sense of it. Is there anything to look for in particular?

I think I am going to read up on how to interpret ltrace output...

BR
Alex

- Original Message -
From: "Brad Hubbard" 
To: "Alexander Peters" 
CC: "Ceph Users" 
Sent: Monday, 15 January 2018 03:09:53
Subject: Re: [ceph-users] radosgw fails with "ERROR: failed to initialize 
watch: (34) Numerical result out of range"

On Mon, Jan 15, 2018 at 11:38 AM, Brad Hubbard  wrote:
> On Mon, Jan 15, 2018 at 10:38 AM, Alexander Peters
>  wrote:
>> Thanks for the reply - unfortunatly the link you send is behind a paywall so
>> at least for now i can’t read it.
>
> That's why I provided the cause as laid out in that article (pgp num > pg 
> num).
>
> Do you have any settings in ceph.conf related to pg_num or pgp_num?
>
> If not, please add your details to http://tracker.ceph.com/issues/22351

Rados can return ERANGE (34) in multiple places so identifying where
might be a big step towards working this out.

$ ltrace -fo /tmp/ltrace.out /usr/bin/radosgw --cluster ceph --name
client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d

The objective is to find which function(s) return 34.

>
>>
>> output of ceph osd dump shows that pgp num == pg num:
>>
>> [root@ctrl01 ~]# ceph osd dump
>> epoch 142
>> fsid 0e2d841f-68fd-4629-9813-ab083e8c0f10
>> created 2017-12-20 23:04:59.781525
>> modified 2018-01-14 21:30:57.528682
>> flags sortbitwise,recovery_deletes,purged_snapdirs
>> crush_version 6
>> full_ratio 0.95
>> backfillfull_ratio 0.9
>> nearfull_ratio 0.85
>> require_min_compat_client jewel
>> min_compat_client jewel
>> require_osd_release luminous
>> pool 1 'glance' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 119 flags hashpspool stripe_width
>> 0 application rbd
>> removed_snaps [1~3]
>> pool 2 'cinder-2' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 120 flags hashpspool stripe_width
>> 0 application rbd
>> removed_snaps [1~3]
>> pool 3 'cinder-3' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 121 flags hashpspool stripe_width
>> 0 application rbd
>> removed_snaps [1~3]
>> pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 8 pgp_num 8 last_change 94 owner 18446744073709551615 flags
>> hashpspool stripe_width 0 application rgw
>> max_osd 3
>> osd.0 up   in  weight 1 up_from 82 up_thru 140 down_at 79
>> last_clean_interval [23,78) 10.16.0.11:6800/1795 10.16.0.11:6801/1795
>> 10.16.0.11:6802/1795 10.16.0.11:6803/1795 exists,up
>> abe33844-6d98-4ede-81a8-a8bdc92dada8
>> osd.1 up   in  weight 1 up_from 73 up_thru 140 down_at 71
>> last_clean_interval [55,72) 10.16.0.13:6800/1756 10.16.0.13:6804/1001756
>> 10.16.0.13:6805/1001756 10.16.0.13:6806/1001756 exists,up
>> 0dab9372-6ffe-4a23-a8b7-4edca3745a2a
>> osd.2 up   in  weight 1 up_from 140 up_thru 140 down_at 133
>> last_clean_interval [31,132) 10.16.0.12:6800/1749 10.16.0.12:6801/1749
>> 10.16.0.12:6802/1749 10.16.0.12:6803/1749 exists,up
>> 220bba17-8119-4035-9e43-5b8eaa27562f
>>
>>
>> Am 15.01.2018 um 01:33 schrieb Brad Hubbard :
>>
>> On Mon, Jan 15, 2018 at 8:34 AM, Alexander Peters
>>  wrote:
>>
>> Hello
>>
>> I am currently experiencing a strange issue with my radosgw. It Fails to
>> start and all it says is:
>> [root@ctrl02 ~]# /usr/bin/radosgw --cluster ceph --name
>> client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d
>> 2018-01-14 21:30:57.132007 7f44ddd18e00  0 deferred set uid:gid to 167:167
>> (ceph:ceph)
>> 2018-01-14 21:30:57.132161 7f44ddd18e00  0 ceph version 12.2.2
>> (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process
>> (unknown), pid 13928
>> 2018-01-14 21:30:57.556672 7f44ddd18e00 -1 ERROR: failed to initialize
>> watch: (34) Numerical result out of range
>> 2018-01-14 21:30:57.558752 7f44ddd18e00 -1 Couldn't init storage provider
>> (RADOS)
>>
>> (when started via systemctl it writes the same lines to the logfile)
>>
>> strange thing is that it is working on an other env that was installed with
>> the same set of ansible playbooks.
>> OS is CentOS Linux release 7.4.1708 (Core)
>>
>> Ceph is up and running ( I am currently using it for storing volumes and
>> images form Openstack )
>>
>> Does anyone have an idea how to debug this?
>>
>>
>> According to https://access.redhat.com/solutions/2778161 this can
>> happen if your pgp num is higher than the pg num.
>>
>> Check "ceph osd dump" output for that possibility.
>>
>>
>> Best Regards
>> Alexander
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> 

Re: [ceph-users] radosgw fails with "ERROR: failed to initialize watch: (34) Numerical result out of range"

2018-01-14 Thread Brad Hubbard
On Mon, Jan 15, 2018 at 11:38 AM, Brad Hubbard  wrote:
> On Mon, Jan 15, 2018 at 10:38 AM, Alexander Peters
>  wrote:
>> Thanks for the reply - unfortunatly the link you send is behind a paywall so
>> at least for now i can’t read it.
>
> That's why I provided the cause as laid out in that article (pgp num > pg 
> num).
>
> Do you have any settings in ceph.conf related to pg_num or pgp_num?
>
> If not, please add your details to http://tracker.ceph.com/issues/22351

Rados can return ERANGE (34) in multiple places so identifying where
might be a big step towards working this out.

$ ltrace -fo /tmp/ltrace.out /usr/bin/radosgw --cluster ceph --name
client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d

The objective is to find which function(s) return 34.
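Once it exits, something like this (assuming the trace landed in /tmp/ltrace.out
as above) gives a first pass over every traced call whose return value was 34:

$ awk '$NF == 34' /tmp/ltrace.out | c++filt

Expect some noise - strlen() legitimately returning 34 for a 34-character string,
for instance - so the interesting hits are the ones coming from the rados/rgw
entry points.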

>
>>
>> output of ceph osd dump shows that pgp num == pg num:
>>
>> [root@ctrl01 ~]# ceph osd dump
>> epoch 142
>> fsid 0e2d841f-68fd-4629-9813-ab083e8c0f10
>> created 2017-12-20 23:04:59.781525
>> modified 2018-01-14 21:30:57.528682
>> flags sortbitwise,recovery_deletes,purged_snapdirs
>> crush_version 6
>> full_ratio 0.95
>> backfillfull_ratio 0.9
>> nearfull_ratio 0.85
>> require_min_compat_client jewel
>> min_compat_client jewel
>> require_osd_release luminous
>> pool 1 'glance' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 119 flags hashpspool stripe_width
>> 0 application rbd
>> removed_snaps [1~3]
>> pool 2 'cinder-2' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 120 flags hashpspool stripe_width
>> 0 application rbd
>> removed_snaps [1~3]
>> pool 3 'cinder-3' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 121 flags hashpspool stripe_width
>> 0 application rbd
>> removed_snaps [1~3]
>> pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 8 pgp_num 8 last_change 94 owner 18446744073709551615 flags
>> hashpspool stripe_width 0 application rgw
>> max_osd 3
>> osd.0 up   in  weight 1 up_from 82 up_thru 140 down_at 79
>> last_clean_interval [23,78) 10.16.0.11:6800/1795 10.16.0.11:6801/1795
>> 10.16.0.11:6802/1795 10.16.0.11:6803/1795 exists,up
>> abe33844-6d98-4ede-81a8-a8bdc92dada8
>> osd.1 up   in  weight 1 up_from 73 up_thru 140 down_at 71
>> last_clean_interval [55,72) 10.16.0.13:6800/1756 10.16.0.13:6804/1001756
>> 10.16.0.13:6805/1001756 10.16.0.13:6806/1001756 exists,up
>> 0dab9372-6ffe-4a23-a8b7-4edca3745a2a
>> osd.2 up   in  weight 1 up_from 140 up_thru 140 down_at 133
>> last_clean_interval [31,132) 10.16.0.12:6800/1749 10.16.0.12:6801/1749
>> 10.16.0.12:6802/1749 10.16.0.12:6803/1749 exists,up
>> 220bba17-8119-4035-9e43-5b8eaa27562f
>>
>>
>> Am 15.01.2018 um 01:33 schrieb Brad Hubbard :
>>
>> On Mon, Jan 15, 2018 at 8:34 AM, Alexander Peters
>>  wrote:
>>
>> Hello
>>
>> I am currently experiencing a strange issue with my radosgw. It Fails to
>> start and all it says is:
>> [root@ctrl02 ~]# /usr/bin/radosgw --cluster ceph --name
>> client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d
>> 2018-01-14 21:30:57.132007 7f44ddd18e00  0 deferred set uid:gid to 167:167
>> (ceph:ceph)
>> 2018-01-14 21:30:57.132161 7f44ddd18e00  0 ceph version 12.2.2
>> (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process
>> (unknown), pid 13928
>> 2018-01-14 21:30:57.556672 7f44ddd18e00 -1 ERROR: failed to initialize
>> watch: (34) Numerical result out of range
>> 2018-01-14 21:30:57.558752 7f44ddd18e00 -1 Couldn't init storage provider
>> (RADOS)
>>
>> (when started via systemctl it writes the same lines to the logfile)
>>
>> strange thing is that it is working on an other env that was installed with
>> the same set of ansible playbooks.
>> OS is CentOS Linux release 7.4.1708 (Core)
>>
>> Ceph is up and running ( I am currently using it for storing volumes and
>> images form Openstack )
>>
>> Does anyone have an idea how to debug this?
>>
>>
>> According to https://access.redhat.com/solutions/2778161 this can
>> happen if your pgp num is higher than the pg num.
>>
>> Check "ceph osd dump" output for that possibility.
>>
>>
>> Best Regards
>> Alexander
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>>
>> --
>> Cheers,
>> Brad
>>
>>
>
>
>
> --
> Cheers,
> Brad



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw fails with "ERROR: failed to initialize watch: (34) Numerical result out of range"

2018-01-14 Thread Brad Hubbard
On Mon, Jan 15, 2018 at 10:38 AM, Alexander Peters
 wrote:
> Thanks for the reply - unfortunatly the link you send is behind a paywall so
> at least for now i can’t read it.

That's why I provided the cause as laid out in that article (pgp num > pg num).

Do you have any settings in ceph.conf related to pg_num or pgp_num?

If not, please add your details to http://tracker.ceph.com/issues/22351
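For example, something along these lines should show any such overrides
(assuming the default config path; the keys may be spelled with spaces or
underscores):

$ grep -iE 'pgp?[ _]?num' /etc/ceph/ceph.conf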

>
> output of ceph osd dump shows that pgp num == pg num:
>
> [root@ctrl01 ~]# ceph osd dump
> epoch 142
> fsid 0e2d841f-68fd-4629-9813-ab083e8c0f10
> created 2017-12-20 23:04:59.781525
> modified 2018-01-14 21:30:57.528682
> flags sortbitwise,recovery_deletes,purged_snapdirs
> crush_version 6
> full_ratio 0.95
> backfillfull_ratio 0.9
> nearfull_ratio 0.85
> require_min_compat_client jewel
> min_compat_client jewel
> require_osd_release luminous
> pool 1 'glance' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 119 flags hashpspool stripe_width
> 0 application rbd
> removed_snaps [1~3]
> pool 2 'cinder-2' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 120 flags hashpspool stripe_width
> 0 application rbd
> removed_snaps [1~3]
> pool 3 'cinder-3' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 121 flags hashpspool stripe_width
> 0 application rbd
> removed_snaps [1~3]
> pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 94 owner 18446744073709551615 flags
> hashpspool stripe_width 0 application rgw
> max_osd 3
> osd.0 up   in  weight 1 up_from 82 up_thru 140 down_at 79
> last_clean_interval [23,78) 10.16.0.11:6800/1795 10.16.0.11:6801/1795
> 10.16.0.11:6802/1795 10.16.0.11:6803/1795 exists,up
> abe33844-6d98-4ede-81a8-a8bdc92dada8
> osd.1 up   in  weight 1 up_from 73 up_thru 140 down_at 71
> last_clean_interval [55,72) 10.16.0.13:6800/1756 10.16.0.13:6804/1001756
> 10.16.0.13:6805/1001756 10.16.0.13:6806/1001756 exists,up
> 0dab9372-6ffe-4a23-a8b7-4edca3745a2a
> osd.2 up   in  weight 1 up_from 140 up_thru 140 down_at 133
> last_clean_interval [31,132) 10.16.0.12:6800/1749 10.16.0.12:6801/1749
> 10.16.0.12:6802/1749 10.16.0.12:6803/1749 exists,up
> 220bba17-8119-4035-9e43-5b8eaa27562f
>
>
> Am 15.01.2018 um 01:33 schrieb Brad Hubbard :
>
> On Mon, Jan 15, 2018 at 8:34 AM, Alexander Peters
>  wrote:
>
> Hello
>
> I am currently experiencing a strange issue with my radosgw. It Fails to
> start and all it says is:
> [root@ctrl02 ~]# /usr/bin/radosgw --cluster ceph --name
> client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d
> 2018-01-14 21:30:57.132007 7f44ddd18e00  0 deferred set uid:gid to 167:167
> (ceph:ceph)
> 2018-01-14 21:30:57.132161 7f44ddd18e00  0 ceph version 12.2.2
> (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process
> (unknown), pid 13928
> 2018-01-14 21:30:57.556672 7f44ddd18e00 -1 ERROR: failed to initialize
> watch: (34) Numerical result out of range
> 2018-01-14 21:30:57.558752 7f44ddd18e00 -1 Couldn't init storage provider
> (RADOS)
>
> (when started via systemctl it writes the same lines to the logfile)
>
> strange thing is that it is working on an other env that was installed with
> the same set of ansible playbooks.
> OS is CentOS Linux release 7.4.1708 (Core)
>
> Ceph is up and running ( I am currently using it for storing volumes and
> images form Openstack )
>
> Does anyone have an idea how to debug this?
>
>
> According to https://access.redhat.com/solutions/2778161 this can
> happen if your pgp num is higher than the pg num.
>
> Check "ceph osd dump" output for that possibility.
>
>
> Best Regards
> Alexander
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> --
> Cheers,
> Brad
>
>



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw fails with "ERROR: failed to initialize watch: (34) Numerical result out of range"

2018-01-14 Thread Alexander Peters
Thanks for the reply - unfortunately the link you sent is behind a paywall, so at 
least for now I can't read it.

output of ceph osd dump shows that pgp num == pg num:

[root@ctrl01 ~]# ceph osd dump
epoch 142
fsid 0e2d841f-68fd-4629-9813-ab083e8c0f10
created 2017-12-20 23:04:59.781525
modified 2018-01-14 21:30:57.528682
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 6
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release luminous
pool 1 'glance' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins 
pg_num 64 pgp_num 64 last_change 119 flags hashpspool stripe_width 0 
application rbd
removed_snaps [1~3]
pool 2 'cinder-2' replicated size 3 min_size 2 crush_rule 0 object_hash 
rjenkins pg_num 64 pgp_num 64 last_change 120 flags hashpspool stripe_width 0 
application rbd
removed_snaps [1~3]
pool 3 'cinder-3' replicated size 3 min_size 2 crush_rule 0 object_hash 
rjenkins pg_num 64 pgp_num 64 last_change 121 flags hashpspool stripe_width 0 
application rbd
removed_snaps [1~3]
pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash 
rjenkins pg_num 8 pgp_num 8 last_change 94 owner 18446744073709551615 flags 
hashpspool stripe_width 0 application rgw
max_osd 3
osd.0 up   in  weight 1 up_from 82 up_thru 140 down_at 79 last_clean_interval 
[23,78) 10.16.0.11:6800/1795 10.16.0.11:6801/1795 10.16.0.11:6802/1795 
10.16.0.11:6803/1795 exists,up abe33844-6d98-4ede-81a8-a8bdc92dada8
osd.1 up   in  weight 1 up_from 73 up_thru 140 down_at 71 last_clean_interval 
[55,72) 10.16.0.13:6800/1756 10.16.0.13:6804/1001756 10.16.0.13:6805/1001756 
10.16.0.13:6806/1001756 exists,up 0dab9372-6ffe-4a23-a8b7-4edca3745a2a
osd.2 up   in  weight 1 up_from 140 up_thru 140 down_at 133 last_clean_interval 
[31,132) 10.16.0.12:6800/1749 10.16.0.12:6801/1749 10.16.0.12:6802/1749 
10.16.0.12:6803/1749 exists,up 220bba17-8119-4035-9e43-5b8eaa27562f


> On 15.01.2018 at 01:33, Brad Hubbard wrote:
> 
> On Mon, Jan 15, 2018 at 8:34 AM, Alexander Peters
> > wrote:
>> Hello
>> 
>> I am currently experiencing a strange issue with my radosgw. It fails to 
>> start and all it says is:
>> [root@ctrl02 ~]# /usr/bin/radosgw --cluster ceph --name 
>> client.radosgw.ctrl02 --setuser ceph --setgroup ceph -f -d
>> 2018-01-14 21:30:57.132007 7f44ddd18e00  0 deferred set uid:gid to 167:167 
>> (ceph:ceph)
>> 2018-01-14 21:30:57.132161 7f44ddd18e00  0 ceph version 12.2.2 
>> (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process 
>> (unknown), pid 13928
>> 2018-01-14 21:30:57.556672 7f44ddd18e00 -1 ERROR: failed to initialize 
>> watch: (34) Numerical result out of range
>> 2018-01-14 21:30:57.558752 7f44ddd18e00 -1 Couldn't init storage provider 
>> (RADOS)
>> 
>> (when started via systemctl it writes the same lines to the logfile)
>> 
>> strange thing is that it is working on another env that was installed with 
>> the same set of ansible playbooks.
>> OS is CentOS Linux release 7.4.1708 (Core)
>> 
>> Ceph is up and running ( I am currently using it for storing volumes and 
>> images from OpenStack )
>> 
>> Does anyone have an idea how to debug this?
> 
> According to https://access.redhat.com/solutions/2778161 
>  this can
> happen if your pgp num is higher than the pg num.
> 
> Check "ceph osd dump" output for that possibility.
> 
>> 
>> Best Regards
>> Alexander
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
>> 
>> 
> 
> 
> 
> --
> Cheers,
> Brad



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw fails with "ERROR: failed to initialize watch: (34) Numerical result out of range"

2018-01-14 Thread Brad Hubbard
On Mon, Jan 15, 2018 at 8:34 AM, Alexander Peters
 wrote:
> Hello
>
> I am currently experiencing a strange issue with my radosgw. It fails to 
> start and all it says is:
> [root@ctrl02 ~]# /usr/bin/radosgw --cluster ceph --name client.radosgw.ctrl02 
> --setuser ceph --setgroup ceph -f -d
> 2018-01-14 21:30:57.132007 7f44ddd18e00  0 deferred set uid:gid to 167:167 
> (ceph:ceph)
> 2018-01-14 21:30:57.132161 7f44ddd18e00  0 ceph version 12.2.2 
> (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process 
> (unknown), pid 13928
> 2018-01-14 21:30:57.556672 7f44ddd18e00 -1 ERROR: failed to initialize watch: 
> (34) Numerical result out of range
> 2018-01-14 21:30:57.558752 7f44ddd18e00 -1 Couldn't init storage provider 
> (RADOS)
>
> (when started via systemctl it writes the same lines to the logfile)
>
> strange thing is that it is working on another env that was installed with 
> the same set of ansible playbooks.
> OS is CentOS Linux release 7.4.1708 (Core)
>
> Ceph is up and running ( I am currently using it for storing volumes and 
> images from OpenStack )
>
> Does anyone have an idea how to debug this?

According to https://access.redhat.com/solutions/2778161 this can
happen if your pgp num is higher than the pg num.

Check "ceph osd dump" output for that possibility.

>
> Best Regards
> Alexander
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com