Re: [ceph-users] OSD Segfaults after Bluestore conversion

2018-08-27 Thread Tyler Bishop
Okay, so far it looks more stable since switching back. I have around 2 GB/s
and 100k IOPS flowing with fio at the moment to test.
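
For context, the load is being generated with something along these lines (a
rough sketch only, assuming an RBD target; the pool and image names are
placeholders, not my actual job):

  # hypothetical fio job: small random I/O against a test RBD image
  fio --name=rbd-test --ioengine=rbd --clientname=admin \
      --pool=rbd --rbdname=fio-test \
      --rw=randrw --bs=16k --iodepth=32 --numjobs=4 \
      --time_based --runtime=300 --group_reporting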



On Mon, Aug 27, 2018 at 11:06 PM Adam Tygart  wrote:

> This issue was related to using Jemalloc. Jemalloc is not as well
> tested with Bluestore and led to lots of segfaults. We moved back to
> the default of tcmalloc with Bluestore and the segfaults stopped.
>
> Check /etc/sysconfig/ceph under RHEL-based distros.
>
> --
> Adam
> On Mon, Aug 27, 2018 at 9:51 PM Tyler Bishop
>  wrote:
> >
> > Did you solve this?  Similar issue.
> >
> >
> > On Wed, Feb 28, 2018 at 3:46 PM Kyle Hutson  wrote:
> >>
> >> I'm following up from a while ago. I don't think this is the same bug.
> >> The bug referenced shows "abort: Corruption: block checksum mismatch",
> >> and I'm not seeing that on mine.
> >>
> >> Now I've had 8 OSDs down on this one server for a couple of weeks, and
> >> I just tried to start it back up. Here's a link to the log of that OSD
> >> (which segfaulted right after starting up):
> >> http://people.beocat.ksu.edu/~kylehutson/ceph-osd.414.log
> >>
> >> To me, it looks like the logs are providing surprisingly few hints as
> >> to where the problem lies. Is there a way I can turn up logging to see
> >> if I can get any more info as to why this is happening?
> >>
> >> On Thu, Feb 8, 2018 at 3:02 AM, Mike O'Connor  wrote:
> >>>
> >>> On 7/02/2018 8:23 AM, Kyle Hutson wrote:
> >>> > We had a 26-node production ceph cluster which we upgraded to Luminous
> >>> > a little over a month ago. I added a 27th node with Bluestore and
> >>> > didn't have any issues, so I began converting the others, one at a
> >>> > time. The first two went off pretty smoothly, but the 3rd is doing
> >>> > something strange.
> >>> >
> >>> > Initially, all the OSDs came up fine, but then some started to
> >>> > segfault. Out of curiosity more than anything else, I did reboot the
> >>> > server to see if it would get better or worse, and it pretty much
> >>> > stayed the same - 12 of the 18 OSDs did not properly come up. Of
> >>> > those, 3 again segfaulted.
> >>> >
> >>> > I picked one that didn't properly come up and copied the log to where
> >>> > anybody can view it:
> >>> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.426.log
> >>> > 
> >>> >
> >>> > You can contrast that with one that is up:
> >>> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.428.log
> >>> > 
> >>> >
> >>> > (which is still showing segfaults in the logs, but seems to be
> >>> > recovering from them OK?)
> >>> >
> >>> > Any ideas?
> >>> Ideas? Yes.
> >>>
> >>> There is a bug which is hitting a small number of systems, and at this
> >>> time there is no solution. Issue details are at
> >>> http://tracker.ceph.com/issues/22102.
> >>>
> >>> Please submit more details of your problem on the ticket.
> >>>
> >>> Mike
> >>>
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD Segfaults after Bluestore conversion

2018-08-27 Thread Adam Tygart
This issue was related to using Jemalloc. Jemalloc is not as well
tested with Bluestore and led to lots of segfaults. We moved back to
the default of tcmalloc with Bluestore and the segfaults stopped.

Check /etc/sysconfig/ceph under RHEL-based distros.
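
The allocator there is normally selected with an LD_PRELOAD line; as a rough
sketch (the library path below is what I'd expect on EL7 and may differ on
other versions), going back to tcmalloc just means commenting that line out
and restarting the OSDs:

  # /etc/sysconfig/ceph (illustrative)
  # jemalloc was enabled with a line like this; comment it out for tcmalloc:
  #LD_PRELOAD=/usr/lib64/libjemalloc.so.1

  # stock tcmalloc tuning shipped with the package stays in place:
  TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728

  # then: systemctl restart ceph-osd.target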

--
Adam
On Mon, Aug 27, 2018 at 9:51 PM Tyler Bishop
 wrote:
>
> Did you solve this?  Similar issue.
>
>
> On Wed, Feb 28, 2018 at 3:46 PM Kyle Hutson  wrote:
>>
>> I'm following up from a while ago. I don't think this is the same bug. The
>> bug referenced shows "abort: Corruption: block checksum mismatch", and I'm 
>> not seeing that on mine.
>>
>> Now I've had 8 OSDs down on this one server for a couple of weeks, and I 
>> just tried to start it back up. Here's a link to the log of that OSD (which 
>> segfaulted right after starting up): 
>> http://people.beocat.ksu.edu/~kylehutson/ceph-osd.414.log
>>
>> To me, it looks like the logs are providing surprisingly few hints as to 
>> where the problem lies. Is there a way I can turn up logging to see if I can 
>> get any more info as to why this is happening?
>>
>> On Thu, Feb 8, 2018 at 3:02 AM, Mike O'Connor  wrote:
>>>
>>> On 7/02/2018 8:23 AM, Kyle Hutson wrote:
>>> > We had a 26-node production ceph cluster which we upgraded to Luminous
>>> > a little over a month ago. I added a 27th node with Bluestore and
>>> > didn't have any issues, so I began converting the others, one at a
>>> > time. The first two went off pretty smoothly, but the 3rd is doing
>>> > something strange.
>>> >
>>> > Initially, all the OSDs came up fine, but then some started to
>>> > segfault. Out of curiosity more than anything else, I did reboot the
>>> > server to see if it would get better or worse, and it pretty much
>>> > stayed the same - 12 of the 18 OSDs did not properly come up. Of
>>> > those, 3 again segfaulted.
>>> >
>>> > I picked one that didn't properly come up and copied the log to where
>>> > anybody can view it:
>>> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.426.log
>>> > 
>>> >
>>> > You can contrast that with one that is up:
>>> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.428.log
>>> > 
>>> >
>>> > (which is still showing segfaults in the logs, but seems to be
>>> > recovering from them OK?)
>>> >
>>> > Any ideas?
>>> Ideas? Yes.
>>>
>>> There is a bug which is hitting a small number of systems, and at this
>>> time there is no solution. Issue details are at
>>> http://tracker.ceph.com/issues/22102.
>>>
>>> Please submit more details of your problem on the ticket.
>>>
>>> Mike
>>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD Segfaults after Bluestore conversion

2018-08-27 Thread Tyler Bishop
Did you solve this?  Similar issue.


On Wed, Feb 28, 2018 at 3:46 PM Kyle Hutson  wrote:

> I'm following up from a while ago. I don't think this is the same bug. The
> bug referenced shows "abort: Corruption: block checksum mismatch", and I'm
> not seeing that on mine.
>
> Now I've had 8 OSDs down on this one server for a couple of weeks, and I
> just tried to start it back up. Here's a link to the log of that OSD (which
> segfaulted right after starting up):
> http://people.beocat.ksu.edu/~kylehutson/ceph-osd.414.log
>
> To me, it looks like the logs are providing surprisingly few hints as to
> where the problem lies. Is there a way I can turn up logging to see if I
> can get any more info as to why this is happening?
>
> On Thu, Feb 8, 2018 at 3:02 AM, Mike O'Connor  wrote:
>
>> On 7/02/2018 8:23 AM, Kyle Hutson wrote:
>> > We had a 26-node production ceph cluster which we upgraded to Luminous
>> > a little over a month ago. I added a 27th node with Bluestore and
>> > didn't have any issues, so I began converting the others, one at a
>> > time. The first two went off pretty smoothly, but the 3rd is doing
>> > something strange.
>> >
>> > Initially, all the OSDs came up fine, but then some started to
>> > segfault. Out of curiosity more than anything else, I did reboot the
>> > server to see if it would get better or worse, and it pretty much
>> > stayed the same - 12 of the 18 OSDs did not properly come up. Of
>> > those, 3 again segfaulted.
>> >
>> > I picked one that didn't properly come up and copied the log to where
>> > anybody can view it:
>> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.426.log
>> > 
>> >
>> > You can contrast that with one that is up:
>> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.428.log
>> > 
>> >
>> > (which is still showing segfaults in the logs, but seems to be
>> > recovering from them OK?)
>> >
>> > Any ideas?
>> Ideas? Yes.
>>
>> There is a bug which is hitting a small number of systems, and at this
>> time there is no solution. Issue details are at
>> http://tracker.ceph.com/issues/22102.
>>
>> Please submit more details of your problem on the ticket.
>>
>> Mike
>>
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD Segfaults after Bluestore conversion

2018-02-28 Thread Kyle Hutson
I'm following up from a while ago. I don't think this is the same bug. The
bug referenced shows "abort: Corruption: block checksum mismatch", and I'm
not seeing that on mine.

Now I've had 8 OSDs down on this one server for a couple of weeks, and I
just tried to start it back up. Here's a link to the log of that OSD (which
segfaulted right after starting up):
http://people.beocat.ksu.edu/~kylehutson/ceph-osd.414.log

To me, it looks like the logs are providing surprisingly few hints as to
where the problem lies. Is there a way I can turn up logging to see if I
can get any more info as to why this is happening?
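
For example, I assume something along these lines would raise the relevant
debug levels, though I'm guessing at which subsystems and values are useful
here:

  # bump debug levels on the running OSD:
  ceph tell osd.414 injectargs '--debug_osd 20 --debug_bluestore 20 --debug_bluefs 20 --debug_rocksdb 10'

  # or persistently in ceph.conf before restarting it:
  [osd]
    debug osd = 20
    debug bluestore = 20
    debug bluefs = 20
    debug rocksdb = 10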

On Thu, Feb 8, 2018 at 3:02 AM, Mike O'Connor  wrote:

> On 7/02/2018 8:23 AM, Kyle Hutson wrote:
> > We had a 26-node production ceph cluster which we upgraded to Luminous
> > a little over a month ago. I added a 27th node with Bluestore and
> > didn't have any issues, so I began converting the others, one at a
> > time. The first two went off pretty smoothly, but the 3rd is doing
> > something strange.
> >
> > Initially, all the OSDs came up fine, but then some started to
> > segfault. Out of curiosity more than anything else, I did reboot the
> > server to see if it would get better or worse, and it pretty much
> > stayed the same - 12 of the 18 OSDs did not properly come up. Of
> > those, 3 again segfaulted.
> >
> > I picked one that didn't properly come up and copied the log to where
> > anybody can view it:
> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.426.log
> > 
> >
> > You can contrast that with one that is up:
> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.428.log
> > 
> >
> > (which is still showing segfaults in the logs, but seems to be
> > recovering from them OK?)
> >
> > Any ideas?
> Ideas? Yes.
>
> There is a bug which is hitting a small number of systems, and at this
> time there is no solution. Issue details are at
> http://tracker.ceph.com/issues/22102.
>
> Please submit more details of your problem on the ticket.
>
> Mike
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD Segfaults after Bluestore conversion

2018-02-08 Thread Mike O'Connor
On 7/02/2018 8:23 AM, Kyle Hutson wrote:
> We had a 26-node production ceph cluster which we upgraded to Luminous
> a little over a month ago. I added a 27th node with Bluestore and
> didn't have any issues, so I began converting the others, one at a
> time. The first two went off pretty smoothly, but the 3rd is doing
> something strange.
>
> Initially, all the OSDs came up fine, but then some started to
> segfault. Out of curiosity more than anything else, I did reboot the
> server to see if it would get better or worse, and it pretty much
> stayed the same - 12 of the 18 OSDs did not properly come up. Of
> those, 3 again segfaulted.
>
> I picked one that didn't properly come up and copied the log to where
> anybody can view it:
> http://people.beocat.ksu.edu/~kylehutson/ceph-osd.426.log
> 
>
> You can contrast that with one that is up:
> http://people.beocat.ksu.edu/~kylehutson/ceph-osd.428.log
> 
>
> (which is still showing segfaults in the logs, but seems to be
> recovering from them OK?)
>
> Any ideas?
Ideas? Yes.

There is a bug which is hitting a small number of systems, and at this
time there is no solution. Issue details are at
http://tracker.ceph.com/issues/22102.

Please submit more details of your problem on the ticket.
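
When you do, the segfault backtrace from the OSD log is the most useful part;
something along these lines (the log path is just an example) pulls it out
with some context:

  # extract the crash backtrace and surrounding lines from the failing OSD's log
  grep -B 5 -A 40 'Segmentation fault' /var/log/ceph/ceph-osd.426.log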

Mike

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD Segfaults after Bluestore conversion

2018-02-06 Thread Kyle Hutson
We had a 26-node production ceph cluster which we upgraded to Luminous a
little over a month ago. I added a 27th node with Bluestore and didn't have
any issues, so I began converting the others, one at a time. The first two
went off pretty smoothly, but the 3rd is doing something strange.
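
For reference, each conversion follows roughly this shape (a sketch rather
than my exact commands; the OSD id and device are placeholders):

  ID=<osd id>; DEV=/dev/sdX
  ceph osd out $ID                  # wait for data to move off and the cluster to go clean
  systemctl stop ceph-osd@$ID
  ceph osd destroy $ID --yes-i-really-mean-it
  ceph-volume lvm zap $DEV
  ceph-volume lvm create --bluestore --data $DEV --osd-id $ID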

Initially, all the OSDs came up fine, but then some started to segfault.
Out of curiosity more than anything else, I did reboot the server to see if
it would get better or worse, and it pretty much stayed the same - 12 of
the 18 OSDs did not properly come up. Of those, 3 again segfaulted.

I picked one that didn't properly come up and copied the log to where
anybody can view it:
http://people.beocat.ksu.edu/~kylehutson/ceph-osd.426.log

You can contrast that with one that is up:
http://people.beocat.ksu.edu/~kylehutson/ceph-osd.428.log

(which is still showing segfaults in the logs, but seems to be recovering
from them OK?)

Any ideas?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com