Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?

2018-01-21 Thread Youzhong Yang
Thanks. I applied the workaround to .vmx and rebooted all VMs. No more
freeze!

On Sun, Jan 21, 2018 at 3:43 PM, Nick Fisk <n...@fisk.me.uk> wrote:

> How up to date is your VM environment? We saw something very similar last
> year with Linux VMs running newish kernels. It turns out newer kernels
> supported a new feature of the vmxnet3 adapters which had a bug in ESXi.
> The fix was released some time last year in ESXi 6.5 U1, or a workaround was
> to set an option in the VM config.
>
> https://kb.vmware.com/s/article/2151480
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Youzhong Yang
> *Sent:* 21 January 2018 19:50
> *To:* Brad Hubbard <bhubb...@redhat.com>
> *Cc:* ceph-users <ceph-users@lists.ceph.com>
> *Subject:* Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous =
> random OS hang ?
>
>
>
> As someone suggested, I installed linux-generic-hwe-16.04 package on
> Ubuntu 16.04 to get kernel of 17.10, and then rebooted all VMs, here is
> what I observed:
>
> - ceph monitor node froze upon reboot, in another case froze after a few
> minutes
>
> - ceph OSD hosts easily froze
>
> - ceph admin node (which runs no ceph service but ceph-deploy) never
> freezes
>
> - ceph rgw nodes and ceph mgr so far so good
>
>
>
> Here are two images I captured:
>
>
>
> https://drive.google.com/file/d/11hMJwhCF6Tj8LD3nlpokG0CB_oZqI506/view?usp=sharing
>
> https://drive.google.com/file/d/1tzDQ3DYTnfDHh_hTQb0ISZZ4WZdRxHLv/view?usp=sharing
>
>
>
> Thanks.
>
>
>
> On Sat, Jan 20, 2018 at 7:03 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
>
> On Fri, Jan 19, 2018 at 11:54 PM, Youzhong Yang <youzh...@gmail.com> wrote:
> > I don't think it's a hardware issue. All the hosts are VMs. By the way, using
> > the same set of VMware hypervisors, I switched back to Ubuntu 16.04 last
> > night, so far so good, no freeze.
>
> Too little information to make any sort of assessment I'm afraid but,
> at this stage, this doesn't sound like a ceph issue.
>
>
> >
> > On Fri, Jan 19, 2018 at 8:50 AM, Daniel Baumann <daniel.baum...@bfh.ch>
> > wrote:
> >>
> >> Hi,
> >>
> >> On 01/19/18 14:46, Youzhong Yang wrote:
> >> > Just wondering if anyone has seen the same issue, or it's just me.
> >>
> >> we're using debian with our own backported kernels and ceph, works rock
> >> solid.
> >>
> >> what you're describing sounds more like hardware issues to me. if you
> >> don't fully "trust"/have confidence in your hardware (and your logs
> >> don't reveal anything), I'd recommend running some burn-in tests
> >> (memtest, cpuburn, etc.) on them for 24 hours/machine to rule out
> >> cpu/ram/etc. issues.
> >>
> >> Regards,
> >> Daniel
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> --
> Cheers,
> Brad
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?

2018-01-21 Thread Nick Fisk
How up to date is your VM environment? We saw something very similar last year
with Linux VMs running newish kernels. It turns out newer kernels supported a
new feature of the vmxnet3 adapters which had a bug in ESXi. The fix was
released some time last year in ESXi 6.5 U1, or a workaround was to set an
option in the VM config.

 

https://kb.vmware.com/s/article/2151480
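
If the config workaround has to be applied to many VMs, it may help to check which .vmx files already carry a given setting. Below is a minimal sketch, not taken from the KB article. It assumes the usual `key = "value"` layout of .vmx files. The function names are made up for illustration, and `example.option` is a placeholder: the real option name and value should be taken from the KB article above.

```python
import re

# One setting per line in the form: key = "value"
# (assumed .vmx layout; lines that do not match are ignored).
_VMX_LINE = re.compile(r'^\s*([^#\s=]+)\s*=\s*"(.*)"\s*$')


def parse_vmx(text):
    """Return a dict of key -> value parsed from .vmx file contents."""
    settings = {}
    for line in text.splitlines():
        match = _VMX_LINE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def has_setting(text, key, expected):
    """True if `key` is present and equals `expected`.

    Keys and values are compared case-insensitively.
    """
    settings = {k.lower(): v for k, v in parse_vmx(text).items()}
    return settings.get(key.lower(), "").lower() == expected.lower()


if __name__ == "__main__":
    sample = (
        'virtualHW.version = "13"\n'
        'ethernet0.virtualDev = "vmxnet3"\n'
    )
    print(parse_vmx(sample)["ethernet0.virtualDev"])
    # "example.option" is a placeholder, not a real VMware setting.
    print(has_setting(sample, "example.option", "FALSE"))
```

Reading the file with `open(path).read()` and passing the text in keeps the parsing separate from wherever the .vmx files live (datastore mount, local copy, etc.).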

 

 

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Youzhong Yang
Sent: 21 January 2018 19:50
To: Brad Hubbard <bhubb...@redhat.com>
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS 
hang ?

 

As someone suggested, I installed linux-generic-hwe-16.04 package on Ubuntu 
16.04 to get kernel of 17.10, and then rebooted all VMs, here is what I 
observed:

- ceph monitor node froze upon reboot, in another case froze after a few 
minutes 

- ceph OSD hosts easily froze

- ceph admin node (which runs no ceph service but ceph-deploy) never freezes

- ceph rgw nodes and ceph mgr so far so good

 

Here are two images I captured:

 

https://drive.google.com/file/d/11hMJwhCF6Tj8LD3nlpokG0CB_oZqI506/view?usp=sharing

https://drive.google.com/file/d/1tzDQ3DYTnfDHh_hTQb0ISZZ4WZdRxHLv/view?usp=sharing

 

Thanks.

 

On Sat, Jan 20, 2018 at 7:03 PM, Brad Hubbard <bhubb...@redhat.com> wrote:

On Fri, Jan 19, 2018 at 11:54 PM, Youzhong Yang <youzh...@gmail.com> wrote:
> I don't think it's a hardware issue. All the hosts are VMs. By the way, using
> the same set of VMware hypervisors, I switched back to Ubuntu 16.04 last
> night, so far so good, no freeze.

Too little information to make any sort of assessment I'm afraid but,
at this stage, this doesn't sound like a ceph issue.


>
> On Fri, Jan 19, 2018 at 8:50 AM, Daniel Baumann <daniel.baum...@bfh.ch> wrote:
>>
>> Hi,
>>
>> On 01/19/18 14:46, Youzhong Yang wrote:
>> > Just wondering if anyone has seen the same issue, or it's just me.
>>
>> we're using debian with our own backported kernels and ceph, works rock
>> solid.
>>
>> what you're describing sounds more like hardware issues to me. if you
>> don't fully "trust"/have confidence in your hardware (and your logs
>> don't reveal anything), I'd recommend running some burn-in tests
>> (memtest, cpuburn, etc.) on them for 24 hours/machine to rule out
>> cpu/ram/etc. issues.
>>
>> Regards,
>> Daniel




--
Cheers,
Brad

 



Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?

2018-01-21 Thread Youzhong Yang
As someone suggested, I installed the linux-generic-hwe-16.04 package on Ubuntu
16.04 to get the 17.10 kernel, then rebooted all VMs. Here is what I observed:
- the ceph monitor node froze upon reboot in one case, and after a few minutes
  in another
- ceph OSD hosts froze easily
- the ceph admin node (which runs ceph-deploy but no ceph service) never freezes
- ceph rgw nodes and ceph mgr: so far so good

Here are two images I captured:

https://drive.google.com/file/d/11hMJwhCF6Tj8LD3nlpokG0CB_oZqI506/view?usp=sharing
https://drive.google.com/file/d/1tzDQ3DYTnfDHh_hTQb0ISZZ4WZdRxHLv/view?usp=sharing

Thanks.

On Sat, Jan 20, 2018 at 7:03 PM, Brad Hubbard wrote:

> On Fri, Jan 19, 2018 at 11:54 PM, Youzhong Yang wrote:
> > I don't think it's a hardware issue. All the hosts are VMs. By the way, using
> > the same set of VMware hypervisors, I switched back to Ubuntu 16.04 last
> > night, so far so good, no freeze.
>
> Too little information to make any sort of assessment I'm afraid but,
> at this stage, this doesn't sound like a ceph issue.
>
> >
> > On Fri, Jan 19, 2018 at 8:50 AM, Daniel Baumann 
> > wrote:
> >>
> >> Hi,
> >>
> >> On 01/19/18 14:46, Youzhong Yang wrote:
> >> > Just wondering if anyone has seen the same issue, or it's just me.
> >>
> >> we're using debian with our own backported kernels and ceph, works rock
> >> solid.
> >>
> >> what you're describing sounds more like hardware issues to me. if you
> >> don't fully "trust"/have confidence in your hardware (and your logs
> >> don't reveal anything), I'd recommend running some burn-in tests
> >> (memtest, cpuburn, etc.) on them for 24 hours/machine to rule out
> >> cpu/ram/etc. issues.
> >>
> >> Regards,
> >> Daniel
>
>
>
> --
> Cheers,
> Brad
>


Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?

2018-01-20 Thread Brad Hubbard
On Fri, Jan 19, 2018 at 11:54 PM, Youzhong Yang wrote:
> I don't think it's a hardware issue. All the hosts are VMs. By the way, using
> the same set of VMware hypervisors, I switched back to Ubuntu 16.04 last
> night, so far so good, no freeze.

Too little information to make any sort of assessment I'm afraid but,
at this stage, this doesn't sound like a ceph issue.

>
> On Fri, Jan 19, 2018 at 8:50 AM, Daniel Baumann 
> wrote:
>>
>> Hi,
>>
>> On 01/19/18 14:46, Youzhong Yang wrote:
>> > Just wondering if anyone has seen the same issue, or it's just me.
>>
>> we're using debian with our own backported kernels and ceph, works rock
>> solid.
>>
>> what you're describing sounds more like hardware issues to me. if you
>> don't fully "trust"/have confidence in your hardware (and your logs
>> don't reveal anything), I'd recommend running some burn-in tests
>> (memtest, cpuburn, etc.) on them for 24 hours/machine to rule out
>> cpu/ram/etc. issues.
>>
>> Regards,
>> Daniel



-- 
Cheers,
Brad


Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?

2018-01-19 Thread Youzhong Yang
I don't think it's a hardware issue. All the hosts are VMs. By the way, using
the same set of VMware hypervisors, I switched back to Ubuntu 16.04 last
night, so far so good, no freeze.

On Fri, Jan 19, 2018 at 8:50 AM, Daniel Baumann 
wrote:

> Hi,
>
> On 01/19/18 14:46, Youzhong Yang wrote:
> > Just wondering if anyone has seen the same issue, or it's just me.
>
> we're using debian with our own backported kernels and ceph, works rock
> solid.
>
> what you're describing sounds more like hardware issues to me. if you
> don't fully "trust"/have confidence in your hardware (and your logs
> don't reveal anything), I'd recommend running some burn-in tests
> (memtest, cpuburn, etc.) on them for 24 hours/machine to rule out
> cpu/ram/etc. issues.
>
> Regards,
> Daniel


Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?

2018-01-19 Thread Daniel Baumann
Hi,

On 01/19/18 14:46, Youzhong Yang wrote:
> Just wondering if anyone has seen the same issue, or it's just me.

we're using debian with our own backported kernels and ceph, works rock
solid.

what you're describing sounds more like hardware issues to me. if you
don't fully "trust"/have confidence in your hardware (and your logs
don't reveal anything), I'd recommend running some burn-in tests
(memtest, cpuburn, etc.) on them for 24 hours/machine to rule out
cpu/ram/etc. issues.

Regards,
Daniel


Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?

2018-01-19 Thread David Turner
The freeze is likely a kernel panic. Try testing different versions of the
kernel.

On Fri, Jan 19, 2018, 8:46 AM Youzhong Yang wrote:

> One month ago when I first started evaluating ceph, I chose Debian 9.3 as
> the operating system. I saw random OS hangs, so I gave up and switched to
> Ubuntu 16.04. Everything works well on Ubuntu 16.04.
>
> Yesterday I tried Ubuntu 17.10 and again saw random OS hangs, whether the
> node is a mon, mgr, osd, or rgw. When it hangs, the console does not respond
> to keyboard input and the host is unreachable from the network.
>
> This is the OS vs kernel version list:
> Ubuntu 16.04 -> kernel 4.4
> Debian 9.3  -> kernel 4.9
> Ubuntu 17.10 -> kernel 4.13
>
> Just wondering if anyone has seen the same issue, or it's just me.


[ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?

2018-01-19 Thread Youzhong Yang
One month ago when I first started evaluating ceph, I chose Debian 9.3 as
the operating system. I saw random OS hangs, so I gave up and switched to
Ubuntu 16.04. Everything works well on Ubuntu 16.04.

Yesterday I tried Ubuntu 17.10 and again saw random OS hangs, whether the
node is a mon, mgr, osd, or rgw. When it hangs, the console does not respond
to keyboard input and the host is unreachable from the network.

This is the OS vs kernel version list:
Ubuntu 16.04 -> kernel 4.4
Debian 9.3  -> kernel 4.9
Ubuntu 17.10 -> kernel 4.13
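
When comparing hosts like this, it can help to record the running kernel next to the driver each network interface uses, since a hang that tracks the kernel version may come from a driver rather than from ceph. The following is a minimal sketch, assuming a Linux guest with sysfs mounted at /sys. The function names are illustrative and not part of any ceph tooling.

```python
import os
import platform
import re


def kernel_major_minor(release):
    """Parse a release string such as '4.13.0-21-generic' into (4, 13)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def nic_drivers(sys_net="/sys/class/net"):
    """Map interface name -> driver name using the sysfs driver symlink.

    Interfaces without a backing device (e.g. 'lo') are skipped.
    """
    drivers = {}
    if not os.path.isdir(sys_net):
        return drivers
    for iface in sorted(os.listdir(sys_net)):
        link = os.path.join(sys_net, iface, "device", "driver")
        if os.path.islink(link):
            drivers[iface] = os.path.basename(os.readlink(link))
    return drivers


if __name__ == "__main__":
    release = platform.release()
    print("kernel:", release, kernel_major_minor(release))
    for iface, driver in nic_drivers().items():
        print("%s: %s" % (iface, driver))
```

Running this on a frozen-prone host and on a stable one gives a quick side-by-side of kernel and NIC driver before digging into logs.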

Just wondering if anyone has seen the same issue, or it's just me.