Re: [ceph-users] Flood of 'failed to encode map X with expected crc' on 1800 OSD cluster after upgrade

2016-07-12 Thread Wido den Hollander

> On 12 July 2016 at 8:47, Christian Balzer wrote:
> 
> 
> 
> Hello,
> 
> On Tue, 12 Jul 2016 08:39:16 +0200 (CEST) Wido den Hollander wrote:
> 
> > Hi,
> > 
> > I am upgrading a 1800 OSD cluster from Hammer 0.94.5 to 0.94.7 prior to 
> > going to Jewel and while doing so I see the monitors being flooded with 
> > these messages:
> >
> Google is your friend (and so is the NSA):
> ---
> http://www.spinics.net/lists/ceph-devel/msg30450.html
> ---
> 

Thanks! I was searching, but never found that thread. Well, not that post in 
that thread.

The messages in my 'ceph -w' output are still roughly 20 minutes behind: it is 
showing entries from about 08:39 while it's 08:56 here right now.

Wido

> That's also one of the reasons that despite only having a fraction of your
> or Dan's OSDs I'm not upgrading to 0.94.7...
> 
> Christian
> 
> > 2016-07-12 08:28:12.919748 osd.1200 [WRN] failed to encode map e130549 with expected crc
> > 2016-07-12 08:28:12.921943 osd.1338 [WRN] failed to encode map e130549 with expected crc
> > 2016-07-12 08:28:12.923814 osd.353 [WRN] failed to encode map e130549 with expected crc
> > 2016-07-12 08:28:12.939370 osd.1200 [WRN] failed to encode map e130549 with expected crc
> > 2016-07-12 08:28:12.941482 osd.1338 [WRN] failed to encode map e130549 with expected crc
> > 2016-07-12 08:28:12.960100 osd.1338 [WRN] failed to encode map e130549 with expected crc
> > 2016-07-12 08:28:12.979404 osd.1338 [WRN] failed to encode map e130549 with expected crc
> > 2016-07-12 08:28:13.012463 osd.353 [WRN] failed to encode map e130549 with expected crc
> > 2016-07-12 08:28:13.039417 osd.353 [WRN] failed to encode map e130549 with expected crc
> > 2016-07-12 08:28:13.079893 osd.353 [WRN] failed to encode map e130549 with expected crc
> > 2016-07-12 08:28:13.76 osd.575 [WRN] failed to encode map e130549 with expected crc
> > 2016-07-12 08:28:13.135279 osd.353 [WRN] failed to encode map e130549 with expected crc
> > 2016-07-12 08:28:13.144697 osd.575 [WRN] failed to encode map e130549 with expected crc
> > 
> > This just goes on and on. The flood of messages causes the monitors to start 
> > consuming a bit of CPU, which makes the cluster operate slower.
> > 
> > I am restarting the OSDs slowly and when I stop doing so the messages 
> > disappear and the cluster operates just fine.
> > 
> > I know that the messages pop up due to a version mismatch, but is there any 
> > way to suppress them?
> > 
> > Wido
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 
> 
> 
> -- 
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Flood of 'failed to encode map X with expected crc' on 1800 OSD cluster after upgrade

2016-07-12 Thread Christian Balzer

Hello,

On Tue, 12 Jul 2016 08:39:16 +0200 (CEST) Wido den Hollander wrote:

> Hi,
> 
> I am upgrading a 1800 OSD cluster from Hammer 0.94.5 to 0.94.7 prior to going 
> to Jewel and while doing so I see the monitors being flooded with these 
> messages:
>
Google is your friend (and so is the NSA):
---
http://www.spinics.net/lists/ceph-devel/msg30450.html
---

That's also one of the reasons that despite only having a fraction of your
or Dan's OSDs I'm not upgrading to 0.94.7...

Christian

> 2016-07-12 08:28:12.919748 osd.1200 [WRN] failed to encode map e130549 with expected crc
> 2016-07-12 08:28:12.921943 osd.1338 [WRN] failed to encode map e130549 with expected crc
> 2016-07-12 08:28:12.923814 osd.353 [WRN] failed to encode map e130549 with expected crc
> 2016-07-12 08:28:12.939370 osd.1200 [WRN] failed to encode map e130549 with expected crc
> 2016-07-12 08:28:12.941482 osd.1338 [WRN] failed to encode map e130549 with expected crc
> 2016-07-12 08:28:12.960100 osd.1338 [WRN] failed to encode map e130549 with expected crc
> 2016-07-12 08:28:12.979404 osd.1338 [WRN] failed to encode map e130549 with expected crc
> 2016-07-12 08:28:13.012463 osd.353 [WRN] failed to encode map e130549 with expected crc
> 2016-07-12 08:28:13.039417 osd.353 [WRN] failed to encode map e130549 with expected crc
> 2016-07-12 08:28:13.079893 osd.353 [WRN] failed to encode map e130549 with expected crc
> 2016-07-12 08:28:13.76 osd.575 [WRN] failed to encode map e130549 with expected crc
> 2016-07-12 08:28:13.135279 osd.353 [WRN] failed to encode map e130549 with expected crc
> 2016-07-12 08:28:13.144697 osd.575 [WRN] failed to encode map e130549 with expected crc
> 
> This just goes on and on. The flood of messages causes the monitors to start 
> consuming a bit of CPU, which makes the cluster operate slower.
> 
> I am restarting the OSDs slowly and when I stop doing so the messages 
> disappear and the cluster operates just fine.
> 
> I know that the messages pop up due to a version mismatch, but is there any 
> way to suppress them?
> 
> Wido
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/


[ceph-users] Flood of 'failed to encode map X with expected crc' on 1800 OSD cluster after upgrade

2016-07-12 Thread Wido den Hollander
Hi,

I am upgrading a 1800 OSD cluster from Hammer 0.94.5 to 0.94.7 prior to going 
to Jewel and while doing so I see the monitors being flooded with these 
messages:

2016-07-12 08:28:12.919748 osd.1200 [WRN] failed to encode map e130549 with expected crc
2016-07-12 08:28:12.921943 osd.1338 [WRN] failed to encode map e130549 with expected crc
2016-07-12 08:28:12.923814 osd.353 [WRN] failed to encode map e130549 with expected crc
2016-07-12 08:28:12.939370 osd.1200 [WRN] failed to encode map e130549 with expected crc
2016-07-12 08:28:12.941482 osd.1338 [WRN] failed to encode map e130549 with expected crc
2016-07-12 08:28:12.960100 osd.1338 [WRN] failed to encode map e130549 with expected crc
2016-07-12 08:28:12.979404 osd.1338 [WRN] failed to encode map e130549 with expected crc
2016-07-12 08:28:13.012463 osd.353 [WRN] failed to encode map e130549 with expected crc
2016-07-12 08:28:13.039417 osd.353 [WRN] failed to encode map e130549 with expected crc
2016-07-12 08:28:13.079893 osd.353 [WRN] failed to encode map e130549 with expected crc
2016-07-12 08:28:13.76 osd.575 [WRN] failed to encode map e130549 with expected crc
2016-07-12 08:28:13.135279 osd.353 [WRN] failed to encode map e130549 with expected crc
2016-07-12 08:28:13.144697 osd.575 [WRN] failed to encode map e130549 with expected crc

This just goes on and on. The flood of messages causes the monitors to start 
consuming a bit of CPU, which makes the cluster operate slower.
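
To see which OSDs and map epochs account for most of these warnings, the log 
lines can be tallied with a short script. The following is only a minimal 
Python sketch and not something from the cluster itself: the function name 
count_crc_warnings and the regular expression are assumptions based solely on 
the line format shown in the excerpt above.

```python
import re
from collections import Counter

# Matches the part of a warning line that names the OSD and the map epoch.
# The trailing "with expected crc" is deliberately not required, so lines
# that were wrapped by a mail client still match on their first half.
WARN_RE = re.compile(r"(osd\.\d+) \[WRN\] failed to encode map e(\d+)")


def count_crc_warnings(lines):
    """Return a Counter mapping (osd_name, epoch) to the number of warnings."""
    counts = Counter()
    for line in lines:
        match = WARN_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# A few lines copied from the excerpt above, used here as sample input.
sample = [
    "2016-07-12 08:28:12.919748 osd.1200 [WRN] failed to encode map e130549 with expected crc",
    "2016-07-12 08:28:12.921943 osd.1338 [WRN] failed to encode map e130549 with expected crc",
    "2016-07-12 08:28:12.923814 osd.353 [WRN] failed to encode map e130549 with expected crc",
    "2016-07-12 08:28:12.939370 osd.1200 [WRN] failed to encode map e130549 with expected crc",
]

# Print the noisiest (OSD, epoch) pairs first.
for (osd, epoch), n in count_crc_warnings(sample).most_common():
    print(f"{osd} e{epoch}: {n}")
```

The same function can be fed a saved copy of the 'ceph -w' output, one line 
per list element, to check whether the warnings come from OSDs that have not 
been restarted yet.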

I am restarting the OSDs slowly and when I stop doing so the messages disappear 
and the cluster operates just fine.

I know that the messages pop up due to a version mismatch, but is there any way 
to suppress them?

Wido