Re: [ceph-users] mds standby + standby-reply upgrade
Patrick Donnelly writes:
>> Infernalis: e5165: 1/1/1 up {0=c=up:active}, 1 up:standby-replay, 1
>> up:standby
>>
>> Now, after the upgrade and the next mon restart, the active monitor crashes
>> with "assert(info.state == MDSMap::STATE_STANDBY)" (even without a running mds).
>
> This is the first time you've upgraded your pool to jewel, right?
> Straight from 9.x to 10.2.2?

Yes

--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.by/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] mds standby + standby-reply upgrade
Hi Dzianis,

On Thu, Jun 30, 2016 at 4:03 PM, Dzianis Kahanovich wrote:
> Upgraded infernalis->jewel (git, Gentoo). The upgrade was done with a global
> stop/restart of everything in one shot.
>
> Infernalis: e5165: 1/1/1 up {0=c=up:active}, 1 up:standby-replay, 1 up:standby
>
> Now, after the upgrade and the next mon restart, the active monitor crashes
> with "assert(info.state == MDSMap::STATE_STANDBY)" (even without a running mds).

This is the first time you've upgraded your pool to jewel, right?
Straight from 9.x to 10.2.2?

--
Patrick Donnelly
Re: [ceph-users] mds standby + standby-reply upgrade
On Mon, Jul 4, 2016 at 12:38 PM, Dzianis Kahanovich wrote:
> Gregory Farnum writes:
>> On Thu, Jun 30, 2016 at 1:03 PM, Dzianis Kahanovich wrote:
>>> Upgraded infernalis->jewel (git, Gentoo). The upgrade was done with a global
>>> stop/restart of everything in one shot.
>>>
>>> Infernalis: e5165: 1/1/1 up {0=c=up:active}, 1 up:standby-replay, 1
>>> up:standby
>>>
>>> Now, after the upgrade and the next mon restart, the active monitor crashes
>>> with "assert(info.state == MDSMap::STATE_STANDBY)" (even without a running
>>> mds). Fixed:
>>>
>>> --- a/src/mon/MDSMonitor.cc 2016-06-27 21:26:26.0 +0300
>>> +++ b/src/mon/MDSMonitor.cc 2016-06-28 10:44:32.0 +0300
>>> @@ -2793,7 +2793,11 @@ bool MDSMonitor::maybe_promote_standby(s
>>>    for (const auto &j : pending_fsmap.standby_daemons) {
>>>      const auto &gid = j.first;
>>>      const auto &info = j.second;
>>> -    assert(info.state == MDSMap::STATE_STANDBY);
>>> +//  assert(info.state == MDSMap::STATE_STANDBY);
>>> +    if (info.state != MDSMap::STATE_STANDBY) {
>>> +      dout(0) << "gid " << gid << " ex-assert(info.state ==
>>> MDSMap::STATE_STANDBY) " << do_propose << dendl;
>>> +      return do_propose;
>>> +    }
>>>
>>>      if (!info.standby_replay) {
>>>        continue;
>>>
>>> Now: e5442: 1/1/1 up {0=a=up:active}, 1 up:standby
>>> - but really there are 3 mds (active, replay, standby).
>>>
>>> # ceph mds dump
>>> dumped fsmap epoch 5442
>>> fs_name cephfs
>>> epoch 5441
>>> flags 0
>>> created 2016-04-10 23:44:38.858769
>>> modified 2016-06-27 23:08:26.211880
>>> tableserver 0
>>> root 0
>>> session_timeout 60
>>> session_autoclose 300
>>> max_file_size 1099511627776
>>> last_failure 5239
>>> last_failure_osd_epoch 18473
>>> compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable
>>> ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
>>> uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}
>>> max_mds 1
>>> in 0
>>> up {0=3104110}
>>> failed
>>> damaged
>>> stopped
>>> data_pools 5
>>> metadata_pool 6
>>> inline_data disabled
>>> 3104110: 10.227.227.103:6800/14627 'a' mds.0.5436 up:active seq 30
>>> 3084126: 10.227.227.104:6800/24069 'c' mds.0.0 up:standby-replay seq 1
>>>
>>> With standby-replay false, all is OK: 1/1/1 up {0=a=up:active}, 2 up:standby
>>>
>>> How to fix this 3-mds behaviour?
>>
>> Ah, you hit a known bug with that assert. I thought the fix was
>> already in the latest point release; are you behind?
>> -Greg
>
> Checked the logs - observed in version 10.2.2-45-g9aafefe
> (9aafefeab6b0f01d7467f70cb2f1b16ae88340e8) - the latest git jewel branch as
> of 27.06. Where is the fix?

Ah, I see another report of this as well. Created a ticket:
http://tracker.ceph.com/issues/16592.
-Greg
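[Editorial note] As the exchange above observes, the cluster behaves correctly once standby-replay is disabled. In jewel, standby-replay is a per-daemon setting, so one workaround until the monitor fix lands is to turn it off in ceph.conf. A sketch of the relevant fragment; the section name `mds.c` and the rank are assumptions taken from this thread's cluster, not a verified configuration:

```ini
; ceph.conf fragment (jewel-era per-daemon options) -- illustrative only
[mds.c]
    ; was: mds standby replay = true  (daemon tails the active MDS journal)
    mds standby replay = false
    mds standby for rank = 0
```

After restarting the daemon, it should appear as a plain standby, matching the healthy "2 up:standby" state reported above.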
Re: [ceph-users] mds standby + standby-reply upgrade
Gregory Farnum writes:
> On Thu, Jun 30, 2016 at 1:03 PM, Dzianis Kahanovich wrote:
>> Upgraded infernalis->jewel (git, Gentoo). The upgrade was done with a global
>> stop/restart of everything in one shot.
>>
>> Infernalis: e5165: 1/1/1 up {0=c=up:active}, 1 up:standby-replay, 1
>> up:standby
>>
>> Now, after the upgrade and the next mon restart, the active monitor crashes
>> with "assert(info.state == MDSMap::STATE_STANDBY)" (even without a running
>> mds). Fixed:
>>
>> --- a/src/mon/MDSMonitor.cc 2016-06-27 21:26:26.0 +0300
>> +++ b/src/mon/MDSMonitor.cc 2016-06-28 10:44:32.0 +0300
>> @@ -2793,7 +2793,11 @@ bool MDSMonitor::maybe_promote_standby(s
>>    for (const auto &j : pending_fsmap.standby_daemons) {
>>      const auto &gid = j.first;
>>      const auto &info = j.second;
>> -    assert(info.state == MDSMap::STATE_STANDBY);
>> +//  assert(info.state == MDSMap::STATE_STANDBY);
>> +    if (info.state != MDSMap::STATE_STANDBY) {
>> +      dout(0) << "gid " << gid << " ex-assert(info.state ==
>> MDSMap::STATE_STANDBY) " << do_propose << dendl;
>> +      return do_propose;
>> +    }
>>
>>      if (!info.standby_replay) {
>>        continue;
>>
>> Now: e5442: 1/1/1 up {0=a=up:active}, 1 up:standby
>> - but really there are 3 mds (active, replay, standby).
>>
>> # ceph mds dump
>> dumped fsmap epoch 5442
>> fs_name cephfs
>> epoch 5441
>> flags 0
>> created 2016-04-10 23:44:38.858769
>> modified 2016-06-27 23:08:26.211880
>> tableserver 0
>> root 0
>> session_timeout 60
>> session_autoclose 300
>> max_file_size 1099511627776
>> last_failure 5239
>> last_failure_osd_epoch 18473
>> compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable
>> ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
>> uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}
>> max_mds 1
>> in 0
>> up {0=3104110}
>> failed
>> damaged
>> stopped
>> data_pools 5
>> metadata_pool 6
>> inline_data disabled
>> 3104110: 10.227.227.103:6800/14627 'a' mds.0.5436 up:active seq 30
>> 3084126: 10.227.227.104:6800/24069 'c' mds.0.0 up:standby-replay seq 1
>>
>> With standby-replay false, all is OK: 1/1/1 up {0=a=up:active}, 2 up:standby
>>
>> How to fix this 3-mds behaviour?
>
> Ah, you hit a known bug with that assert. I thought the fix was
> already in the latest point release; are you behind?
> -Greg

Checked the logs - observed in version 10.2.2-45-g9aafefe
(9aafefeab6b0f01d7467f70cb2f1b16ae88340e8) - the latest git jewel branch as of
27.06. Where is the fix?

--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.by/
Re: [ceph-users] mds standby + standby-reply upgrade
On Thu, Jun 30, 2016 at 1:03 PM, Dzianis Kahanovich wrote:
> Upgraded infernalis->jewel (git, Gentoo). The upgrade was done with a global
> stop/restart of everything in one shot.
>
> Infernalis: e5165: 1/1/1 up {0=c=up:active}, 1 up:standby-replay, 1 up:standby
>
> Now, after the upgrade and the next mon restart, the active monitor crashes
> with "assert(info.state == MDSMap::STATE_STANDBY)" (even without a running
> mds). Fixed:
>
> --- a/src/mon/MDSMonitor.cc 2016-06-27 21:26:26.0 +0300
> +++ b/src/mon/MDSMonitor.cc 2016-06-28 10:44:32.0 +0300
> @@ -2793,7 +2793,11 @@ bool MDSMonitor::maybe_promote_standby(s
>    for (const auto &j : pending_fsmap.standby_daemons) {
>      const auto &gid = j.first;
>      const auto &info = j.second;
> -    assert(info.state == MDSMap::STATE_STANDBY);
> +//  assert(info.state == MDSMap::STATE_STANDBY);
> +    if (info.state != MDSMap::STATE_STANDBY) {
> +      dout(0) << "gid " << gid << " ex-assert(info.state ==
> MDSMap::STATE_STANDBY) " << do_propose << dendl;
> +      return do_propose;
> +    }
>
>      if (!info.standby_replay) {
>        continue;
>
> Now: e5442: 1/1/1 up {0=a=up:active}, 1 up:standby
> - but really there are 3 mds (active, replay, standby).
>
> # ceph mds dump
> dumped fsmap epoch 5442
> fs_name cephfs
> epoch 5441
> flags 0
> created 2016-04-10 23:44:38.858769
> modified 2016-06-27 23:08:26.211880
> tableserver 0
> root 0
> session_timeout 60
> session_autoclose 300
> max_file_size 1099511627776
> last_failure 5239
> last_failure_osd_epoch 18473
> compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
> uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}
> max_mds 1
> in 0
> up {0=3104110}
> failed
> damaged
> stopped
> data_pools 5
> metadata_pool 6
> inline_data disabled
> 3104110: 10.227.227.103:6800/14627 'a' mds.0.5436 up:active seq 30
> 3084126: 10.227.227.104:6800/24069 'c' mds.0.0 up:standby-replay seq 1
>
> With standby-replay false, all is OK: 1/1/1 up {0=a=up:active}, 2 up:standby
>
> How to fix this 3-mds behaviour?

Ah, you hit a known bug with that assert. I thought the fix was
already in the latest point release; are you behind?
-Greg