** Changed in: ceph (Ubuntu)
Assignee: James Page (james-page) => (unassigned)
** Changed in: ceph (Ubuntu)
Status: Incomplete => Opinion
--
https://bugs.launchpad.net/bugs/1874939
Title: ceph-osd can't connect after upgrade to focal
That sounds promising.
I replaced my node a while ago so I can't verify this one way or the
other, but it certainly sounds like it may be the problem, including
why Page could not duplicate it in his new install.
One of the reasons I bothered confirming the bug report was so that
future searches would find it.
I have the same issue.
That's why I've been testing a few things over the last few days:
Upgrade process:
Luminous -> Mimic -> Nautilus -> Octopus
(all versions run under Bionic)
It doesn't matter whether I activate msgr2 or not. I always get the problem
after upgrading to Octopus:
2021-01-11T0
This issue appears to be documented here:
https://docs.ceph.com/en/latest/releases/nautilus/#instructions
Complete the upgrade by disallowing pre-Nautilus OSDs and enabling all
new Nautilus-only functionality:
# ceph osd require-osd-release nautilus
Important: This step is mandatory. Failure to execute this step ...
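For anyone checking whether this step was missed, a quick sketch (the
output value is illustrative; it should name your previous release):
$ ceph osd dump | grep require_osd_release
require_osd_release luminous
$ ceph osd require-osd-release nautilus
$ ceph osd dump | grep require_osd_release
require_osd_release nautilus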
I am in the middle of a mimic -> nautilus -> octopus upgrade, and got
the same 'tick checking mon for new map' cycle from my 15.2.3 OSD
daemons. After
$ ceph osd require-osd-release mimic
the Octopus OSDs can connect to the cluster.
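To confirm the OSDs actually rejoined after setting the flag, something
like this (counts and timings are illustrative):
$ ceph osd stat
3 osds: 3 up (since 2m), 3 in (since 2m); epoch: e46110
$ ceph -s | grep health
    health: HEALTH_OK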
--
tail -f /var/log/ceph/ceph-osd.13.log
2020-05-22T17:27:43.909-0500 7f44708ca700 1 osd.13 46107 tick checking mon for new map
2020-05-22T17:28:14.825-0500 7f44708ca700 1 osd.13 46107 tick checking mon for new map
2020-05-22T17:28:44.838-0500 7f44708ca700 1 osd.13 46107 tick checking mon for new map
/etc/ceph/ceph.conf
mon host = 192.168.120.1 192.168.120.2 192.168.120.3
ceph mon dump:
epoch 7
fsid
last_changed 2020-05-16T23:16:32.234657-0500
created 2016-04-08T10:30:10.123758-0500
min_mon_release 15 (octopus)
0: [v2:192.168.120.1:3300/0,v1:192.168.120.1:6789/0] mon.temple-h1
1: [v2:192.168
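If an OSD is stuck in that loop, raising messenger debugging over the
admin socket can show which monitor address and port it is actually
dialing (osd.13 is just the example from the log above; run this on the
host where that OSD lives):
$ ceph daemon osd.13 config set debug_ms 1   # verbose messenger logging
$ tail -f /var/log/ceph/ceph-osd.13.log      # look for the mon IP:port being tried
$ ceph daemon osd.13 config set debug_ms 0   # turn it back off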
For example, the test deployment I have uses:
mon_host = 10.5.0.8,10.5.0.5,10.5.0.19
--
To confirm:
tcp   0   0 10.5.0.8:3300   0.0.0.0:*   LISTEN   64045   27128   784/ceph-mon
tcp   0   0 10.5.0.8:6789   0.0.0.0:*   LISTEN   64045   27129   784/ceph-mon
3300 == v2
6789 == v1
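The same check on any mon host (a sketch; root is needed to see process
names):
$ sudo ss -tlnp | grep ceph-mon
If only 6789 is listening, the mon is running v1-only.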
--
https://docs.ceph.com/docs/master/rados/configuration/msgr2/#transitioning-from-v1-only-to-v2-plus-v1
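The key step from that page, once every mon is on Nautilus or newer:
$ ceph mon enable-msgr2
$ ceph mon dump | grep v2:   # each mon should now advertise both a v2: and a v1: address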
--
Something was tickling my brain about upgrades that we dealt with in the
ceph charms a while back.
The MONs can run both v1 and v2 messenger ports; however, if a port is
specified in mon hosts in ceph.conf, it's possible that the v2 port is
disabled, which is why the OSD can't connect back to the cluster.
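In ceph.conf terms, that difference looks roughly like this (a sketch
reusing the example addresses above; per the hypothesis here, a bare
legacy port can leave clients trying v1 only):
# Bare IPs - clients probe both the v2 (3300) and v1 (6789) ports:
mon host = 192.168.120.1 192.168.120.2 192.168.120.3
# A bare legacy port pins clients to v1 only:
mon host = 192.168.120.1:6789 192.168.120.2:6789 192.168.120.3:6789
# Explicit form advertising both protocols:
mon host = [v2:192.168.120.1:3300,v1:192.168.120.1:6789] ...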
On Fri, May 22, 2020 at 11:25 AM Jay Ring <1874...@bugs.launchpad.net>
wrote:
> "However it should be possible to complete the do-release-upgrade to the
> point of requesting a reboot - don't - drop to the CLI and get all
> machines to this point and then:
>
> restart the mons across all three machines ...
"As a side note - even if there is a bug here (and it sounds like there
might be) I would recommend placing the mon and mgr daemons in LXD
containers ontop of the machines hosting the osd's"
Yes, I would strongly suggest doing this also. That is how Ceph now
recommends it anyway. However, older ...
"However it should be possible to complete the do-release-upgrade to the
point of requesting a reboot - don't - drop to the CLI and get all
machines to this point and then:
restart the mons across all three machines
restart the mgrs across all three machines
restart the osds across all three machines
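On stock Ubuntu packaging that sequence maps to the systemd targets
roughly like this (a sketch; verify cluster health between each step):
$ sudo systemctl restart ceph-mon.target   # on each machine, one at a time
$ ceph -s                                  # wait for quorum to re-form
$ sudo systemctl restart ceph-mgr.target
$ sudo systemctl restart ceph-osd.target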
Hi Christian
On Fri, May 22, 2020 at 8:10 AM Christian Huebner <1874...@bugs.launchpad.net> wrote:
> I filed this bug specifically for hyperconverged environments. Upgrading
> monitor nodes first and then upgrading separate OSD nodes is probably
> doable, but in a hyperconverged environment you
I filed this bug specifically for hyperconverged environments. Upgrading
monitor nodes first and then upgrading separate OSD nodes is probably
doable, but in a hyperconverged environment you cannot separate them.
I tried do-release-upgrade (a couple of times) without rebooting at the end,
but found that ...
Other ideas - please could impacted users validate networking, especially
MTU configuration, between machines in their cluster before, during and
post-upgrade.
Ceph can be very sensitive to MTU mismatches and can just hang when things
are not quite right.
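A couple of quick checks for that (eth0 and the 9000-byte jumbo MTU are
just examples):
$ ip link show eth0 | grep -o 'mtu [0-9]*'
$ ping -M do -s 8972 -c 3 192.168.120.2   # 8972 = 9000 minus 28 bytes of IP/ICMP headers
If the don't-fragment ping fails between any two cluster hosts, fix the
MTU before suspecting Ceph.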
--
Marking 'Incomplete' for now as unable to reproduce.
** Changed in: ceph (Ubuntu)
Status: In Progress => Incomplete
--
Testing phase 2 - three-machine all-in-one deploy.
Deployed using eoan - mon, mgr and 1 x osd on each machine.
Deployment seeded with pools and lightweight test data - RBDs in each
pool.
Each machine upgraded in turn (1, 2 and then 0) using do-release-upgrade.
ceph versions checked throughout deployment ...
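('ceph versions' breaks the running daemons down by release; mid-upgrade
it would report something like this, with illustrative counts and the
version hashes elided:)
$ ceph versions
{
    "mon": { "ceph version 15.2.1 (...) octopus (stable)": 3 },
    "mgr": { "ceph version 15.2.1 (...) octopus (stable)": 3 },
    "osd": {
        "ceph version 14.2.8 (...) nautilus (stable)": 2,
        "ceph version 15.2.1 (...) octopus (stable)": 1
    }
}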
As a side note - even if there is a bug here (and it sounds like there
might be) I would recommend placing the mon and mgr daemons in LXD
containers on top of the machines hosting the OSDs - this will allow you
to manage them independently from an upgrade process for both Ceph
upgrades and Ubuntu releases.
OK, further fact discovery from my testing.
I have a 6-machine cluster deployed - three machines host mon,mgr and
three machines host osd.
Upgrading the mon,mgr cluster first, followed by the three osd machines,
using do-release-upgrade and allowing the tool to reboot the machine at
the end, resulted ...
You may need more than one node to reproduce the problem.
I had a 3 node system.
I ran do-release-upgrade on node 1.
The OSDs on node 1 connected to the monitor quorum, which had
un-upgraded monitors on hosts 2 & 3.
The upgraded OSDs on node 1 immediately died and could not be revived.
--
ceph-mon eoan->focal upgrade testing
ceph-mon@`hostname` systemd units not restarted until reboot step of the
upgrade process on each node; mixed version cluster operated as expected
as each mon was upgraded.
--
Working on reproduction for debugging and triage.
** Changed in: ceph (Ubuntu)
Status: Confirmed => In Progress
** Changed in: ceph (Ubuntu)
Assignee: (unassigned) => James Page (james-page)
--
Just writing in to confirm this bug.
It's very serious.
Lost a whole node. No real warning. Extremely frustrating.
--
Status changed to 'Confirmed' because the bug affects multiple users.
** Changed in: ceph (Ubuntu)
Status: New => Confirmed
--
I accomplished the upgrade by marking all Ceph packages held, then
digging myself through the dependency jungle to upgrade the packages
subsequently. This obviously is not a production-ready way to do it, but
at least Ceph Octopus is running on 20.04 now.
This really needs to be fixed.
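(Roughly, with the standard package names; the dependency jungle
mentioned above means the exact unhold order takes trial and error:)
$ sudo apt-mark hold ceph ceph-base ceph-common ceph-mon ceph-mgr ceph-osd
$ sudo do-release-upgrade
# afterwards, release and upgrade in a controlled order, mons first:
$ sudo apt-mark unhold ceph-mon && sudo apt install --only-upgrade ceph-mon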
--
One note on importance: If someone runs do-release-upgrade on a
converged Ceph node, it will destroy the node. So far I have not seen
any recovery procedure. The only reason I was able to rapidly redo the
upgrade is because it runs on snapshots and thus can be recovered after
destruction. This is n
I tried to do the upgrade by hand: disable all the services that cannot
be autostarted, then do the upgrade. (By the way, a manpage has been
moved from ceph-deploy to ceph-base, and thus the apt upgrade fails;
do-release-upgrade uses --force-overwrite for this, but that's not a
clean solution.) Solution is ...
I just shut down Ceph on all four nodes completely, then did the do-
release-upgrade. Before the upgrade I verified that all Ceph services
were down so I would be able to start them in the correct order.
After the upgrade (without reboot!) I found that all Ceph services on
all Ceph nodes had been
I redid the whole upgrade:
* do-release-upgrade and finished without reboot (all 4 nodes)
** so ceph daemons should not have been restarted
* restarted all ceph mons sequentially
** verified I get octopus as min mon release
* restarted all ceph-mgrs sequentially
** verified that all ceph-mgr daemons ...
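(The min-mon-release check in the second step, for reference:)
$ ceph mon dump | grep min_mon_release
min_mon_release 15 (octopus)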
The same guidelines apply to hyper-converged architectures.
Package updates are not applied until their corresponding service
restarts. Ceph packaging does not automatically restart any services.
This is by design so you can safely install on a hyper-converged host,
and then control the order in which services are restarted.
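(In practice that split looks like this, assuming the stock systemd
targets:)
$ sudo apt install --only-upgrade ceph-mon ceph-osd   # new binaries on disk, old ones keep running
$ sudo systemctl restart ceph-mon.target              # nothing restarts until you ask
# verify quorum and health, then later:
$ sudo systemctl restart ceph-osd.target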
This would work if all nodes have a single function only (mon, mgr, osd). I
tried everything to update the monitors first, but due to the dependencies
between the Ceph packages the monitors and mgr daemons cannot simply be
updated separately from the OSDs. What I don't get, though, is that once all ...
Eoan packages Nautilus, while Focal packages Octopus:
ceph | 14.2.2-0ubuntu3 | eoan
ceph | 14.2.4-0ubuntu0.19.10.2 | eoan-security
ceph | 14.2.8-0ubuntu0.19.10.1 | eoan-updates
ceph | 15.2.1-0ubuntu1 | focal
ceph | 15.2.1-0ubuntu2 | fo