unsubscribe

On Fri, 19 Jun 2020 at 13:00, <pve-user-requ...@pve.proxmox.com> wrote:
Today's Topics:

   1. Enabling telemetry broke all my ceph managers (Lindsay Mathieson)
   2. Re: Enabling telemetry broke all my ceph managers (Brian :)

----------------------------------------------------------------------

Message: 1
Date: Thu, 18 Jun 2020 21:30:38 +1000
From: Lindsay Mathieson <lindsay.mathie...@gmail.com>
To: PVE User List <pve-user@pve.proxmox.com>
Subject: [PVE-User] Enabling telemetry broke all my ceph managers
Message-ID: <a6481a31-5d59-c13c-dea2-5367842c2...@gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed

Clean Nautilus install I set up last week:

 * 5 Proxmox nodes
   o All on latest updates via the no-subscription channel
 * 18 OSDs
 * 3 Managers
 * 3 Monitors
 * Cluster health good
 * In a protracted rebalance phase
 * All managed via Proxmox

I thought I would enable telemetry for Ceph as per this article:

https://docs.ceph.com/docs/master/mgr/telemetry/

 * Enabled the module (command line)
       ceph telemetry on
 * Tested getting the status
 * Set the contact and description
       ceph config set mgr mgr/telemetry/contact 'John Doe <john....@example.com>'
       ceph config set mgr mgr/telemetry/description 'My first Ceph cluster'
       ceph config set mgr mgr/telemetry/channel_ident true
 * Tried sending it
       ceph telemetry send

The full sequence is collected below for reference.
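A minimal sketch of that sequence, assuming the Nautilus-era syntax from the
doc page above (the module-enable line and the status/show preview commands
come from that page; the contact and description values are placeholders, not
what I actually set):

  # make sure the mgr telemetry module is loaded, then switch it on
  ceph mgr module enable telemetry
  ceph telemetry on

  # confirm the module reports itself as enabled
  ceph telemetry status

  # optional identification details (placeholder values)
  ceph config set mgr mgr/telemetry/contact 'Your Name <you@example.com>'
  ceph config set mgr mgr/telemetry/description 'My first Ceph cluster'
  ceph config set mgr mgr/telemetry/channel_ident true

  # preview the report that would be submitted, then send it
  ceph telemetry show
  ceph telemetry send

For what it's worth, 'ceph telemetry off' disables the reporting again.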
I *think* this is when the managers died, but it could have been earlier. But
around then all Ceph IO stopped and I discovered all three managers had
crashed and would not restart. I was shitting myself because this was remote
and the router is a pfSense VM :) Fortunately it kept going without its disk
responding.

  systemctl start ceph-mgr@vni.service
  Job for ceph-mgr@vni.service failed because the control process exited with error code.
  See "systemctl status ceph-mgr@vni.service" and "journalctl -xe" for details.

From journalctl -xe:

  -- The unit ceph-mgr@vni.service has entered the 'failed' state with result 'exit-code'.
  Jun 18 21:02:25 vni systemd[1]: Failed to start Ceph cluster manager daemon.
  -- Subject: A start job for unit ceph-mgr@vni.service has failed
  -- Defined-By: systemd
  -- Support: https://www.debian.org/support
  --
  -- A start job for unit ceph-mgr@vni.service has finished with a failure.
  --
  -- The job identifier is 91690 and the job result is failed.

From systemctl status ceph-mgr@vni.service:

  ceph-mgr@vni.service - Ceph cluster manager daemon
     Loaded: loaded (/lib/systemd/system/ceph-mgr@.service; enabled; vendor preset: enabled)
    Drop-In: /lib/systemd/system/ceph-mgr@.service.d
             └─ceph-after-pve-cluster.conf
     Active: failed (Result: exit-code) since Thu 2020-06-18 20:53:52 AEST; 8min ago
    Process: 415566 ExecStart=/usr/bin/ceph-mgr -f --cluster ${CLUSTER} --id vni --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
   Main PID: 415566 (code=exited, status=1/FAILURE)

  Jun 18 20:53:52 vni systemd[1]: ceph-mgr@vni.service: Service RestartSec=10s expired, scheduling restart.
  Jun 18 20:53:52 vni systemd[1]: ceph-mgr@vni.service: Scheduled restart job, restart counter is at 4.
  Jun 18 20:53:52 vni systemd[1]: Stopped Ceph cluster manager daemon.
  Jun 18 20:53:52 vni systemd[1]: ceph-mgr@vni.service: Start request repeated too quickly.
  Jun 18 20:53:52 vni systemd[1]: ceph-mgr@vni.service: Failed with result 'exit-code'.
  Jun 18 20:53:52 vni systemd[1]: Failed to start Ceph cluster manager daemon.

I created a new manager service on an unused node and fortunately that
worked. I then deleted and recreated the old managers and they started
working again (roughly the Proxmox-side commands sketched below). It was a
sweaty few minutes :)
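Since everything here is managed through Proxmox, the fix boiled down to
something like the following pveceph calls (a sketch from memory, not
re-tested; PVE 6.x syntax assumed, and 'vni' is simply the node from the logs
above):

  # first bring up a manager on a node that does not have one yet
  # (run this on that spare node)
  pveceph mgr create

  # then, on each node with a crashed manager (vni here), remove and
  # recreate it, checking cluster state in between
  pveceph mgr destroy vni
  pveceph mgr create
  ceph -s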
Everything resumed without a hiccup after that, impressed. Not game to try
and reproduce it though.

--
Lindsay


------------------------------

Message: 2
Date: Thu, 18 Jun 2020 23:06:40 +0100
From: "Brian :" <bri...@iptel.co>
To: PVE User List <pve-user@pve.proxmox.com>
Subject: Re: [PVE-User] Enabling telemetry broke all my ceph managers
Message-ID: <CAGPQfi_xwebe=meekodholn1s30bkx9cddiedjvlfvvqzh7...@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Nice save. And thanks for the detailed info.

------------------------------

End of pve-user Digest, Vol 147, Issue 10
*****************************************

--
Best regards,
Токовенко Алексей Алексеевич

_______________________________________________
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user