[Touch-packages] [Bug 1439649] Re: Pacemaker unable to communicate with corosync on restart under lxc
I can reproduce with corosync 2.3.3. Using corosync 2.3.4 from ppa:mariosplivalo/corosync on trusty I've not been able to reproduce on 2 tries.

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to lxc in Ubuntu.
https://bugs.launchpad.net/bugs/1439649

Title:
  Pacemaker unable to communicate with corosync on restart under lxc

Status in lxc package in Ubuntu:
  Confirmed
Status in pacemaker package in Ubuntu:
  Confirmed

Bug description:
  We've seen this a few times with three node clusters, all running in LXC containers; pacemaker fails to restart correctly as it can't communicate with corosync, resulting in a down cluster. Rebooting the containers resolves the issue, so suspect some sort of bad state either in corosync or pacemaker.

  Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: mcp_read_config: Configured corosync to accept connections from group 115: Library error (2)
  Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: main: Starting Pacemaker 1.1.10 (Build: 42f2063): generated-manpages agent-manpages ncurses libqb-logging libqb-ipc lha-fencing upstart nagios heartbeat corosync-native snmp libesmtp
  Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: cluster_connect_quorum: Quorum acquired
  Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: corosync_node_name: Unable to get node name for nodeid 1000
  Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: corosync_node_name: Unable to get node name for nodeid 1001
  Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: corosync_node_name: Unable to get node name for nodeid 1003
  Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: corosync_node_name: Unable to get node name for nodeid 1001
  Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
  Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-4-lxc-4[1001] - state is now member (was (null))
  Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: corosync_node_name: Unable to get node name for nodeid 1003
  Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: crm_update_peer_state: pcmk_quorum_notification: Node (null)[1003] - state is now member (was (null))
  Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: notice: main: CRM Git Version: 42f2063
  Apr 2 11:41:32 juju-machine-4-lxc-4 stonith-ng[1033744]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
  Apr 2 11:41:32 juju-machine-4-lxc-4 stonith-ng[1033744]: notice: corosync_node_name: Unable to get node name for nodeid 1001
  Apr 2 11:41:32 juju-machine-4-lxc-4 stonith-ng[1033744]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
  Apr 2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
  Apr 2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]: [MAIN ] Denied connection attempt from 109:115
  Apr 2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]: [QB] Invalid IPC credentials (1033732-1033746).
  Apr 2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]: error: cluster_connect_cpg: Could not connect to the Cluster Process Group API: 11
  Apr 2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]: error: main: HA Signon failed
  Apr 2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]: error: main: Aborting startup
  Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: error: pcmk_child_exit: Child process attrd (1033746) exited: Network is down (100)
  Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: warning: pcmk_child_exit: Pacemaker child process attrd no longer wishes to be respawned. Shutting ourselves down.
  Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: pcmk_shutdown_worker: Shuting down Pacemaker
  Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: stop_child: Stopping crmd: Sent -15 to process 1033748
  Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: warning: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
  Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: notice: crm_shutdown: Requesting shutdown, upper limit is 120ms
  Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: warning: do_log: FSA: Input I_SHUTDOWN from crm_shutdown() received in state S_STARTING
  Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: notice: do_state_transition: State transition S_STARTING -> S_STOPPING [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
  Apr 2 11:41:32 juju-machine-4-lxc-4 cib[1033743]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosyn
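
For anyone triaging similar reports, the two log lines that carry the diagnosis can be pulled out mechanically. The following is a minimal Python sketch and not part of the original report: it extracts the group that pacemakerd tried to register with corosync (together with the status text logged next to it) and the ID pair that corosync rejected. The function name `diagnose` and both regular expressions are illustrative assumptions derived only from the log lines quoted above; reading `109:115` as uid:gid is also an assumption.

```python
import re

# Both patterns are modelled on the log lines quoted in this report only;
# other pacemaker/corosync versions may word these messages differently.
CONFIGURED = re.compile(
    r"mcp_read_config: Configured corosync to accept connections "
    r"from group (\d+): (.+)$"
)
DENIED = re.compile(
    r"corosync\[\d+\]:\s*\[MAIN\s*\]\s*Denied connection attempt from (\d+):(\d+)"
)


def diagnose(lines):
    """Return (group, status, denied) extracted from syslog lines.

    group  -- group id pacemakerd tried to register with corosync, or None
    status -- status text logged with that registration, or None
    denied -- list of (first_id, second_id) pairs corosync rejected
              (assumed here to be uid and gid)
    """
    group = None
    status = None
    denied = []
    for line in lines:
        match = CONFIGURED.search(line)
        if match:
            group = int(match.group(1))
            status = match.group(2).strip()
            continue
        match = DENIED.search(line)
        if match:
            denied.append((int(match.group(1)), int(match.group(2))))
    return group, status, denied


def registration_looks_failed(group, status, denied):
    """True when a denied pair carries the very group pacemakerd tried to allow."""
    if group is None or status is None:
        return False
    return any(second == group for _, second in denied)
```

Fed the log above, `diagnose` yields group 115 with status "Library error (2)" and the denied pair (109, 115). The rejected connection therefore carries the same group that pacemakerd attempted to register, which is consistent with (though does not prove) the registration itself having failed.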
[Touch-packages] [Bug 1439649] Re: Pacemaker unable to communicate with corosync on restart under lxc
We've run into this problem after an extended maas/dhcp outage, during which leases expired on the metals and units. All hacluster-deployed LXC containers (openstack-ha services) lost corosync-pacemaker connectivity with "corosync Invalid IPC credentials"; rebooting the containers resolved it. This is a staging cloud, so we could take down maas-dhcp to replicate/test.

** Tags added: canonical-bootstack

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to lxc in Ubuntu.
https://bugs.launchpad.net/bugs/1439649

Title:
  Pacemaker unable to communicate with corosync on restart under lxc

Status in lxc package in Ubuntu:
  Confirmed
Status in pacemaker package in Ubuntu:
  Confirmed

Bug description:
  We've seen this a few times with three node clusters, all running in LXC containers; pacemaker fails to restart correctly as it can't communicate with corosync, resulting in a down cluster. Rebooting the containers resolves the issue, so suspect some sort of bad state either in corosync or pacemaker.
[Touch-packages] [Bug 1350947] Re: apparmor: no working rule to allow making a mount private
** Tags added: canonical-bootstack

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to lxc in Ubuntu.
https://bugs.launchpad.net/bugs/1350947

Title:
  apparmor: no working rule to allow making a mount private

Status in AppArmor Linux application security framework:
  Invalid
Status in linux package in Ubuntu:
  Invalid
Status in lxc package in Ubuntu:
  Triaged

Bug description:
  NOTE: This bug will be fixed with an update to lxc. However, two AppArmor bugs (bug #1401619 and bug #1401621) were identified as a result of triaging this bug and they will both be fixed in upstream AppArmor.

  When the file system is mounted as MS_SHARED by default (such as under systemd, or when the admin configures it so), things like schroot or LXC need to make their "guest" mounts private. This currently fails under utopic:

  $ sudo lxc-create -t busybox -n c1
  $ sudo mount --make-rshared /
  $ sudo strace -fvvs1024 -e mount lxc-start -n c1
  [...]
  [pid 10749] mount(NULL, "/", NULL, MS_SLAVE, NULL) = -1 EACCES (Permission denied)
  lxc-start: Permission denied - Failed to make / rslave

  dmesg says:

  audit: type=1400 audit(1406825005.687:551): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="/usr/bin/lxc-start" name="/" pid=8228 comm="lxc-start" flags="rw, slave"

  (This happens for all mount points on your system, I'm just showing the first one)

  This will leave a couple of leaked mounts on your system.
  This is a useful rune to clean them up:

  $ for i in 1 2 3; do sudo umount `mount|grep lxc|awk '{print $3}'`; done

  (needs to be done several times; check with "mount |grep lxc" that it's clean)

  I tried to allow that by adding this to /etc/apparmor.d/abstractions/lxc/start-container:

  mount options=(rw, slave) -> **,

  then reload the policy and retry with

  $ sudo stop lxc; sudo start lxc; sudo lxc-start -n c1

  (and again clean up the mounts with the above rune)

  I tried some variations of this, like

  mount options in (rw, slave, rslave, shared, rshared) -> **,

  but none of them worked. The only things that do work are one of

  mount,
  mount -> **,

  but those are too lax to be an effective security restriction.

  WORKAROUND
  ==========
  (Attention: insecure! Don't use for production machines)

  Add this to /etc/apparmor.d/abstractions/lxc/start-container:

  mount,

  ProblemType: Bug
  DistroRelease: Ubuntu 14.10
  Package: linux-image-3.16.0-6-generic 3.16.0-6.11
  ProcVersionSignature: Ubuntu 3.16.0-6.11-generic 3.16.0-rc7
  Uname: Linux 3.16.0-6-generic x86_64
  ApportVersion: 2.14.5-0ubuntu1
  Architecture: amd64
  AudioDevicesInUse:
   USER PID ACCESS COMMAND
   /dev/snd/controlC0: martin 1665 F pulseaudio
  CurrentDesktop: Unity
  Date: Thu Jul 31 18:58:18 2014
  EcryptfsInUse: Yes
  InstallationDate: Installed on 2014-02-27 (154 days ago)
  InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Alpha amd64 (20140224)
  MachineType: LENOVO 2324CTO
  ProcFB: 0 inteldrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.16.0-6-generic.efi.signed root=UUID=a2b27321-0b55-44c9-af0d-6c939efa45ce ro quiet splash init=/lib/systemd/systemd crashkernel=384M-:128M vt.handoff=7
  RelatedPackageVersions:
   linux-restricted-modules-3.16.0-6-generic N/A
   linux-backports-modules-3.16.0-6-generic N/A
   linux-firmware 1.132
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 07/09/2013
  dmi.bios.vendor: LENOVO
  dmi.bios.version: G2ET95WW (2.55 )
  dmi.board.asset.tag: Not Available
  dmi.board.name: 2324CTO
  dmi.board.vendor: LENOVO
  dmi.board.version: 0B98401 Pro
  dmi.chassis.asset.tag: No Asset Information
  dmi.chassis.type: 10
  dmi.chassis.vendor: LENOVO
  dmi.chassis.version: Not Available
  dmi.modalias: dmi:bvnLENOVO:bvrG2ET95WW(2.55):bd07/09/2013:svnLENOVO:pn2324CTO:pvrThinkPadX230:rvnLENOVO:rn2324CTO:rvr0B98401Pro:cvnLENOVO:ct10:cvrNotAvailable:
  dmi.product.name: 2324CTO
  dmi.product.version: ThinkPad X230
  dmi.sys.vendor: LENOVO

To manage notifications about this bug go to:
https://bugs.launchpad.net/apparmor/+bug/1350947/+subscriptions

--
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
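
The cleanup rune in the description of bug 1350947 repeats `umount` three times and still asks for a manual check afterwards. As a hedged alternative, the Python sketch below selects the same mount points (third field of every `mount` line containing "lxc") but orders them deepest path first, on the assumption that the repetition is needed because nested mounts have to go before their parents. The function names `lxc_mount_points` and `cleanup` are hypothetical, and like the original rune the parsing does not handle mount points containing spaces.

```python
import subprocess


def lxc_mount_points(mount_output):
    """Return mount points from `mount` output lines mentioning 'lxc'.

    Mirrors `mount | grep lxc | awk '{print $3}'`, then sorts the result
    so that deeper paths come first.
    """
    points = []
    for line in mount_output.splitlines():
        if "lxc" not in line:
            continue
        # `mount` prints: <source> on <target> type <fstype> (<options>)
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "on":
            points.append(fields[2])
    # More path separators means a more deeply nested mount point.
    return sorted(points, key=lambda path: path.count("/"), reverse=True)


def cleanup(dry_run=True):
    """Unmount leaked lxc mounts; with dry_run=True only print the commands."""
    output = subprocess.run(
        ["mount"], capture_output=True, text=True, check=True
    ).stdout
    for point in lxc_mount_points(output):
        command = ["sudo", "umount", point]
        if dry_run:
            print(" ".join(command))
        else:
            # check=False: keep going if one mount point is still busy.
            subprocess.run(command, check=False)
```

Calling `cleanup()` with the default `dry_run=True` only prints the `umount` commands it would run, so the list can be reviewed first; the `mount | grep lxc` check from the original description still applies afterwards.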