[Touch-packages] [Bug 1439649] Re: Pacemaker unable to communicate with corosync on restart under lxc

2015-06-04 Thread Jill Rouleau
I can reproduce this with corosync 2.3.3. Using corosync 2.3.4 from
ppa:mariosplivalo/corosync on trusty, I have not been able to reproduce
it in 2 tries.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to lxc in Ubuntu.
https://bugs.launchpad.net/bugs/1439649

Title:
  Pacemaker unable to communicate with corosync on restart under lxc

Status in lxc package in Ubuntu:
  Confirmed
Status in pacemaker package in Ubuntu:
  Confirmed

Bug description:
  We've seen this a few times with three-node clusters, all running in
  LXC containers: pacemaker fails to restart correctly because it can't
  communicate with corosync, resulting in a down cluster. Rebooting the
  containers resolves the issue, so we suspect some sort of bad state
  in either corosync or pacemaker.

  Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: mcp_read_config: Configured corosync to accept connections from group 115: Library error (2)
  Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: main: Starting Pacemaker 1.1.10 (Build: 42f2063):  generated-manpages agent-manpages ncurses libqb-logging libqb-ipc lha-fencing upstart nagios  heartbeat corosync-native snmp libesmtp
  Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: cluster_connect_quorum: Quorum acquired
  Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: corosync_node_name: Unable to get node name for nodeid 1000
  Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: corosync_node_name: Unable to get node name for nodeid 1001
  Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: corosync_node_name: Unable to get node name for nodeid 1003
  Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: corosync_node_name: Unable to get node name for nodeid 1001
  Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: get_node_name: Defaulting to uname -n for the local corosync node name
  Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-4-lxc-4[1001] - state is now member (was (null))
  Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: corosync_node_name: Unable to get node name for nodeid 1003
  Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: crm_update_peer_state: pcmk_quorum_notification: Node (null)[1003] - state is now member (was (null))
  Apr  2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]:   notice: main: CRM Git Version: 42f2063
  Apr  2 11:41:32 juju-machine-4-lxc-4 stonith-ng[1033744]:   notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
  Apr  2 11:41:32 juju-machine-4-lxc-4 stonith-ng[1033744]:   notice: corosync_node_name: Unable to get node name for nodeid 1001
  Apr  2 11:41:32 juju-machine-4-lxc-4 stonith-ng[1033744]:   notice: get_node_name: Defaulting to uname -n for the local corosync node name
  Apr  2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]:   notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
  Apr  2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]:  [MAIN  ] Denied connection attempt from 109:115
  Apr  2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]:  [QB] Invalid IPC credentials (1033732-1033746).
  Apr  2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]:    error: cluster_connect_cpg: Could not connect to the Cluster Process Group API: 11
  Apr  2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]:    error: main: HA Signon failed
  Apr  2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]:    error: main: Aborting startup
  Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:    error: pcmk_child_exit: Child process attrd (1033746) exited: Network is down (100)
  Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:  warning: pcmk_child_exit: Pacemaker child process attrd no longer wishes to be respawned. Shutting ourselves down.
  Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: pcmk_shutdown_worker: Shuting down Pacemaker
  Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: stop_child: Stopping crmd: Sent -15 to process 1033748
  Apr  2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]:  warning: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
  Apr  2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]:   notice: crm_shutdown: Requesting shutdown, upper limit is 120ms
  Apr  2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]:  warning: do_log: FSA: Input I_SHUTDOWN from crm_shutdown() received in state S_STARTING
  Apr  2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]:   notice: do_state_transition: State transition S_STARTING -> S_STOPPING [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
  Apr  2 11:41:32 juju-machine-4-lxc-4 cib[1033743]:   notice: crm_cluster_connect: Connecting to cluster infrastructure: corosyn

[Touch-packages] [Bug 1439649] Re: Pacemaker unable to communicate with corosync on restart under lxc

2015-05-27 Thread Jill Rouleau
We ran into this problem after an extended MAAS/DHCP outage during which
leases expired on the bare-metal hosts and units. All hacluster-deployed
LXC containers (openstack-ha services) lost corosync-pacemaker
connectivity with "corosync Invalid IPC credentials"; rebooting the
containers resolved it. This is a staging cloud, so we could take down
maas-dhcp to replicate/test.

** Tags added: canonical-bootstack


[Touch-packages] [Bug 1350947] Re: apparmor: no working rule to allow making a mount private

2014-12-17 Thread Jill Rouleau
** Tags added: canonical-bootstack

-- 
https://bugs.launchpad.net/bugs/1350947

Title:
  apparmor: no working rule to allow making a mount private

Status in AppArmor Linux application security framework:
  Invalid
Status in linux package in Ubuntu:
  Invalid
Status in lxc package in Ubuntu:
  Triaged

Bug description:
  NOTE: This bug will be fixed with an update to lxc. However, two
  AppArmor bugs (bug #1401619 and bug #1401621) were identified as a
  result of triaging this bug and they will both be fixed in upstream
  AppArmor.

  When the file system is mounted as MS_SHARED by default (such as under
  systemd, or when the admin configures it so), tools like schroot or
  LXC need to make their "guest" mounts private. This currently fails
  under Utopic:

  $ sudo lxc-create -t busybox -n c1
  $ sudo mount --make-rshared /
  $ sudo strace -fvvs1024 -e mount  lxc-start -n c1
  [...]
  [pid 10749] mount(NULL, "/", NULL, MS_SLAVE, NULL) = -1 EACCES (Permission denied)
  lxc-start: Permission denied - Failed to make / rslave

  dmesg says:
  audit: type=1400 audit(1406825005.687:551): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="/usr/bin/lxc-start" name="/" pid=8228 comm="lxc-start" flags="rw, slave"

  (This happens for all mount points on your system; I'm just showing
  the first one.)

  This will leave a couple of leaked mounts on your system. This is a
  useful rune to clean them up:

  $ for i in 1 2 3; do sudo umount `mount|grep lxc|awk '{print $3}'`; done

  (it needs to be run several times; check with "mount | grep lxc" that
  everything is clean)
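  A sketch of a slightly more robust variant of the rune above, assuming
  GNU coreutils (for `tac`) and the `mount` output format "SOURCE on
  TARGET type FSTYPE (OPTIONS)". It unmounts the matching mount points
  newest-first, so nested mounts go before their parents, and repeats
  until `mount | grep lxc` finds nothing. The function names
  `lxc_mountpoints` and `cleanup_lxc_mounts` and the limit of 5 passes
  are hypothetical, not part of lxc. Like the original rune, it matches
  any mount line containing "lxc" and does not handle mount points whose
  paths contain spaces.

```shell
# Hypothetical helper: read `mount` output on stdin and print the targets
# of lines mentioning "lxc", last-mounted first, so nested mounts are
# listed before the mounts that contain them.
lxc_mountpoints() {
    # Field 3 of "SOURCE on TARGET type FSTYPE (OPTIONS)" is TARGET.
    grep lxc | awk '{print $3}' | tac
}

# Hypothetical helper: unmount leaked lxc mounts, repeating until none
# match, with an upper bound so a busy mount cannot loop forever.
cleanup_lxc_mounts() {
    tries=0
    while [ "$tries" -lt 5 ]; do
        targets=$(mount | lxc_mountpoints)
        [ -z "$targets" ] && return 0
        # Unquoted on purpose: each target becomes one umount argument.
        sudo umount $targets
        tries=$((tries + 1))
    done
    # Still not clean: show what is left and report failure.
    mount | grep lxc
    return 1
}
```

  Source the snippet and run `cleanup_lxc_mounts`; it returns 0 once no
  matching mounts remain, or prints the leftovers and returns 1.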

  I tried to allow that by adding this to
  /etc/apparmor.d/abstractions/lxc/start-container:

    mount options=(rw, slave) -> **,

  then reloaded the policy and retried with

  $ sudo stop lxc; sudo start lxc; sudo lxc-start -n c1

  (and again clean up the mounts with above rune)

  I tried some variations of this, like

    mount options in (rw, slave, rslave, shared, rshared) -> **,

  but none of them worked. The only rules that do work are either of

    mount,
    mount -> **,

  but those are too lax to be an effective security restriction.

  WORKAROUND
  ==
  (Attention: insecure! Don't use for production machines)

  Add this to /etc/apparmor.d/abstractions/lxc/start-container:

     mount,

  ProblemType: Bug
  DistroRelease: Ubuntu 14.10
  Package: linux-image-3.16.0-6-generic 3.16.0-6.11
  ProcVersionSignature: Ubuntu 3.16.0-6.11-generic 3.16.0-rc7
  Uname: Linux 3.16.0-6-generic x86_64
  ApportVersion: 2.14.5-0ubuntu1
  Architecture: amd64
  AudioDevicesInUse:
   USERPID ACCESS COMMAND
   /dev/snd/controlC0:  martin 1665 F pulseaudio
  CurrentDesktop: Unity
  Date: Thu Jul 31 18:58:18 2014
  EcryptfsInUse: Yes
  InstallationDate: Installed on 2014-02-27 (154 days ago)
  InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Alpha amd64 (20140224)
  MachineType: LENOVO 2324CTO
  ProcFB: 0 inteldrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.16.0-6-generic.efi.signed root=UUID=a2b27321-0b55-44c9-af0d-6c939efa45ce ro quiet splash init=/lib/systemd/systemd crashkernel=384M-:128M vt.handoff=7
  RelatedPackageVersions:
   linux-restricted-modules-3.16.0-6-generic N/A
   linux-backports-modules-3.16.0-6-generic  N/A
   linux-firmware                            1.132
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 07/09/2013
  dmi.bios.vendor: LENOVO
  dmi.bios.version: G2ET95WW (2.55 )
  dmi.board.asset.tag: Not Available
  dmi.board.name: 2324CTO
  dmi.board.vendor: LENOVO
  dmi.board.version: 0B98401 Pro
  dmi.chassis.asset.tag: No Asset Information
  dmi.chassis.type: 10
  dmi.chassis.vendor: LENOVO
  dmi.chassis.version: Not Available
  dmi.modalias: dmi:bvnLENOVO:bvrG2ET95WW(2.55):bd07/09/2013:svnLENOVO:pn2324CTO:pvrThinkPadX230:rvnLENOVO:rn2324CTO:rvr0B98401Pro:cvnLENOVO:ct10:cvrNotAvailable:
  dmi.product.name: 2324CTO
  dmi.product.version: ThinkPad X230
  dmi.sys.vendor: LENOVO

To manage notifications about this bug go to:
https://bugs.launchpad.net/apparmor/+bug/1350947/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp