Bug#966544: [Pkg-net-snmp-devel] Bug#966544: snmpd: extend option broken after update

2020-07-30 Thread Christian Balzer


Hello Craig,

Do these issues really warrant utterly breaking things, with no recourse
short of recompiling, for the many, many users of the extend feature?
Especially given that SNMP traffic tends to stay on private networks and
that the feature is not enabled by default in the config.
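For context, the feature in question is the stock net-snmp "extend"
directive; a minimal snmpd.conf sketch of the kind of line that now fails
to parse (the name and script path here are hypothetical):

```
# /etc/snmp/snmpd.conf -- hypothetical example of the 'extend' directive;
# after the deb9u2 update a line like this yields
# "Warning: Unknown token: extend"
extend raid-status /usr/local/bin/check_raid.sh
# results are normally exposed under NET-SNMP-EXTEND-MIB (nsExtendOutput*)
```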

At the very least a "this will break things, abort now" missive during
upgrade would have been nice.

If upstream can't/won't fix this, snmpd has lost its usefulness for me in
the long run compared to other data collectors.

Regards,

Christian


On Fri, 31 Jul 2020 10:46:29 +1000 Craig Small  wrote:
> Hi James,
>   That would have been intentional, the EXTEND MIB has major security
> issues.
> 
>  - Craig
> 
> 
> On Thu, 30 Jul 2020 at 23:03, James Greig  wrote:
> 
> > Package: snmpd
> > Version: 5.7.3+dfsg-1.7+deb9u2
> > Severity: important
> >
> > Dear Maintainer,
> >
> > *** Reporter, please consider answering these questions, where appropriate
> > ***
> >
> > Updating snmpd from deb9u1 to deb9u2 via apt on any stretch system
> > breaks the ability to use 'extend' in snmpd.
> >
> > After updating on any stretch system and restarting snmpd this error will
> > appear:-
> >
> > Warning: Unknown token: extend
> >
> > It's likely the latest binary build of this package has not included
> > options to
> > enable extend and/or other extras.
> >
> > *** End of the template - remove these template lines ***
> >
> >
> > -- System Information:
> > Debian Release: 9.13
> >   APT prefers oldstable-updates
> >   APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
> > Architecture: amd64 (x86_64)
> >
> > Kernel: Linux 4.9.0-13-amd64 (SMP w/8 CPU cores)
> > Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8),
> > LANGUAGE=en_GB:en (charmap=UTF-8)
> > Shell: /bin/sh linked to /bin/dash
> > Init: systemd (via /run/systemd/system)
> >
> > Versions of packages snmpd depends on:
> > ii  adduser3.115
> > ii  debconf [debconf-2.0]  1.5.61
> > ii  init-system-helpers1.48
> > ii  libc6  2.24-11+deb9u4
> > ii  libsnmp-base   5.7.3+dfsg-1.7+deb9u2
> > ii  libsnmp30  5.7.3+dfsg-1.7+deb9u2
> > ii  lsb-base   9.20161125
> >
> > snmpd recommends no packages.
> >
> > Versions of packages snmpd suggests:
> > pn  snmptrapd  
> >
> > -- debconf information excluded

-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Mobile Inc.



Bug#922234: Still ongoing

2019-09-23 Thread Christian Balzer


Hello,

Just wanted to confirm that this is still ongoing; it turned a significant
number of my sparse hairs gray in the middle of last night.

This is on a Supermicro SYS-2028TP-DC0FR/X10DRT-PIBF with
Intel I350 and 82575EB ethernet interfaces and a Mellanox ConnectX-3, in
case anybody is taking notes on which firmware/driver might be the culprit.

I also had to revert to the oldest (4.9.0-4) kernel to get this working and
have to basically consider everything with Stretch backport kernels to be
non-bootable at this point.

It would be nice if somebody from the kernel team could pipe up so we can
provide more info if needed.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Mobile Inc.



Bug#888512: Clamd suddenly eat up all file descriptors, 'Too many open files' error

2018-01-26 Thread Christian Balzer

Hello,

I can very much and very urgently confirm this bug.
It started happening today on several servers here, unsurprisingly first
the ones with the smallest number of default file descriptors configured.

Interestingly even though the clamd version number is the same, Stretch
servers are unaffected, while Jessie ones very much are.

Alas, while I'm able to upgrade most servers, two I definitely cannot at
this time, so that's not a solution.
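For anyone else hitting this, a quick way to watch the descriptor leak via
/proc (a generic sketch, not clamav-specific tooling; substitute clamd's
PID, shown here inspecting our own shell instead):

```shell
# Count open file descriptors and show the limit for a PID (Linux /proc).
# $$ (our own shell) stands in for clamd's PID in this sketch.
pid=$$
nfds=$(ls /proc/$pid/fd | wc -l)
echo "open fds: $nfds"
grep 'Max open files' /proc/$pid/limits
```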

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications



Bug#649106: syncid 'fix' breaks state sync in init script completely

2017-11-06 Thread Christian Balzer

Hello,

sorry for the bug necromancy, but this clearly has not been fixed in 1.26
or another regression happened later.

The patch was obviously only half (manually?) applied: the current
init script in 1:1.28-3+b1 has the "--syncid" argument TWICE in the start
invocations; the first occurrence, before "--start-daemon", needs to be
removed.

I've been using LVS/ipvsadm for a really long time, and not once in over 4
releases has a Debian start script worked out of the box for people using
the daemons.

Would be nice to get this fixed once and for all.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications



Bug#864756: Failover of VMs from dead nodes doesn't work with RBD or any other extstorage

2017-06-14 Thread Christian Balzer

Package: ganeti
Version: 2.15.2-7
Severity: Important

Stretch

As the tin says, ganeti 2.15 is unable/unwilling to fail over VMs from a
dead compute node if they aren't DRBD.
This used to work with 2.12, so it is a regression, and it was still
unfixed when the current Debian version was released.

See the following for details and a patch (works fine for me):
https://groups.google.com/forum/#!topic/ganeti/ahbr7vb7zRo


Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications



Bug#850823: Official patch

2017-06-14 Thread Christian Balzer

The official patch is here and includes a bit more:

https://github.com/ganeti/ganeti/commit/27a999616efefcff96b14688208c93c6a76d8813

-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications



Bug#864754: CPU affinity broken with Stretch (or backport python versions on Jessie)

2017-06-13 Thread Christian Balzer

Package: ganeti
Version: 2.15.2-7
Severity: Important

Stretch

The Python psutil module played the familiar game of Python developers and
renamed a function call. In particular, the CPU-affinity call changed
between the version supplied in Jessie (2.1.1) and version 4.x (Stretch
and Jessie backports).

For details and the solution [simply change "set_cpu_affinity" to
"cpu_affinity" (both occurrences) in lib/hypervisor/hv_kvm/__init__.py] see:

https://groups.google.com/forum/#!topic/ganeti/fQu6Wr14k2M
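The fix really is just a mechanical rename. A hedged sketch of applying it
with sed, demonstrated here on a scratch file rather than the real
lib/hypervisor/hv_kvm/__init__.py (always diff before editing installed
packages):

```shell
# Demonstrate the rename described above on a scratch copy.
tmp=$(mktemp)
printf 'p.set_cpu_affinity(cpus)\n' > "$tmp"
sed -i 's/set_cpu_affinity/cpu_affinity/g' "$tmp"
cat "$tmp"    # prints: p.cpu_affinity(cpus)
rm "$tmp"
```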

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications



Bug#864755: Bash completion missing in 2.15

2017-06-13 Thread Christian Balzer

Package: ganeti
Version: 2.15.2-7
Severity: Important

Stretch

Unlike with 2.12 there is no longer a /etc/bash_completion.d/ganeti
provided by this package.

Given the nature of the ganeti beast and the lack of a (stable, usable)
GUI, this is a rather essential feature.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications



Bug#850823: Patch works and REALLY should be in there.

2017-06-12 Thread Christian Balzer

Hello Apollon (again).

Just to confirm that the patch works, I'm just in the middle of installing
a new ganeti/ceph cluster based on Stretch.

Also, the ganeti bash completion is gone; should I open a bug report for
this, or is that something that happened upstream?

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications



Bug#862248: [Debian-ha-maintainers] Bug#862248: No straightforward and permanent way to disable DRBD autostart, no drbd systemd unit file

2017-05-11 Thread Christian Balzer
On Thu, 11 May 2017 09:33:59 +0300 Apollon Oikonomopoulos wrote:

> On 09:15 Thu 11 May , Christian Balzer wrote:
> > Firstly I recreated the initial state by unmasking drbd and enabling
> > it, then reloading systemd.
> > 
> > That find then gives us:
> > ---
> > /run/systemd/generator.late/drbd.service
> > /etc/systemd/system/multi-user.target.wants/drbd.service  
> 
> So, now we need ls -l 
> /etc/systemd/system/multi-user.target.wants/drbd.service to see how old 
> the symlink is and where it points to. Can you also zgrep drbd-utils 
> /var/log/dpkg.log*?
> 


Sure thing, that's quite the old chestnut indeed:
---
lrwxrwxrwx 1 root root 32 Aug 11  2015 
/etc/systemd/system/multi-user.target.wants/drbd.service -> 
/lib/systemd/system/drbd.service
---

Note that this link is also present on:
- a Wheezy system with "8.9.2~rc1-1~bpo70+1" installed;
- a system that was initially installed with Jessie but had
  8.9.2~rc1-2+deb8u1 installed first.
The plot thickens; see below.

No trace of the original install, just the upgrades when going from Wheezy
to Jessie:
---
/var/log/dpkg.log:2017-05-09 13:03:35 upgrade drbd-utils:amd64 
8.9.2~rc1-1~bpo70+1 8.9.2~rc1-2+deb8u1
/var/log/dpkg.log:2017-05-09 13:14:41 upgrade drbd-utils:amd64 
8.9.2~rc1-2+deb8u1 8.9.5-1~bpo8+1
---

As for Ferenc, that was of course _after_ again disabling drbd, but I
wanted to start from the same state as before, so I unmasked and
enabled it first.

Also for the record, on another Jessie system that never had drbd-utils
installed I installed directly "8.9.5-1~bpo8+1". There disable works as
expected and only leaves the /run/systemd/generator.late/drbd.service
around.

So all points to 8.9.2~rc1 as the culprit.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/



Bug#862248: [Debian-ha-maintainers] Bug#862248: No straightforward and permanent way to disable DRBD autostart, no drbd systemd unit file

2017-05-10 Thread Christian Balzer

Hello,

On Wed, 10 May 2017 15:18:15 +0300 Apollon Oikonomopoulos wrote:

> On 20:55 Wed 10 May , Christian Balzer wrote:
> > is there any package you're not involved with? ^o^  
> 
> Nah, we just happen to be running the same things :)
>
Evidently so. ^o^
 
> > On Wed, 10 May 2017 12:37:34 +0300 Apollon Oikonomopoulos wrote:
> >   
> > > Control: severity -1 wishlist
> > >  
> > Sure thing.
> >    
> > > Hi,
> > > 
> > > On 17:53 Wed 10 May , Christian Balzer wrote:  
> > > > Jessie (backports), systemd.
> > > > 
> > > > When running DRBD with pacemaker it is recommended (and with systemd
> > > > required, see link below) to disable DRBD startup at boot time.
> > > > 
> > > > However:
> > > > ---
> > > > # systemctl disable drbd
> > > > drbd.service is not a native service, redirecting to 
> > > > systemd-sysv-install.
> > > > Executing: /lib/systemd/systemd-sysv-install disable drbd
> > > > insserv: warning: current start runlevel(s) (empty) of script `drbd' 
> > > > overrides LSB defaults (2 3 4 5).
> > > > insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script 
> > > > `drbd' overrides LSB defaults (0 1 6).
> > > > ---
> > > > 
> > > > But since systemd-sysv picks up anything in /etc/init.d/ we get after a
> > > > reboot:
> > > > ---
> > > > # systemctl status drbd
> > > >   drbd.service - LSB: Control drbd resources.
> > > >Loaded: loaded (/etc/init.d/drbd; generated; vendor preset: enabled)
> > > >Active: active (exited) since Wed 2017-05-10 10:37:39 JST; 6h ago
> > > >  Docs: man:systemd-sysv-generator(8)
> > > >CGroup: /system.slice/drbd.service
> > > > ---
> > > > 
> > > > Ways forward would be a unit file for systemd that actually allows 
> > > > disable
> > > > to work as expected or some other means to (permanently) neuter the 
> > > > init.d
> > > > file (instead of an "exit 0" at the top which did the trick for now).   
> > > >  
> > > 
> > > Thanks for the report!
> > > 
> > > You can always use `systemctl mask drbd.service', which will neuter the 
> > > initscript completely. I'm downgrading the severity to 'wishlist', 
> > > unless `systemctl mask' causes some ill side-effects, in which case 
> > > please change the severity again.
> > >   
> > That worked w/o any ill effects I can see.
> > 
> > Unfortunately mask is not a particularly well-known/referenced systemctl
> > feature, but then again that might be my tremendous love and admiration
> > for all things systemd speaking. ^o^  
> 
> mask is well-documented, it's just something we didn't have with 
> sysvinit, so most people ignore its existence and it's not cited often.
> 
> >   
> > > But yes, ideally we should provide a native unit.
> > >   
> > I wonder if this bears referencing to the systemd/systemd-sysv folks, to
> > maybe suggest "mask" in the output when somebody runs disable against a
> > LSB sysv init script.   
> 
> The thing is, systemctl disable *should* do the right thing, even in 
> jessie. It makes me suspect there are some older package left-overs 
> around. Can you please try running:
> 
>  $ systemctl disable drbd.service
>  $ systemctl daemon-reload
>  $ find /lib/systemd /run/systemd /etc/systemd -name drbd.service
> 

Firstly I recreated the initial state by unmasking drbd and enabling it,
then reloading systemd.

That find then gives us:
---
/run/systemd/generator.late/drbd.service
/etc/systemd/system/multi-user.target.wants/drbd.service
---

These systems are an upgrade from Wheezy, but there are no old packages
left and the relevant bits (drbd and systemd) are actually from backports.


Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/



Bug#862248: [Debian-ha-maintainers] Bug#862248: No straightforward and permanent way to disable DRBD autostart, no drbd systemd unit file

2017-05-10 Thread Christian Balzer

Hello Apollon,

is there any package you're not involved with? ^o^

On Wed, 10 May 2017 12:37:34 +0300 Apollon Oikonomopoulos wrote:

> Control: severity -1 wishlist
>
Sure thing.
 
> Hi,
> 
> On 17:53 Wed 10 May     , Christian Balzer wrote:
> > Jessie (backports), systemd.
> > 
> > When running DRBD with pacemaker it is recommended (and with systemd
> > required, see link below) to disable DRBD startup at boot time.
> > 
> > However:
> > ---
> > # systemctl disable drbd
> > drbd.service is not a native service, redirecting to systemd-sysv-install.
> > Executing: /lib/systemd/systemd-sysv-install disable drbd
> > insserv: warning: current start runlevel(s) (empty) of script `drbd' 
> > overrides LSB defaults (2 3 4 5).
> > insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `drbd' 
> > overrides LSB defaults (0 1 6).
> > ---
> > 
> > But since systemd-sysv picks up anything in /etc/init.d/ we get after a
> > reboot:
> > ---
> > # systemctl status drbd
> >   drbd.service - LSB: Control drbd resources.
> >Loaded: loaded (/etc/init.d/drbd; generated; vendor preset: enabled)
> >Active: active (exited) since Wed 2017-05-10 10:37:39 JST; 6h ago
> >  Docs: man:systemd-sysv-generator(8)
> >CGroup: /system.slice/drbd.service
> > ---
> > 
> > Ways forward would be a unit file for systemd that actually allows disable
> > to work as expected or some other means to (permanently) neuter the init.d
> > file (instead of an "exit 0" at the top which did the trick for now).  
> 
> Thanks for the report!
> 
> You can always use `systemctl mask drbd.service', which will neuter the 
> initscript completely. I'm downgrading the severity to 'wishlist', 
> unless `systemctl mask' causes some ill side-effects, in which case 
> please change the severity again.
> 
That worked w/o any ill effects I can see.

Unfortunately mask is not a particularly well-known/referenced systemctl
feature, but then again that might be my tremendous love and admiration
for all things systemd speaking. ^o^
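For readers finding this later: what mask actually does is shadow the unit
name with a /dev/null symlink in /etc/systemd/system, which takes
precedence over both generator-created units and init.d scripts. A
harmless sketch of the mechanism, in a temp dir rather than the real
system:

```shell
# Simulate what "systemctl mask drbd.service" creates on disk:
tmp=$(mktemp -d)
ln -s /dev/null "$tmp/drbd.service"
readlink "$tmp/drbd.service"    # prints: /dev/null
rm -r "$tmp"
```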

> But yes, ideally we should provide a native unit.
> 
I wonder if this bears mentioning to the systemd/systemd-sysv folks, to
maybe suggest "mask" in the output when somebody runs disable against an
LSB sysv init script.

Regards,

Christian

> Regards,
> Apollon
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/



Bug#862248: No straightforward and permanent way to disable DRBD autostart, no drbd systemd unit file

2017-05-10 Thread Christian Balzer

Package: drbd-utils
Version: 8.9.5-1~bpo8+1
Severity: Important

Jessie (backports), systemd.

When running DRBD with pacemaker it is recommended (and with systemd
required, see link below) to disable DRBD startup at boot time.

However:
---
# systemctl disable drbd
drbd.service is not a native service, redirecting to systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable drbd
insserv: warning: current start runlevel(s) (empty) of script `drbd' overrides 
LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `drbd' 
overrides LSB defaults (0 1 6).
---

But since systemd-sysv picks up anything in /etc/init.d/ we get after a
reboot:
---
# systemctl status drbd
  drbd.service - LSB: Control drbd resources.
   Loaded: loaded (/etc/init.d/drbd; generated; vendor preset: enabled)
   Active: active (exited) since Wed 2017-05-10 10:37:39 JST; 6h ago
 Docs: man:systemd-sysv-generator(8)
   CGroup: /system.slice/drbd.service
---

Ways forward would be a unit file for systemd that actually allows disable
to work as expected or some other means to (permanently) neuter the init.d
file (instead of an "exit 0" at the top which did the trick for now).

See also:
https://www.mail-archive.com/drbd-user%40lists.linbit.com/msg11045.html

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/



Bug#859700: client state / Message count mismatch with imap-hibernate and mixed POP3/IMAP access

2017-04-06 Thread Christian Balzer

Hello Apollon,

On Thu, 6 Apr 2017 12:17:10 +0300 Apollon Oikonomopoulos wrote:

> Control: tags -1 moreinfo
> 
> Hi Christian,
> 
> Thanks for the report.
> 
> On 15:48 Thu 06 Apr     , Christian Balzer wrote:
> > I've been seeing a few of these since starting this cluster (see 
> > previous
> > mail), they all follow the same pattern, a user who accesses their mailbox
> > with both POP3 and IMAP deletes mails with POP3 and the IMAP
> > (imap-hibernate really) is getting confused and upset about this:
> > 
> > ---
> > Apr  6 09:55:49 mbx11 dovecot: imap-login: Login: user=<redac...@gol.com>, 
> > method=PLAIN, rip=xxx.xxx.x.46, lip=xxx.xxx.x.113, mpid=951561, secured, 
> > session=<2jBV+HRM1Pbc9w8u>
> > Apr  6 10:01:06 mbx11 dovecot: pop3-login: Login: user=<redac...@gol.com>, 
> > method=PLAIN, rip=xxx.xxx.x.46, lip=xxx.xxx.x.41, mpid=35447, secured, 
> > session=
> > Apr  6 10:01:07 mbx11 dovecot: pop3(redac...@gol.com): Disconnected: Logged 
> > out top=0/0, retr=0/0, del=1/1, size=20674 session=
> > Apr  6 10:01:07 mbx11 dovecot: imap(redac...@gol.com): Error: imap-master: 
> > Failed to import client state: Message count mismatch after handling 
> > expunges (0 != 1)
> > Apr  6 10:01:07 mbx11 dovecot: imap(redac...@gol.com): Client state 
> > initialization failed in=0 out=0 head=<0> del=<0> exp=<0> trash=<0> 
> > session=<2jBV+HRM1Pbc9w8u>
> > Apr  6 10:01:15 mbx11 dovecot: imap-login: Login: user=<redac...@gol.com>, 
> > method=PLAIN, rip=xxx.xxx.x.46, lip=xxx.xxx.x.113, mpid=993303, secured, 
> > session=<6QC6C3VMF/jc9w8u>
> > Apr  6 10:07:42 mbx11 dovecot: imap-hibernate(redac...@gol.com): Connection 
> > closed in=85 out=1066 head=<0> del=<0> exp=<0> trash=<0> 
> > session=<6QC6C3VMF/jc9w8u>
> > ---
> > 
> > According to the dovecot ML, this is fixed in 2.2.28, so getting this into
> > Debian and backports would be much appreciated.  
> 
> Unfortunately we are in the middle of the freeze for next stable, and 
> updating to 2.2.28 itself is not an option at this time I'm afraid. If 
> we pinpoint the fix however, we can always backport it to 2.2.27 and 
> have it released with Stretch.
> 
Nods, I think I saw that freeze bit when glancing over the new package
tracker. 
Would love to have that fixed, even though it seems to be (thankfully)
mostly transparent to the actual (and few) users that are affected by it.

> > 
> > See also:
> > https://www.dovecot.org/pipermail/dovecot/2017-April/107668.html  
> 
> According to the message by Aki in the dovecot ML, this is fixed in 
> 1fd44e0634.  However, 1fd44e0634 is already part of 2.2.26 (and 2.2.27 
> of course), which complicates things a bit more:
> 
> $ git describe --contains 1fd44e0634ac312d0960f39f9518b71e08248b65
> 2.2.26~318
> 
> So either the fix is incomplete, or you have some old processes lying 
> around. I can't think of anything else :)
> 
I vote for the first option, since this is a fresh install that never saw
anything but 2.2.27 from backports.

Also saw the .26 and .27 bits in the git excerpt but didn't check it in
detail; though why did Aki suggest .28 when I stated that it's already .27
here?

Will point the folks on the dovecot ML to this bug report, maybe Timo can
make sense of it.

Thanks,

Christian

> Regards,
> Apollon
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/



Bug#859700: client state / Message count mismatch with imap-hibernate and mixed POP3/IMAP access

2017-04-06 Thread Christian Balzer

Package: dovecot-core
Version: 1:2.2.27-2~bpo8+1
Severity: Important


Hello,

I've been seeing a few of these since starting this cluster (see previous
mail). They all follow the same pattern: a user who accesses their mailbox
with both POP3 and IMAP deletes mails via POP3, and the IMAP side
(imap-hibernate, really) gets confused and upset about this:

---
Apr  6 09:55:49 mbx11 dovecot: imap-login: Login: user=<redac...@gol.com>, 
method=PLAIN, rip=xxx.xxx.x.46, lip=xxx.xxx.x.113, mpid=951561, secured, 
session=<2jBV+HRM1Pbc9w8u>
Apr  6 10:01:06 mbx11 dovecot: pop3-login: Login: user=<redac...@gol.com>, 
method=PLAIN, rip=xxx.xxx.x.46, lip=xxx.xxx.x.41, mpid=35447, secured, 
session=
Apr  6 10:01:07 mbx11 dovecot: pop3(redac...@gol.com): Disconnected: Logged out 
top=0/0, retr=0/0, del=1/1, size=20674 session=
Apr  6 10:01:07 mbx11 dovecot: imap(redac...@gol.com): Error: imap-master: 
Failed to import client state: Message count mismatch after handling expunges 
(0 != 1)
Apr  6 10:01:07 mbx11 dovecot: imap(redac...@gol.com): Client state 
initialization failed in=0 out=0 head=<0> del=<0> exp=<0> trash=<0> 
session=<2jBV+HRM1Pbc9w8u>
Apr  6 10:01:15 mbx11 dovecot: imap-login: Login: user=<redac...@gol.com>, 
method=PLAIN, rip=xxx.xxx.x.46, lip=xxx.xxx.x.113, mpid=993303, secured, 
session=<6QC6C3VMF/jc9w8u>
Apr  6 10:07:42 mbx11 dovecot: imap-hibernate(redac...@gol.com): Connection 
closed in=85 out=1066 head=<0> del=<0> exp=<0> trash=<0> 
session=<6QC6C3VMF/jc9w8u>
---

According to the dovecot ML, this is fixed in 2.2.28, so getting this into
Debian and backports would be much appreciated.

See also:
https://www.dovecot.org/pipermail/dovecot/2017-April/107668.html


Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/



Bug#820958: Upgrade of 4.1 to 4.2 in Jessie forces the samba package to be installed and the daemons started (nagios plugins install)

2016-04-13 Thread Christian Balzer

Package: samba
Version: 2:4.2.10+dfsg-0+deb8u1
Severity: Normal

Hello,

The just-released security fix, and thus the upgrade from Samba 4.1 to 4.2
in Jessie, introduces another potential security problem.

Consider this (fairly common) scenario:
Server isn't running samba at all, but nagios-plugins-standard was
installed to monitor (NRPE) other services.
nagios-plugins-standard pulls in samba-common (to get smbclient).
So far so good, until now this didn't do anything dangerous and people
most likely allowed all the dependencies/recommendations to be installed.

However this latest version of samba requires the actual samba package to
be installed as well if samba-common is present, which of course will
install the daemon binaries and start them, potentially exposing the
server in question to attacks.
 
A quick workaround is of course to un-install samba if one didn't need
the functionality in the first place.

But a re-packaging in the previous style or at least a stern warning when
pulling in samba into a system that only had samba-common before would be
the correct way forward.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/



Bug#795060: Latest Wheezy backport kernel prefers Infiniband mlx4_en over mlx4_ib, breaks existing installs

2015-08-16 Thread Christian Balzer
On Fri, 14 Aug 2015 13:01:03 +0200 Ben Hutchings wrote:

 On Fri, 2015-08-14 at 13:45 +0900, Christian Balzer wrote:
 [...]
  So I decided to downgrade the Mellanox firmware of mbx09.
  
  After building a current version of mstflint (the one in Wheezy is
  ancient and not particular justified) I got the oldest firmware
  (2.32.5100) on the Supermicro FTP site for that mainboard and flashed
  it.
  
  Lo and behold, no more vanishing acts of the ib0: interface, no more
  need to blacklist/fake install the mlx4_en module.
  
  While moderately happy with the outcome, I still consider this a kernel
  bug. 
  All the described behavior is not only very unexpected and unwelcome,
  the fact that a remote card reboot can make your network stack vanish
  (and not re-appear unless done manually) is just wrong.
 [...]
 
 This sounds rather more like a firmware bug than a kernel bug.  Please
 ask Mellanox technical support about this.  If they can identify a fix
 in the driver then I'll be happy to apply that.
 

Never mind that a firmware bug, in my book, is something that affects
things locally; the fact that 3.2 is not affected, plus the items listed
above, makes it a kernel/upstream Mellanox bug.
I'd appreciate it if you could use whatever means you have to kick this
upstream as well.

I'll send a mail to Mellanox support, which they may well ignore, this
being a Supermicro OEM product, never mind that it's 100% identical.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/



Bug#795060: Latest Wheezy backport kernel prefers Infiniband mlx4_en over mlx4_ib, breaks existing installs

2015-08-11 Thread Christian Balzer
On Tue, 11 Aug 2015 20:00:41 +0200 Ben Hutchings wrote:

 On Tue, 2015-08-11 at 10:38 +0900, Christian Balzer wrote:
  Hello Ben,
  
  thanks for the quick and detailed reply.
  
  On Mon, 10 Aug 2015 15:53:57 +0200 Ben Hutchings wrote:
  
   Control: severity -1 important
   Control: tag -1 upstream
   
   On Mon, 2015-08-10 at 13:52 +0900, Christian Balzer wrote:
   [...]
I'm also not seeing this on several other machines we use for Ceph
with the current Jessie kernel, but to be fair they use slightly
different (QDR, not FDR) ConnectX-3 HBAs.
   
   If SR-IOV is enabled on the adapter then the ports will always
   operate in Ethernet mode as it's apparently not supported for IB.
   Perhaps SR -IOV enabled on some adapters but not others?
   
  I was wondering about that, but wasn't aware of the Ethernet only bit
  of SR-IOV. 
  Anyway, the previous cluster and one blade of this new one have
  Mellanox firmware 2.30.8000, which doesn't offer the Flexboot Bios
  menu and thus have no SR-IOV configuration option at boot time.
  
  However the other blade (replacement mobo for a DoA one) in the new
  server does have firmware 2.33.5100 and the Flexboot menu and had
  SR-IOV enabled.
  
  Alas disabling it (and taking out the fake-install) did result in the
  same behavior, mlx4_en was auto-loaded before mlx4_ib.
 [...]
  I added that options mlx4_core port_type_array=1 (since there is only
  one port) to /etc/modprobe.d/local.conf, depmod -a, update-initramfs
  -u, but no joy.
  The mlx4_en module gets auto-loaded before the IB one as well with this
  setting.
 [...]
 
 There was a deliberate change in mlx4_core in Linux 3.15 to load
 mlx4_en first if it finds any Ethernet port.  

Interesting. So this _could_ have bitten me earlier with any flavor of
3.16 kernel if there had been an Ethernet port around.
Again, given that a cluster with otherwise identical hardware doesn't do
this, I have to assume that the presence of that Ethernet port stems from
the 2.33.5100 firmware, no matter whether SR-IOV is enabled or not.

 But that is separate from
 the decision of what types of port are configured.
 
From where I'm standing it looks like it will use/configure mlx4_en no
matter what. And once mlx4_en is loaded, mlx4_ib is no longer capable
of creating IB ports.

In fact it will even tear down the remote IB port and load mlx4_en if just
one side changes from IB to EN.
To wit, I had both nodes up with running ib0: interfaces (mlx4_en disabled
via fake-install). 
I then commented out the fake-install on both and did a depmod -a.
On node mbx09 (the one with the newer firmware) I then rmmod'ed mlx4_ib and
mlx4_core.
Then I modprobe'd mlx4_core:
---
Aug 12 10:14:56 mbx09  mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 
2014)
Aug 12 10:14:56 mbx09  mlx4_core: Initializing :02:00.0
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: PCIe link speed is 8.0GT/s, 
device supports 8.0GT/s
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: PCIe link width is x8, device 
supports x8
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 124 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 125 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 126 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 127 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 128 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 129 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 130 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 131 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 132 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 133 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 134 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 135 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 136 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 137 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 138 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 139 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 140 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 141 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 142 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 143 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_core :02:00.0: irq 144 for MSI/MSI-X
Aug 12 10:15:01 mbx09  mlx4_ib mlx4_ib_add: mlx4_ib: Mellanox ConnectX 
InfiniBand driver v2.2-1 (Feb 2014)
Aug 12 10:15:01 mbx09  mlx4_en: Mellanox ConnectX HCA Ethernet driver v2.2-1 
(Feb 2014)
Aug 12 10:15:01 mbx09  mlx4_en :02:00.0: registered PHC clock
Aug 12 10:15:01 mbx09  mlx4_en :02:00.0: Activating port:1
Aug 12 10:15:01 mbx09  mlx4_en: :02:00.0: Port 1: Using 192 TX rings
Aug 12 10:15:01 mbx09  mlx4_en: :02:00.0: Port 1: Using 8 RX rings
Aug 12 10:15:01 mbx09  mlx4_en: :02:00.0

Bug#795060: Latest Wheezy backport kernel prefers Infiniband mlx4_en over mlx4_ib, breaks existing installs

2015-08-10 Thread Christian Balzer

Hello Ben,

thanks for the quick and detailed reply.

On Mon, 10 Aug 2015 15:53:57 +0200 Ben Hutchings wrote:

 Control: severity -1 important
 Control: tag -1 upstream
 
 On Mon, 2015-08-10 at 13:52 +0900, Christian Balzer wrote:
 [...]
  I'm also not seeing this on several other machines we use for Ceph
  with the current Jessie kernel, but to be fair they use slightly
  different (QDR, not FDR) ConnectX-3 HBAs.
 
 If SR-IOV is enabled on the adapter then the ports will always operate
 in Ethernet mode as it's apparently not supported for IB.  Perhaps SR
 -IOV enabled on some adapters but not others?

I was wondering about that, but wasn't aware of the Ethernet only bit of
SR-IOV. 
Anyway, the previous cluster and one blade of this new one have Mellanox
firmware 2.30.8000, which doesn't offer the Flexboot Bios menu and thus
have no SR-IOV configuration option at boot time.

However the other blade (replacement mobo for a DoA one) in the new server
does have firmware 2.33.5100 and the Flexboot menu and had SR-IOV enabled.

Alas disabling it (and taking out the fake-install) did result in the same
behavior, mlx4_en was auto-loaded before mlx4_ib.

In all following tests I did reboot both nodes simultaneously, to avoid
having one port in Ethernet mode forcing things on the other side.

Also the newest QDR card for one of the Ceph cluster machines here does
have that firmware, but behaves properly (no mlx4_en auto-load) with the
latest Jessie kernel.
 
 If that's not the issue, it looks like you are supposed to set a module
 parameter in mlx4_core:
 port_type_array:Array of port types: HW_DEFAULT (0) is default 1 for
 IB, 2 for Ethernet (array of int) e.g.:
 options mlx4_core port_type_array=1,1
 
I added that "options mlx4_core port_type_array=1" line (since there is
only one port) to /etc/modprobe.d/local.conf, ran depmod -a and
update-initramfs -u, but no joy.
The mlx4_en module gets auto-loaded before the IB one as well with this
setting.

So ultimately only the fake-install of mlx4_en provides a workaround.
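To recap the two approaches as concrete commands (a sketch against a scratch directory; the real files would live in /etc/modprobe.d/, and depmod -a plus update-initramfs -u must follow, as described above):

```shell
confdir=$(mktemp -d)   # stands in for /etc/modprobe.d/

# Attempt 1: pin the port to IB via the mlx4_core module parameter
# (one value per port; this chassis has a single port):
echo "options mlx4_core port_type_array=1" > "$confdir/local.conf"

# Attempt 2, the only workaround that held: fake-install mlx4_en so that
# modprobe runs /bin/true instead of loading the Ethernet driver:
echo "install mlx4_en /bin/true" > "$confdir/mlx4_en.conf"

cat "$confdir"/*.conf
```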

If you have anything else you would like to try let me know, this cluster
will probably not go into production for another 2 weeks.

 I don't know what determines the hardware default.
 
 [...]
  Given that the previous version works as expected and that Jessie is
  doing the right thing as well, I'd consider this a critical bug.
 
 No, it is important (since it is a regression) but it is not critical.
 
Fair enough.

  Had I rebooted the older production cluster with 500,000 users on it
  into this kernel, the results would not have been pretty.
 
 And that's why you tested on one machine first, right?
 
Of course, but it would still have a) broken things (replication stopped)
and b) taken me even more time to figure out what was going on and how to
work around it, as I can't reboot that cluster willy-nilly.

There simply is a very high expectation that a kernel update like this
won't leave you dead in the water.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#795060: Latest Wheezy backport kernel prefers Infiniband mlx4_en over mlx4_ib, breaks existing installs

2015-08-09 Thread Christian Balzer

Package: linux-image-3.16.0-0.bpo.4-amd64
Version: 3.16.7-ckt11-1+deb8u2~bpo70+1
Severity: critical


Hello,

We have a 2-node Supermicro (2028TP-DC0FR) chassis with an onboard
Mellanox ConnectX-3 HBA in production since last year.
Both nodes are directly connected with a QSFP FDR cable.
We use IPoIB (for DRBD) and thus load the mlx4_ib module and all the
assorted other ones in /etc/modules at boot time. 
These are Wheezy machines, currently with the 3.16.7-ckt2-1~bpo70+1 kernel.

Last week we got another (identical) one of these chassis and I installed
Wheezy as well (we need pacemaker, which is sorely lacking in Jessie).
This was with the 3.16.7-ckt11-1+deb8u2~bpo70+1 kernel and unlike in the
past it proceeded to load the mlx4_en module automatically, created an
eth2: interface and the ib0: interface was nowhere to be found.

This was not only very unexpected, I was also under the impression that
mlx4_en and mlx4_ib could be used in parallel; but even though mlx4_ib was
loaded it did not work (the /sys/class/net/ib0 entry was not created).

Booting into the stock Wheezy 3.2 kernel (which we also run on older
machines with ConnectX-2 HBAs) resulted in the expected behavior, IB
interface, no Ethernet. 

I'm also not seeing this on several other machines we use for Ceph with the
current Jessie kernel, but to be fair they use slightly different (QDR,
not FDR) ConnectX-3 HBAs.

After doing a fake-install (blacklisting didn't work) like this:
---
echo "install mlx4_en /bin/true" > /etc/modprobe.d/mlx4_en.conf
depmod -a
update-initramfs -u
---
and rebooting I have IB running on 3.16.0-0.bpo.4-amd64 again as well.

Given that the previous version works as expected and that Jessie is
doing the right thing as well, I'd consider this a critical bug.

Had I rebooted the older production cluster with 500,000 users on it into
this kernel, the results would not have been pretty.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/





Bug#773361: ceph: osd dies, something corrupts journal

2015-06-04 Thread Christian Balzer

Hello,

see my thread in the Ceph ML named: 
OSD trashed by simple reboot (Debian Jessie, systemd?)

I believe that upgrading to 0.80.9 will fix this problem.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/





Bug#768922: Bug#768618: pacemaker: FTBFS in jessie: build-dependency not installable: libqb-dev (= 0.16.0.real)

2015-01-18 Thread Christian Balzer


Well...

Meanwhile, here in what we tenuously call reality, one can observe
the following things:

1. Pacemaker broken in Jessie for more than 2 months now.
2. Silence on this bug for more than one month.
3. Pacemaker was recently removed from Jessie.
4. The February 5th deadline is rapidly approaching, cue the laughingstock.

Between systemd and this gem Jessie is shaping up to be the best Debian
release ever...

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/





Bug#754341: Definitely a python conflict

2014-07-10 Thread Christian Balzer

I just downgraded a test machine running sid by wget-ing the following
packages (unfortunately they are no longer in any package list):
---
libpython2.7_2.7.7-2_amd64.deb  
python2.7_2.7.7-2_amd64.deb
libpython2.7-minimal_2.7.7-2_amd64.deb  
python2.7-minimal_2.7.7-2_amd64.deb
libpython2.7-stdlib_2.7.7-2_amd64.deb
---
then putting them into a directory by themselves and running:
---
dpkg --install *
---

ceph (the command in any incarnation, not just ceph -s) now works again
and the OSD on that machine unsurprisingly can be started again as well.
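For completeness, a small sketch (not part of the original report) of double-checking that the wget-ed packages really are older than the installed ones before force-installing them. sort -V understands Debian-style version ordering well enough for this case, though it ignores epochs; on a Debian system `dpkg --compare-versions` is the authoritative tool.

```shell
# Returns success if version $1 sorts strictly before version $2.
version_lt() {
    [ "$1" != "$2" ] && \
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

version_lt "2.7.7-2" "2.7.8-1" && echo "2.7.7-2 is older, safe downgrade target"
```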

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/





Bug#754341: ceph command hangs in Jessie after update (python 2.7.8 suspected)

2014-07-09 Thread Christian Balzer

Package: ceph
Version: 0.80.1-1+b1
Severity: critical

Hello,

this is a Jessie machine, part of a Jessie based Ceph cluster. 
After doing a minor update today (40 odd packages) the ceph command hangs
just before returning to the command prompt like this:
---
# ceph -s
cluster d6b84616-ff3e-4b04-b50b-bd398d7fa69a
 health HEALTH_OK
 monmap e1: 3 mons at 
{c-admin=10.0.0.10:6789/0,ceph-01=10.0.0.41:6789/0,ceph-02=10.0.0.42:6789/0}, 
election epoch 86, quorum 0,1,2 c-admin,ceph-01,ceph-02
 osdmap e1676: 4 osds: 4 up, 4 in
  pgmap v4135542: 1152 pgs, 3 pools, 699 GB data, 182 kobjects
1358 GB used, 98819 GB / 100178 GB avail
1152 active+clean
  client io 525 kB/s wr, 132 op/s

[hangs indefinitely until ctrl-c]
---

This update include python 2.7.8 as in:
---
Preparing to unpack .../python2.7_2.7.8-1_amd64.deb ...
Unpacking python2.7 (2.7.8-1) over (2.7.7-2) ...
---

So I suspect this is an incompatibility between ceph and python.

Needless to say, this is critical, as this hanging command will prevent
normal operations, the start of monitors or OSDs, and in short ruin
one's day quite effectively.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/





Bug#729961: qemu-system-x86: rbd support

2014-04-13 Thread Christian Balzer

Hello,

On Sun, 13 Apr 2014 19:04:45 +0400 Michael Tokarev wrote:

 09.01.2014 10:53, Christian Balzer wrote:
  
  Hello,
  
  Meanwhile, we're at qemu 1.7, ceph is at 0.72, both in sid and
  wheezy-backports. 
  
   I'd really, really love to see RBD re-enabled by default for
   these and when things have trickled into jessie. 
 
 I just removed librbd support in qemu once again, because the same old
 issue - lack of library/symbol versioning - which prevented ceph from
 going into wheezy - is _still_ not fixed.  Because once I uploaded rbd-
 enabled qemu, I received a new grave bugreport against qemu-system which
 tells me that running qemu-system binary results in dynamic linker not
 finding symbol rbd_aio_flush or some other.
 
Where is that NEW bug report? Would that be the OLD:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=679686

you merged things with?

 And now I need _really_ good reason to re-enable it again, because, well,
 guys, this is not funny at all.
 
Firstly I am running 1.7.0+dfsg-6 under Jessie and it works just fine.
As did source build versions with RBD enabled in the past, either by using
the inktank Ceph packages or since 0.72.x entered sid and then jessie.

Secondly, Bug 679686 is about something that is not true in Jessie, as it
contains Ceph 0.72.2 (at this time).
That version is not in wheezy-backports yet and for that particular case
it would be true. 
However for Jessie it most emphatically isn't.

If there is indeed a bug (new or still present) with jessie, fine.
If this is about wheezy-backports, since when do backports block packages
from entering testing?

Lastly, I will make the Ceph developers aware of this.
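The missing-symbol failure quoted above boils down to the dynamic linker not finding rbd_aio_flush in the installed librbd. A hedged sketch of checking that directly (the library path and the nm output below are made up for illustration): `nm -D` prints a shared object's dynamic symbol table, and the check itself is just a grep over that output.

```shell
# On a real system one would run something like:
#   nm -D /usr/lib/librbd.so.1 | has_symbol rbd_aio_flush
has_symbol() {  # usage: nm -D lib.so | has_symbol SYMBOL
    grep -q " T $1\$"
}

# Demonstrated against canned nm output (the address is made up):
printf '%s\n' "000000000001a2b0 T rbd_aio_flush" \
    | has_symbol rbd_aio_flush && echo "rbd_aio_flush exported"
```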

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/





Bug#729961: qemu-system-x86: rbd support

2014-01-09 Thread Christian Balzer
On Thu, 9 Jan 2014 10:19:18 -0800 Vagrant Cascadian wrote:

 On Thu, Jan 09, 2014 at 03:53:14PM +0900, Christian Balzer wrote:
  Meanwhile, we're at qemu 1.7, ceph is at 0.72, both in sid and
  wheezy-backports. 
 
 It doesn't appear to be in wheezy-backports, and is still having troubles
 migrating to jessie/testing:

Argh, my bad. I clearly had too many parallel browser windows open
yesterday when researching this and not the wheezy-backport but the sid
one for ceph. ^_^;;
 
 rmadison ceph
  ceph | 0.48-2   | jessie | source, amd64, armel, armhf, i386, ia64, mips, mipsel, powerpc, s390x, sparc
  ceph | 0.48-2   | sid    | source
  ceph | 0.72.2-1 | sid    | source, amd64, armel, armhf, i386, ia64, mips, mipsel, powerpc, s390x, sparc
 
 grep-excuses ceph
 ceph (0.48-2 to 0.72.2-1)
 Maintainer: Ceph Maintainers
 7 days old (needed 5 days)
 out of date on i386: ceph-fuse, ceph-fuse-dbg (from 0.48-2)
 ...
 out of date on ia64: ceph-fuse, ceph-fuse-dbg (from 0.48-2) (but
 ia64 isn't keeping up, so nevermind)
 Updating ceph fixes old bugs: #705262
 Not considered
 Depends: ceph google-perftools (not considered)
 
 
 Doesn't really seem like it's ready yet...
 
Unfortunately, though nobody in their right mind would use ceph-fuse
anyway. ^o^

Well, lets hope this gets resolved in time for jessie, would be a shame
otherwise. 

Thanks,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/





Bug#729961: qemu-system-x86: rbd support

2014-01-08 Thread Christian Balzer

Hello,

Meanwhile, we're at qemu 1.7, ceph is at 0.72, both in sid and
wheezy-backports. 

I'd really, really love to see RBD re-enabled by default for these,
and when things have trickled into jessie. 

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/





Bug#719675: [Pkg-libvirt-maintainers] Bug#719675: Live migration of KVM guests fails if it takes more than 30 seconds (large memory guests)

2013-08-15 Thread Christian Balzer
On Thu, 15 Aug 2013 08:16:02 +0200 Guido Günther wrote:

 On Thu, Aug 15, 2013 at 09:35:09AM +0900, Christian Balzer wrote:
  On Wed, 14 Aug 2013 21:50:22 +0200 Guido Günther wrote:
  
   On Wed, Aug 14, 2013 at 04:49:42PM +0900, Christian Balzer wrote:

Package: libvirt0
Version: 0.9.12-11+deb7u1
Severity: important

Hello,

when doing a live migration using Pacemaker (the OCF VirtualDomain
RA) on a cluster with DRBD (active/active) backing storage
everything works fine with recently started (small memory
footprint of about 200MB at most) KVM guests. 

After inflating one guest to 2GB memory usage (memtester comes in
handy for that) the migration failed after 30 seconds, having
managed to migrate about 400MB in that time over the direct,
dedicated GbE link between my test cluster host nodes. 

libvirtd.log on the migration target node, migration start time is
07:24:51 :
---
2013-08-13 07:24:51.807+: 31953: warning :
qemuDomainObjEnterMonitorInternal :994 : This thread seems to be
the async job owner; entering monitor without ask ing for a nested
job is dangerous 2013-08-13 07:24:51.886+: 31953: warning :
qemuDomainObjEnterMonitorInternal :994 : This thread seems to be
the async job owner; entering monitor without ask ing for a nested
job is dangerous 2013-08-13 07:24:51.888+: 31953: warning :
qemuDomainObjEnterMonitorInternal :994 : This thread seems to be
the async job owner; entering monitor without ask ing for a nested
job is dangerous 2013-08-13 07:24:51.948+: 31953: warning :
qemuDomainObjEnterMonitorInternal :994 : This thread seems to be
the async job owner; entering monitor without ask ing for a nested
job is dangerous 2013-08-13 07:24:51.948+: 31953: warning :
qemuDomainObjEnterMonitorInternal :994 : This thread seems to be
the async job owner; entering monitor without ask ing for a nested
job is dangerous 2013-08-13 07:25:21.217+: 31950: warning :
virKeepAliveTimer:182 : No response from client 0x1948280 after 5
keepalive messages in 30 seconds 2013-08-13 07:25:31.224+:
31950: warning : qemuProcessKill:3813 : Timed out waiting after
SIGTERM to process 15926, sending SIGKILL
   
   This looks more like you're not replying via the keepalive protocol.
   What are you using to migrate VMs?
-- Guido
   
  As I said up there, the Pacemaker (heartbeat, OCF really) resource
  agent, with SSH as transport (and only) option. 
 
 This is not telling me how this is done within pacemaker. RHCS used to
 do this with virsh  internally. I'll check the sources once I get around
 to.

Sorry, I was assuming some familiarity with this resource agent.
It indeed creates a virsh command line internally, the relevant code for
this case is basically:
---
# Find out the remote hypervisor to connect to. That is, turn
# something like qemu://foo:/system into
# qemu+tcp://bar:/system
if [ -n "${OCF_RESKEY_migration_transport}" ]; then
transport_suffix="+${OCF_RESKEY_migration_transport}"
fi
---
The above defines the transport, ssh in my case.
And then later:
---
# Scared of that sed expression? So am I. :-)
remoteuri=$(echo ${OCF_RESKEY_hypervisor} | sed -e \
"s,\(.*\)://[^/:]*\(:\?[0-9]*\)/\(.*\),\1${transport_suffix}://${target_node}\2/\3,")

# OK, we know where to connect to. Now do the actual migration.
ocf_log info "$DOMAIN_NAME: Starting live migration to ${target_node} \
(using remote hypervisor URI ${remoteuri} ${migrateuri})."
virsh ${VIRSH_OPTIONS} migrate --live $DOMAIN_NAME ${remoteuri} \
${migrateuri}
rc=$?
---
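That sed rewrite can be exercised standalone. The hypervisor URI and target node below are made-up example values (the real ones come from OCF_RESKEY_hypervisor and the cluster), and the `\?` optional match is a GNU sed extension:

```shell
transport_suffix="+ssh"
target_node="bar"
src_uri="qemu://foo:9999/system"   # hypothetical OCF_RESKEY_hypervisor value

# Same expression as in the resource agent: keep scheme, port and path,
# splice in the transport suffix and the migration target host.
remoteuri=$(echo "$src_uri" | sed -e \
    "s,\(.*\)://[^/:]*\(:\?[0-9]*\)/\(.*\),\1${transport_suffix}://${target_node}\2/\3,")
echo "$remoteuri"
```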
In my case the migrateuri is empty as I didn't define anything, I thus left
out the code that would potentially define it.

Hope that helps,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/





Bug#719675: Live migration of KVM guests fails if it takes more than 30 seconds (large memory guests)

2013-08-14 Thread Christian Balzer

Package: libvirt0
Version: 0.9.12-11+deb7u1
Severity: important

Hello,

when doing a live migration using Pacemaker (the OCF VirtualDomain RA) on
a cluster with DRBD (active/active) backing storage everything works fine
with recently started (small memory footprint of about 200MB at most) KVM
guests. 

After inflating one guest to 2GB memory usage (memtester comes in handy
for that) the migration failed after 30 seconds, having managed to migrate
about 400MB in that time over the direct, dedicated GbE link between my
test cluster host nodes. 

libvirtd.log on the migration target node, migration start time is
07:24:51 :
---
2013-08-13 07:24:51.807+: 31953: warning : qemuDomainObjEnterMonitorInternal
:994 : This thread seems to be the async job owner; entering monitor without ask
ing for a nested job is dangerous
2013-08-13 07:24:51.886+: 31953: warning : qemuDomainObjEnterMonitorInternal
:994 : This thread seems to be the async job owner; entering monitor without ask
ing for a nested job is dangerous
2013-08-13 07:24:51.888+: 31953: warning : qemuDomainObjEnterMonitorInternal
:994 : This thread seems to be the async job owner; entering monitor without ask
ing for a nested job is dangerous
2013-08-13 07:24:51.948+: 31953: warning : qemuDomainObjEnterMonitorInternal
:994 : This thread seems to be the async job owner; entering monitor without ask
ing for a nested job is dangerous
2013-08-13 07:24:51.948+: 31953: warning : qemuDomainObjEnterMonitorInternal
:994 : This thread seems to be the async job owner; entering monitor without ask
ing for a nested job is dangerous
2013-08-13 07:25:21.217+: 31950: warning : virKeepAliveTimer:182 : No 
response from client 0x1948280 after 5 keepalive messages in 30 seconds
2013-08-13 07:25:31.224+: 31950: warning : qemuProcessKill:3813 : Timed out 
waiting after SIGTERM to process 15926, sending SIGKILL
---

Below is the only thing I could find which is somewhat related to this,
unfortunately it was cured by the miracle that is the next version upgrade
without the root cause being found:
https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=816451

I will install Sid on another test cluster tomorrow and am betting that it
will work just fine there. 
Since Testing is still at the same level as Wheezy I'm also betting that
we won't see anything in wheezy-backports anytime soon.
I'd really rather not create a production cluster based on Jessie or do
those rather complex backports myself...
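For what it's worth, the 30-second window in the log matches libvirtd's stock keepalive settings (a probe every 5 seconds, with the client declared dead after 5 unanswered probes). A hedged sketch of the server-side knobs, with option names as found in the libvirtd.conf shipped with libvirt of this era:

```ini
# /etc/libvirt/libvirtd.conf -- keepalive tuning (values are the defaults)
# Seconds between keepalive probes sent to connected clients;
# -1 disables keepalive entirely:
keepalive_interval = 5
# Unanswered probes tolerated before the connection is closed:
keepalive_count = 5
```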


Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/





Bug#719675: [Pkg-libvirt-maintainers] Bug#719675: Live migration of KVM guests fails if it takes more than 30 seconds (large memory guests)

2013-08-14 Thread Christian Balzer
On Wed, 14 Aug 2013 21:50:22 +0200 Guido Günther wrote:

 On Wed, Aug 14, 2013 at 04:49:42PM +0900, Christian Balzer wrote:
  
  Package: libvirt0
  Version: 0.9.12-11+deb7u1
  Severity: important
  
  Hello,
  
  when doing a live migration using Pacemaker (the OCF VirtualDomain RA)
  on a cluster with DRBD (active/active) backing storage everything
  works fine with recently started (small memory footprint of about
  200MB at most) KVM guests. 
  
  After inflating one guest to 2GB memory usage (memtester comes in handy
  for that) the migration failed after 30 seconds, having managed to
  migrate about 400MB in that time over the direct, dedicated GbE link
  between my test cluster host nodes. 
  
  libvirtd.log on the migration target node, migration start time is
  07:24:51 :
  ---
  2013-08-13 07:24:51.807+: 31953: warning :
  qemuDomainObjEnterMonitorInternal :994 : This thread seems to be the
  async job owner; entering monitor without ask ing for a nested job is
  dangerous 2013-08-13 07:24:51.886+: 31953: warning :
  qemuDomainObjEnterMonitorInternal :994 : This thread seems to be the
  async job owner; entering monitor without ask ing for a nested job is
  dangerous 2013-08-13 07:24:51.888+: 31953: warning :
  qemuDomainObjEnterMonitorInternal :994 : This thread seems to be the
  async job owner; entering monitor without ask ing for a nested job is
  dangerous 2013-08-13 07:24:51.948+: 31953: warning :
  qemuDomainObjEnterMonitorInternal :994 : This thread seems to be the
  async job owner; entering monitor without ask ing for a nested job is
  dangerous 2013-08-13 07:24:51.948+: 31953: warning :
  qemuDomainObjEnterMonitorInternal :994 : This thread seems to be the
  async job owner; entering monitor without ask ing for a nested job is
  dangerous 2013-08-13 07:25:21.217+: 31950: warning :
  virKeepAliveTimer:182 : No response from client 0x1948280 after 5
  keepalive messages in 30 seconds 2013-08-13 07:25:31.224+: 31950:
  warning : qemuProcessKill:3813 : Timed out waiting after SIGTERM to
  process 15926, sending SIGKILL
 
 This looks more like you're not replying via the keepalive protocol.
 What are you using to migrate VMs?
  -- Guido
 
As I said up there, the Pacemaker (heartbeat, OCF really) resource agent,
with SSH as transport (and only) option. 
So the resulting migration URI should be something like:

qemu+ssh://targethost/system

Of course with properly distributed authorized_keys, again this works just
fine with a small enough guest.

If there wasn't a proper two-way communication going on, shouldn't the
migration fail from the start?

[snip]

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/





Bug#576901: init.d script fails under Squeeze with insserv due to lack of run level definitions

2010-05-13 Thread Christian Balzer

Oh, one more thing to make it _really_ work. ^.^
Before insserv times drbd was started before HA (heartbeat) and stopped
after it. Without the additional X- lines below it will stop in parallel
with HA and thus lead to all kinds of nastiness and fireworks. Please
consider this for your final fix or alternatively get the HA people to
include drbd in their Should- sections.

---
### BEGIN INIT INFO
# Provides: drbd
# Required-Start:   $local_fs $network $syslog
# Required-Stop:$local_fs $network $syslog
# Should-Start: sshd multipathd
# Should-Stop:  sshd multipathd
# X-Start-Before:   HA
# X-Stop-After: HA
# Default-Start:2 3 4 5
# Default-Stop: 0 1 6
# Short-Description:Control drbd resources.
### END INIT INFO
---
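A quick way to sanity-check a header like the one above (an illustrative sketch, not part of the bug report; the here-document stands in for /etc/init.d/drbd) is to pull out exactly the lines insserv evaluates, i.e. what sits between the BEGIN/END INIT INFO markers:

```shell
hdr=$(mktemp)
cat > "$hdr" <<'EOF'
### BEGIN INIT INFO
# Provides: drbd
# Required-Start:   $local_fs $network $syslog
# Required-Stop:    $local_fs $network $syslog
# X-Start-Before:   HA
# X-Stop-After:     HA
# Default-Start:    2 3 4 5
# Default-Stop:     0 1 6
### END INIT INFO
EOF

# Count the explicit ordering overrides insserv will see:
ordering=$(sed -n '/^### BEGIN INIT INFO/,/^### END INIT INFO/p' "$hdr" \
    | grep -cE '^# X-(Start-Before|Stop-After):')
echo "ordering directives found: $ordering"
```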

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/






Bug#576901: init.d script fails under Squeeze with insserv due to lack of run level definitions

2010-05-09 Thread Christian Balzer

Hallo,

On Sun, 9 May 2010 20:31:41 +0200 gregor herrmann wrote:
 On Thu, 08 Apr 2010 13:39:25 +0900, Christian Balzer wrote:
 
  Package: drbd8-utils 
  Version: 2:8.3.7-1
  Severity: serious
  
  This incarnation of drbd8-utils has missing run level definitions in
  the INIT INFO section of the init.d script and 
 
 Hm, 2:8.3.7-1 seems to have the header (cf. also debian/changelog),
 did you mean to report this bug against older versions?
 
Nope, this was a fresh Squeeze install. Since my Sid test servers have
been upgraded from at least Etch times back and are thus not really
trustworthy in this regard I just installed drbd8-utils on a fresh Sid box
and still the same result:
---
### BEGIN INIT INFO
# Provides: drbd
# Required-Start: $local_fs $network $syslog
# Required-Stop:  $local_fs $network $syslog
# Should-Start:   sshd multipathd
# Should-Stop:sshd multipathd
# Default-Start:
# Default-Stop:
# Short-Description:Control drbd resources.
---

No Default-Start or Stop definitions...

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/






Bug#576901: init.d script fails under Squeeze with insserv due to lack of run level definitions

2010-04-07 Thread Christian Balzer
Package: drbd8-utils 
Version: 2:8.3.7-1
Severity: serious

This incarnation of drbd8-utils has missing run level definitions in the
INIT INFO section of the init.d script and thus does not get included when
insserv does its magic. Read, drbd is never started during bootup on
Squeeze. 
Changing the void in Default-Start/Stop to the levels below and running
insserv drbd again fixed things:

### BEGIN INIT INFO
# Provides: drbd
# Required-Start:   $local_fs $network $syslog
# Required-Stop:$local_fs $network $syslog
# Should-Start: sshd multipathd
# Should-Stop:  sshd multipathd
# Default-Start:2 3 4 5
# Default-Stop: 0 1 6
# Short-Description:Control drbd resources.
### END INIT INFO


Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/






Bug#553503: And another one

2009-12-03 Thread Christian Balzer

Hello,

It definitely seems to be happening around group_list, which looks rather
messed up down there.

---
batzmaru:~# gdb -c /tmp/exim4.core.1259874378.25229
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type show copying
and show warranty for details.
This GDB was configured as x86_64-linux-gnu.
(no debugging symbols found)
Core was generated by `/usr/sbin/exim4 -Mc 1NGIsU-0006Yu-NU'.
Program terminated with signal 11, Segmentation fault.
[New process 25229]
#0  0x0041e1fa in ?? ()
(gdb) bt full
#0  0x0041e1fa in ?? ()
No symbol table info available.
(gdb) symbol-file /usr/lib/debug/usr/sbin/exim4
Reading symbols from /usr/lib/debug/usr/sbin/exim4...done.
(gdb) bt full
#0  exim_setugid (uid=0, gid=103, igflag=0,
msg=0x49d0b2 Address 0x49d0b2 out of bounds) at exim.c:539
euid = value optimized out
egid = value optimized out
#1  0x0042213a in main (argc=3, cargv=0x7fffa93b4b08) at exim.c:3200
arg_receive_timeout = -1
arg_smtp_receive_timeout = -1
arg_error_handling = 0
filter_sfd = value optimized out
filter_ufd = value optimized out
group_count = 1
i = 0
list_queue_option = 0
msg_action = 0
msg_action_arg = 2
namelen = value optimized out
queue_only_reason = value optimized out
perl_start_option = 0
recipients_arg = 3
sender_address_domain = 0
test_retry_arg = -1
test_rewrite_arg = -1
arg_queue_only = 0
bi_option = 0
checking = 0
count_queue = 0
expansion_test = 0
extract_recipients = 0
forced_delivery = 0
f_end_dot = 0
deliver_give_up = 0
list_queue = 0
list_options = 0
local_queue_only = value optimized out
more = value optimized out
one_msg_action = 0
queue_only_set = 0
sender_ident_set = 0
session_local_queue_only = value optimized out
unprivileged = 0
removed_privilege = value optimized out
usage_wanted = value optimized out
verify_address_mode = 0
verify_as_sender = 0
version_printed = 0
alias_arg = (uschar *) 0x0
called_as = (uschar *) 0x4be89f Address 0x4be89f out of bounds
start_queue_run_id = (uschar *) 0x0
stop_queue_run_id = (uschar *) 0x0
expansion_test_message = (uschar *) 0x0
ftest_domain = (uschar *) 0x0
ftest_localpart = (uschar *) 0x0
ftest_prefix = (uschar *) 0x0
ftest_suffix = (uschar *) 0x0
real_sender_address = value optimized out
originator_home = value optimized out
reset_point = value optimized out
pw = (struct passwd *) 0x6d15a0
statbuf = {st_dev = 13, st_ino = 574, st_nlink = 1, st_mode = 8630,
  st_uid = 0, st_gid = 0, pad0 = 0, st_rdev = 259, st_size = 0,
  st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1259088058,
tv_nsec = 506449560}, st_mtim = {tv_sec = 1259088058,
tv_nsec = 506449560}, st_ctim = {tv_sec = 1259088058,
tv_nsec = 634448800}, __unused = {0, 0, 0}}
passed_qr_pid = 0
passed_qr_pipe = -1
group_list = {103, 0 repeats 62757 times, 2847059989, 32767, 0, 0,
  2849139040, 32767, 2771834999, 32767, 2847034884, 32767,
  0 repeats 16 times, 1, 0 repeats 33 times, 2847059989, 32767, 0, 0,
  2849139040, 32767, 2773942719, 32767, 2847034884, 32767,
  0 repeats 16 times, 1, 0 repeats 41 times, 2847059989, 32767, 0, 0,
  2849139040, 32767, 2776079430, 32767, 2847034884, 32767,
  0 repeats 16 times, 1, 0 repeats 41 times, 2847059989, 32767, 0, 0,
  2849139040, 32767, 2778334571, 32767, 2847034884, 32767,
  0 repeats 16 times, 1, 0 repeats 29 times, 2839234176, 32767,
  2839234288, 32767, 40, 0, 2773935248, 32767, 0, 0, 2848057752, 32767,
  2847054253, 32767, 0, 0, 2847034884, 32767, 0, 0, 2847056758, 32767,
  2839234176, 32767, 2847054192, 32767, 2839234239, 32767, 2839234224, 32767,
  2839234216, 32767, 2849217336, 32767, 1, 0, 0, 0, 0, 0, 2771834999, 32767,
  2380267520, 4294922870, 40, 0, 2773935248, 32767, 0, 0, 2848057752, 32767,
  1077936128, 4294922870, 1186332672, 4294923109, 0, 0, 0, 0, 2839234176,
  32767, 2839234288, 32767, 0...}
rsopts = Cannot access memory at address 0x49e740
---
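As an aside (a sketch, not from the report): core files like the /tmp/exim4.core.* ones analysed above only appear if the process is allowed to write them and the kernel knows where to put them. The core_pattern value below is illustrative, chosen to match the exim4.core.TIMESTAMP.PID naming seen here.

```shell
# Raise the per-process core size limit if the hard limit permits it:
ulimit -c unlimited 2>/dev/null || true
echo "core limit now: $(ulimit -c)"

# Kernel-side naming, root only (%e = executable, %t = timestamp, %p = pid):
#   echo "/tmp/%e.core.%t.%p" > /proc/sys/kernel/core_pattern
```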

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/






Bug#553503: Got a core

2009-12-02 Thread Christian Balzer

Hello there,

I hope I've done the backtrace correctly; it looks not very useful to me,
at least with all the values optimized out. I got the core here, so if somebody
can clue me in about how to get more info out of gdb, please do so.
---
mb14:~# gdb  -se /usr/lib/debug/usr/sbin/exim4 -c /tmp/exim4.core.1259801287.395
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type show copying
and show warranty for details.
This GDB was configured as x86_64-linux-gnu...

warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
Core was generated by `/usr/sbin/exim4 -Mc 1NFzrb-6M-DK'.
Program terminated with signal 11, Segmentation fault.
[New process 395]
#0  main (argc=3, cargv=0x7fff949d3d28) at exim.c:1296
1296exim.c: No such file or directory.
in exim.c
(gdb) bt
#0  main (argc=3, cargv=0x7fff949d3d28) at exim.c:1296
(gdb) bt full
#0  main (argc=3, cargv=0x7fff949d3d28) at exim.c:1296
arg_receive_timeout = 0
arg_smtp_receive_timeout = 4812357
arg_error_handling = value optimized out
filter_sfd = value optimized out
filter_ufd = value optimized out
group_count = value optimized out
i = value optimized out
list_queue_option = value optimized out
msg_action = value optimized out
msg_action_arg = value optimized out
namelen = 0
queue_only_reason = value optimized out
perl_start_option = value optimized out
recipients_arg = value optimized out
sender_address_domain = value optimized out
test_retry_arg = value optimized out
test_rewrite_arg = value optimized out
arg_queue_only = value optimized out
bi_option = value optimized out
checking = value optimized out
count_queue = value optimized out
expansion_test = value optimized out
extract_recipients = value optimized out
forced_delivery = value optimized out
f_end_dot = value optimized out
deliver_give_up = value optimized out
list_queue = value optimized out
list_options = value optimized out
local_queue_only = value optimized out
more = value optimized out
one_msg_action = value optimized out
queue_only_set = value optimized out
sender_ident_set = value optimized out
session_local_queue_only = value optimized out
unprivileged = value optimized out
removed_privilege = value optimized out
usage_wanted = value optimized out
verify_address_mode = value optimized out
verify_as_sender = value optimized out
version_printed = value optimized out
alias_arg = value optimized out
called_as = value optimized out
start_queue_run_id = value optimized out
stop_queue_run_id = value optimized out
expansion_test_message = value optimized out
ftest_domain = value optimized out
ftest_localpart = value optimized out
ftest_prefix = value optimized out
ftest_suffix = value optimized out
real_sender_address = value optimized out
originator_home = value optimized out
reset_point = value optimized out
pw = value optimized out
statbuf = {st_dev = 140735686720624, st_ino = 140735686720568,
  st_nlink = 4131212846, st_mode = 2493332512, st_uid = 32767, st_gid = 0,
  pad0 = 0, st_rdev = 140735751927214, st_size = 0,
  st_blksize = 140735752954696, st_blocks = 140733193388033, st_atim = {
tv_sec = 0, tv_nsec = 140733193388033}, st_mtim = {
tv_sec = 140735754040496, tv_nsec = 37}, st_ctim = {tv_sec = 4294967295,
tv_nsec = 6797900016}, __unused = {140735752954696, 140735754093400,
140735686720672}}
passed_qr_pid = value optimized out
passed_qr_pipe = value optimized out
group_list = Cannot access memory at address 0x7fff94993960
---

The core itself is 1.2MB; I could attach it to a mail or put it someplace
accessible if that helps.
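For what it's worth, here is a sketch (my own suggestion, not something from this thread) of a batch-mode gdb run that tends to extract a bit more from a core. The binary and core paths are the ones from the session above, and it assumes the matching debug symbols are installed:

```shell
# Write a small gdb command script, then run gdb non-interactively.
# 'bt full' plus registers and disassembly around $pc is usually the
# most one can recover when locals are optimized out.
cmds=$(mktemp)
cat > "$cmds" <<'EOF'
set pagination off
bt full
info registers
x/8i $pc
EOF
# With the exim4 debug symbols installed, one would then run:
#   gdb -batch -x "$cmds" -se /usr/lib/debug/usr/sbin/exim4 \
#       -c /tmp/exim4.core.1259801287.395
grep -c '' "$cmds"   # prints 4: all four gdb commands were written
rm -f "$cmds"
```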

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/






Bug#553503: Confirmed here as well, suspect kernel interaction

2009-11-17 Thread Christian Balzer

Package: exim4
Version: 4.69-9
Followup-For: Bug #553503

I have been seeing this for a while, especially when stress-testing 2 new
mailbox servers here. There seem to be no lost mails, and the segfault
below happened during a queue run (-q2m) on a pretty idle box, with no
mails being delivered for hours beforehand, which pretty much rules out
corrupted database files. I have seen this with both the heavy and light
daemon flavors. No problems on the remaining Etch machines.
---
Nov 17 20:57:35 mb14 kernel: [2956900.869830] exim4[15676]: segfault at 7fff6f022504 ip 0041e95c sp 7fff6f0224d0 error 6 in exim4[40+c8000]
---

My architecture is, as with the other 2 reporters, amd64 (x86_64) on 2-8
core SMP machines, all running Lenny.
I use custom kernels, predominantly 2.6.27.latest or 2.6.latest.
However one decently busy machine running a 2.6.24.7 kernel NEVER
exhibited this problem, leading me to suspect that there is some
interaction between a kernel feature introduced after 2.6.24 and exim 4.69-9.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/






Bug#508135: 1.4.9a-3 security upgrade breaks (with) plugins

2008-12-08 Thread Christian Balzer

Hello Thijs,

On Mon, 8 Dec 2008 13:02:59 +0100 (CET) Thijs Kinkhorst wrote:

[...]
> > Only after disabling the check quota plugin as well (which does/did NOT
> > require any patching nor has seen any updates since the original
> > install) full rendering was restored.
>
> I cannot reproduce this with a plain installation of the squirrelmail
> Debian package and adding the check_quota plugin to that. The fact that
> you get a completely blank page, suggests to me that you get PHP errors
> but that display_errors is turned off.
Indeed it was, and I even added a nice log file for PHP errors when this
change was forced upon us. And obviously promptly forgot about it, too.

> I suggest you look into your error
> log to see what the PHP errors are, and whether they are caused by the
> change that the security update brought. Please let me know what you
> find.

This is what I found:
[08-Dec-2008 22:03:39] PHP Fatal error:  Call to undefined function sq_change_text_domain() in /usr/share/squirrelmail/plugins/check_quota/functions.php on line 761
[08-Dec-2008 22:03:39] PHP Fatal error:  Call to undefined function get_current_hook_name() in /usr/share/squirrelmail/plugins/check_quota/functions.php on line 149

The first one is in the folder frame, the 2nd in the message frame.
Could it be that...
Yes, of course. The evil compatibility plugin raises its rather
functional head again. After re-applying that patch things are working
again. It might be a very good idea to mention, during installation,
re-running all patches needed by plugins, explicitly including the
compatibility one. Especially the compatibility one tends to be rather
invisible.

Case closed and another reason to pine for 1.5 I guess. ^^

> > Tough choice between an insecure or a crippled webmail interface
> > here...
>
> That seems to be a false dilemma, since you could as well disable the
> specific plugin causing the trouble.
 
Oh, I disabled it all right. ^^ The problem is of course that these 2
plugins are rather essential to the functionality we strive to provide
here.

Thanks for looking into this; the display_errors hint was all I needed to
figure it out.

Christian
-- 
Christian Balzer        Network/Systems Engineer        NOC
[EMAIL PROTECTED]   Global OnLine Japan/Fusion Network Services
http://www.gol.com/
https://secure3.gol.com/mod-pl/ols/index.cgi/?intr_id=F-2ECXvzcr6656






Bug#508135: 1.4.9a-3 security upgrade breaks (with) plugins

2008-12-07 Thread Christian Balzer
Package: Squirrelmail
Version: 1.4.9a-3

Hello,

That will teach me to install security updates in a timely manner. ;-)

Standard SM installation with 3 plugins enabled:

$plugins[0] = 'select_language';
$plugins[1] = 'spam_buttons';
$plugins[2] = 'check_quota';

After the upgrade the select language plugin still works and logging in
works fine. Alas nothing is rendered in the folder frame nor the message
frame. Just the menus are present. Selecting the display preferences
results in a long wait and ultimately a totally blank browser window.

Now spam buttons requires a minor patch so some breakage was to be
expected but nothing of this scale. Disabling spam buttons made the
display preferences menu work again, but still no love from either folder
or message frame. 

Only after disabling the check quota plugin as well (which does/did NOT
require any patching nor has seen any updates since the original install)
full rendering was restored.

Tough choice between an insecure or a crippled webmail interface here...

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer        NOC
[EMAIL PROTECTED]   Global OnLine Japan/Fusion Network Services
http://www.gol.com/
https://secure3.gol.com/mod-pl/ols/index.cgi/?intr_id=F-2ECXvzcr6656






Bug#495511: drbd-utils needs to be heartbeat aware

2008-08-18 Thread Christian Balzer
Package: drbd8-utils 
Version: 2:8.0.13-1

Hello,

this really applies to all versions, but I guess getting it fixed in
sid/lenny will be the way to do it.

Every time the drbd8-util package gets updated it happily smashes the
ownership and protection of drbdsetup and drbdmeta needed to work with
heartbeat (dopd). You know, these happy messages from the cluster resource
manager after upgrading drbd8-utils:
---
You are using the 'drbd-peer-outdater' as outdate-peer program.   
If you use that mechanism the dopd heartbeat plugin program needs to be able to 
call drbdsetup and drbdmeta with root privileges.
You need to fix this with these commands:
chgrp haclient /sbin/drbdsetup   
chmod o-x /sbin/drbdsetup   
chmod u+s /sbin/drbdsetup
chgrp haclient /sbin/drbdmeta   
chmod o-x /sbin/drbdmeta   
chmod u+s /sbin/drbdmeta
---

I'd reckon the majority of serious drbd users utilize heartbeat to manage
their drbd resources and thus are potentially subject to some rude
awakening if they are not aware of this in advance.

Solutions would be either a debconf option to always set these ownerships
and protections to the correct values, or to check the state of these 2
binaries in pre-inst and then reapply the same settings in post-inst.
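As a sketch of that second option (an illustration only: on a real system the targets would be /sbin/drbdsetup and /sbin/drbdmeta with group haclient, and the stock Debian mechanism for making such permission overrides survive upgrades is dpkg-statoverride, e.g. `dpkg-statoverride --update --add root haclient 4754 /sbin/drbdsetup`):

```shell
# Demonstrate the save/reapply dance on a throwaway file: snapshot the
# dopd-required mode before the upgrade, let the upgrade clobber it,
# then restore the snapshot afterwards.
f=$(mktemp)
chmod 4754 "$f"              # setuid, no world execute: what dopd needs
saved=$(stat -c %a "$f")     # "pre-inst": remember the current mode
chmod 0755 "$f"              # simulated upgrade resets the binary's mode
chmod "$saved" "$f"          # "post-inst": reapply the remembered mode
stat -c %a "$f"              # prints 4754
rm -f "$f"
```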

Of course the current post-install behavior of trying to stop the drbd
resources and reload the module is also not very cooperative (or
successful) with heartbeat on top of things and the resource likely to be
mounted. Printing out dire warnings about wanting matching drbd module and
util versions is one thing (the upstream drbd maintainer btw stated that
running a higher version util with a lower version module should be safe),
but trying to pull the rug out from under a running system is... rude. ;)

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer        NOC
[EMAIL PROTECTED]   Global OnLine Japan/Fusion Network Services
http://www.gol.com/
https://secure3.gol.com/mod-pl/ols/index.cgi/?intr_id=F-2ECXvzcr6656






Bug#304735: slapd 2.2.23 database corruption

2005-04-18 Thread Christian Balzer

Hello,

Monday is nearly over here, and neither today nor over the weekend
were any corruption or inconsistencies observed (and I checked
each record that was modified in the last 3 days).

So using BDB instead of LDBM indeed seems to have fixed things for
me.

I guess the choice as far as the Debian package is concerned is
now to either get a working LDBM backend from upstream or forcibly
migrate users away from LDBM when Sarge hits the limelight...


Even with the default 256KB cache of BDB things worked quite well
and db_stat -m showed pretty nice cache hit rates.
For the record and in case somebody wants to use this data, my
DB_CONFIG now reads like this (after many tests on my test server):
---
set_cachesize 0 134217728 1
set_flags DB_LOG_AUTOREMOVE
set_flags DB_TXN_NOSYNC
---
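For reference (my reading of the BDB DB_CONFIG syntax): set_cachesize takes gbytes, bytes and ncache, so the first line above requests a single cache region of 134217728 bytes, i.e. 128 MiB:

```shell
# set_cachesize 0 134217728 1  ->  0 GB plus 134217728 bytes, in 1 region
bytes=134217728
echo "$((bytes / 1024 / 1024)) MiB"   # prints: 128 MiB
```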
Yes, these servers have 2GB RAM, so I was very generous with the
cache. It helps quite a bit; that alone made a full load with ldapadd
6 times faster. The DB_TXN_NOSYNC speeds that up another 8 times,
so instead of 53 minutes it takes 1 minute to load the entire LDIF.
Inserting it with slapadd -q now takes 22 seconds; I'm reminded of
the good ole ldif2ldbm days.
I know that DB_LOG_AUTOREMOVE doesn't work the way it should for the 
moment, but here's hoping for the future. ;)

I'm unsure about DB_TXN_NOSYNC in production; basically only writing
out changes when the server gets shut down is somewhat hair-raising.
OTOH it speeds things up, and I have never had either slapd or the whole
server crash. And in that case I could recreate a good instance in the
22 seconds mentioned up there.
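One possible middle ground (my suggestion, not something from this thread): DB_TXN_NOSYNC only skips the log flush at commit time, and an explicit periodic checkpoint still bounds how much can be lost. Assuming the Berkeley DB utilities are installed and the environment lives in Debian's default /var/lib/ldap (both assumptions), a cron.d entry could force a checkpoint every 5 minutes:

```
# /etc/cron.d/slapd-checkpoint (hypothetical file name)
# db_checkpoint -1 forces an immediate checkpoint; -h names the BDB
# environment directory. Path and interval are assumptions.
*/5 * * * *  root  db_checkpoint -1 -h /var/lib/ldap
```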

Regards,

Christian Balzer
-- 
Christian Balzer        Network/Systems Engineer        NOC
[EMAIL PROTECTED]   Global OnLine Japan/Fusion Network Services
http://www.gol.com/






Bug#304735: slapd 2.2.23 database corruption

2005-04-15 Thread Christian Balzer

Steve Langasek wrote:
> On Fri, Apr 15, 2005 at 01:30:33PM +0900, Christian Balzer wrote:
> [backend used]
> > See above, LDBM (whatever actual DB that defaults to these days).
>
> Sorry, I missed that.  I would strongly encourage you to switch to BDB,
> which is the recommended backend for OpenLDAP 2.2; LDBM was more stable in
> 2.1 because BDB itself was *un*stable, but in 2.2, BDB is reportedly quite
> solid whereas LDBM is less stable than it had been in 2.1.

Seeing that it can hardly get worse (I have been running BDB on a test
machine, and that worked for the limited exposure it has had), I changed the
2 servers over to BDB, something I would not have done w/o the -q
switch in slapadd (all those BDB log files otherwise, argh).

I will monitor this over the weekend and see if the problem persists,
goes away or (heavens forbid) mutates.

No matter the outcome of this though, the severity of this bug report
remains the same. Right now anybody with a working sarge or woody
LDAP installation will find themselves encountering mysterious
heisenbugs when upgrading to 2.2.23-1 (at the very least when using
LDBM). So unless the underlying problem can be fixed, or the update
somehow enforces BDB usage (it didn't even suggest it, and this is
always assuming BDB actually fixes what I'm seeing here), we have a
major show stopper.

> > I loathe BDB for the times it takes for massive adds/modifies.
> > Even with slapadd, which takes about 2 minutes to load the entire DB
> > using ldbm as backend, but about 50 minutes with BDB.
>
> OpenLDAP 2.2 includes a '-q' option to slapadd that makes the load time much
> quicker by disabling checks that are unnecessary while loading a fresh db.
> This option will be enabled by default on database reloads in the slapd
> install scripts.

This sure helps (helped in my case) with a fresh load. I still dread to
see BDB performance in case I have something modifying or adding a large 
number of entries in normal (ldapmodify) operation.
It tends to be about 2 times slower than LDBM with that.

Regards,

Christian Balzer
-- 
Christian Balzer        Network/Systems Engineer        NOC
[EMAIL PROTECTED]   Global OnLine Japan/Fusion Network Services
http://www.gol.com/






Bug#304735: slapd 2.2.23 database corruption

2005-04-15 Thread Christian Balzer

Hello,

just a quick reply to the 3 mails from Torsten.

a) I will try to ride this out with BDB and slapd 2.2.23 for the moment
and make the call whether this is working or not on Monday. So far no
corruption, but also just a few modify actions. If it fails as well,
I might indeed need an old package. ;P

b) I know of the DB_CONFIG stuff from other encounters with BDB (INN
overview) and the test runs with it for slapd. It gives me headaches,
but I'll look at it again. The slapd.conf cachesize is set to 100
and the servers are vastly overspec'ed in all aspects. So no problems
thus far.

c) the -q did indeed help (2 minutes instead of 43) because it suppressed
those pesky log.01 files which really kill the BDB performance in
this scenario. 

Regards,

Christian Balzer
-- 
Christian Balzer        Network/Systems Engineer        NOC
[EMAIL PROTECTED]   Global OnLine Japan/Fusion Network Services
http://www.gol.com/






Bug#304735: slapd 2.2.23 database corruption

2005-04-14 Thread Christian Balzer

Package: slapd
Version: 2.2.23-1 (sarge)
Severity: critical


This is basically the same as #303826 (why this got classified as normal 
and 2.2.23 got pushed into sarge is beyond me).

I have a LARGE (60k users) users/mailsettings database in LDAP, 
on two identical servers running sarge. 
They have been rock stable like that for over a year. 
Changes are generated by ldifsort.pl/ldifdiff.pl and then applied with 
ldapmodify for a low impact and smooth operation, using ldbm as
backend.

Since the update of slapd in sarge 2 days ago I have been getting
an increasing number of reports of user settings vanishing from
the system. As with #303826 a full dump of the DB WILL show that 
these records are present, but a specific search for them will fail.
So this hints very much at index corruption of some sort, as a
stop/start of slapd does not change things. However a delete/add 
of that entire record tends to fix things and so far it seems only
records that were touched with modify have been affected.
Unfortunately this is not deterministic in the least: while one
slapd instance on one server will happily return the correct data
for a specific query, the other one might not, or vice versa.

I urge you (in case this can't be fixed in a time frame of 1-2 days)
to back out this update and revert to the previous version.

If this LDAP DB were the canonical one and not fed from a SQL
DB, I'd be out of a job by now, instead of frantically fixing things
with good data.

Caffeinated Greetings,

Christian Balzer
-- 
Christian Balzer        Network/Systems Engineer        NOC
[EMAIL PROTECTED]   Global OnLine Japan/Fusion Network Services
http://www.gol.com/






Bug#304735: slapd 2.2.23 database corruption

2005-04-14 Thread Christian Balzer

Steve Langasek wrote:

> On Fri, Apr 15, 2005 at 12:09:19PM +0900, Christian Balzer wrote:
> > Changes are generated by ldifsort.pl/ldifdiff.pl and then applied with
> > ldapmodify for a low impact and smooth operation, using ldbm as
> > backend.
>
> The previous version of slapd *also* had corruption issues, and this is the
> driving reason for putting slapd 2.2 in sarge.

I read that and I'm all for using current versions of software when
getting near to a Debian release.

Alas it's hard to contrast one year of trouble-free operation with
the current state of affairs. A fix that breaks all the users who
until now had a perfectly working setup is, well, not a fix.
Or to put it quite bluntly, people encountering DB corruption with
the previous version most likely did NOT run production systems with it.
Me and others, on the other hand...

> Which LDAP backend are you using for this directory?

See above, LDBM (whatever actual DB that defaults to these days).

I loathe BDB for the times it takes for massive adds/modifies.
Even with slapadd, which takes about 2 minutes to load the entire DB
using ldbm as backend, but about 50 minutes with BDB.

Regards,

Christian Balzer
-- 
Christian Balzer        Network/Systems Engineer        NOC
[EMAIL PROTECTED]   Global OnLine Japan/Fusion Network Services
http://www.gol.com/


