Re: Script vs command line behaviour

2016-10-15 Thread Henrique de Moraes Holschuh
On Sat, 15 Oct 2016, Andre Majorel wrote:
> Or is pkill more than a wrapper around kill(pid, 15) and
> kill(pid, 9) ?

pkill is much less prone to killing the wrong process due to a race
when you use it properly.
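
As a rough sketch of what "properly" might look like: pkill matches against the process table directly, so there is no grep process to filter out and no list of PIDs travelling through a pipeline. The function name term_by_name is made up for illustration; -x (exact name match) and -u (owner) are standard procps options.

```shell
#!/bin/bash
# Hypothetical helper: term_by_name NAME [USER]
# Sends SIGTERM to every process whose name is exactly NAME and which
# is owned by USER (default: the current user).
# pkill exits 0 if at least one process matched, 1 if none did.
term_by_name() {
    local name=$1 user=${2:-$(id -un)}
    pkill -TERM -x -u "$user" "$name"
}

# Example (not executed here):
#   term_by_name svnserve root
```

Restricting the match with -x and -u is a design choice: a bare "pkill svnserve" also matches any process whose name merely contains that string.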

In the general case one should avoid SIGKILL.  Trying other signals that
allow for an orderly exit first is strongly recommended, SIGTERM being
the typical one you'd want.  There is also SIGQUIT, but too many
programmers get it wrong: they trap SIGTERM for a clean exit but forget
to trap SIGQUIT as well.
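
One way to follow that advice in a script is sketched below: send SIGTERM, give the process some time, and fall back to SIGKILL only if it is still there. The helper name and the 10 second default are assumptions, not something from this thread.

```shell
#!/bin/bash
# Hypothetical helper: stop_gracefully PID [TIMEOUT_SECONDS]
# Returns 0 if the process exited after SIGTERM (or was already gone),
# 1 if SIGKILL had to be used as a last resort.
# Meant for processes you are allowed to signal: "kill -0" also fails
# when permission is denied, which this sketch would read as "gone".
stop_gracefully() {
    local pid=$1 timeout=${2:-10} waited=0
    kill -TERM "$pid" 2>/dev/null || return 0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "PID $pid still alive after ${timeout}s, sending SIGKILL" >&2
            kill -KILL "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

The return code lets the caller log which processes needed SIGKILL, which is usually a hint that something deserves a closer look.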

> I used to have a user who had the bad habit of indiscriminately
> going straight for SIGKILL. It was annoying. But not as much as
> the fact that SIGKILL can't do anything about processes stuck in
> "D" state. I really wish I knew a way to get rid of these.

Processes are stuck in "D" state when the kernel is doing something on
their behalf.

State "D" is not really "waiting for IO", as I have seen people describe
it; it would be more correct to call it "in the middle of a syscall".
"Waiting for IO" is just a subset of the reasons why a process can get
stuck in "D" state, even if it is the most common one (AFAIK).
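
To see which processes are currently in that state, something along these lines should do. It is only a sketch: the filter name is made up, and it assumes the process state is in the second column, as it is with the ps invocation shown in the comment. The WCHAN column names the kernel function the process is sleeping in, which often hints at what it is waiting for.

```shell
#!/bin/bash
# Hypothetical filter: keep the header line plus every row whose STAT
# column (second field) starts with "D".
only_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Typical use on a live system (procps ps):
#   ps -eo pid,stat,wchan:32,args | only_d_state
```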

-- 
  Henrique Holschuh



Re: Script vs command line behaviour

2016-10-15 Thread Andre Majorel
On 2016-10-12 08:40 -0400, Greg Wooledge wrote:
> On Wed, Oct 12, 2016 at 09:34:22PM +0900, Mark Fletcher wrote:
> > # The systemctl stop for svnserve may not work as I haven't got around to 
> > # making a stop script for it.
> > # So kill the process the old fashioned way
> > ps -ef | grep svnserve | grep -v grep | awk '{print $2}' | xargs kill -9
> 
> Please consider replacing this with some variant of:
> 
> pkill svnserve
> 
> And stop using -9 (SIGKILL).  Forever.  Pretend it never existed.

Perhaps Greg means: try "kill " before "kill -9 ".
Or is pkill more than a wrapper around kill(pid, 15) and
kill(pid, 9) ?

I used to have a user who had the bad habit of indiscriminately
going straight for SIGKILL. It was annoying. But not as much as
the fact that SIGKILL can't do anything about processes stuck in
"D" state. I really wish I knew a way to get rid of these.

-- 
André Majorel 
bugs.debian.org, a spambot's best friend.



Re: Script vs command line behaviour

2016-10-13 Thread Henrique de Moraes Holschuh
On Thu, 13 Oct 2016, Mark Fletcher wrote:
> but I don't completely understand what you mean. Are you saying that, 
> even though the command to start fetchmail is not an invocation of a 
> systemd unit, the fact that it is happening from inside a script that is 
> run by a systemd unit somehow allows systemd to capture the PID for 
> fetchmail, and that that in turn is having a bearing on my being able to 

Yes.

Try "ps xawf -eo pid,user,cgroup:64,args" to see how things are grouped
into "cgroups" by systemd (or by the kernel autogroup facility, or by
cgmanager on systemd-less systems, etc).

Depending on how the systemd unit is configured, systemd will kill
everything belonging to that unit's cgroup when you tell it to stop the
unit.

You can have a unit start processes in a separate cgroup (so they don't
get killed along with the unit); refer to the systemd-run command.
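
A minimal sketch of the systemd-run route, under several assumptions: the unit name fetchmail-mark is invented, --uid needs a systemd-run recent enough to support it, and fetchmail is given --nodetach so that the transient unit tracks the daemon itself rather than a parent that exits at once. By default the wrapper only prints the command it would run.

```shell
#!/bin/bash
# Hypothetical wrapper: run_in_own_unit UNIT USER COMMAND [ARGS...]
# Builds a systemd-run command line that starts COMMAND as USER in its
# own transient unit (and therefore its own cgroup).
# The command is only printed unless RUN=1 is set; running it for real
# needs root and a running systemd.
run_in_own_unit() {
    local unit=$1 user=$2
    shift 2
    local cmd=(systemd-run --unit="$unit" --uid="$user" "$@")
    if [ "${RUN:-0}" = 1 ]; then
        "${cmd[@]}"
    else
        printf '%s\n' "${cmd[*]}"
    fi
}

# Dry run for the fetchmail case discussed in this thread.
run_in_own_unit fetchmail-mark mark \
    /usr/bin/fetchmail --nodetach -d 900 -f /home/mark/.fetchmailrc
```

Passing -f explicitly avoids depending on whatever $HOME happens to be inside the transient unit.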


There are other ways to deal with this as well, and they might be more
idiomatic.  I was waiting for someone who groks systemd better to
reply, but since nobody did, here is a workable answer to get you
started :-)

-- 
  Henrique Holschuh



Re: Script vs command line behaviour

2016-10-13 Thread Mark Fletcher
On Thu, Oct 13, 2016 at 04:55:06PM +0200, Nicolas George wrote:
> On duodi 22 Vendémiaire, year CCXXV, Mark Fletcher wrote:
> > > strace -f -e execve -s 1 -o /tmp/my_script.$$.$(date +%Y%m%d%H%M%S) &
> 
> > But there is no my_script. in /tmp... WhaFu?
> 
> My bad, I forgot the end of the command:
> 
> -p $$
> 
> (insert just before the final ampersand)
> 

I will check this out later because I want to understand what was going 
on, but I am happy to report that switching to the system-wide fetchmail, 
a unit for which was already in systemd (but which was prevented from 
doing anything by /etc/default/fetchmail, and by the fact it was 
unconfigured) has solved the problem.

So as described in a mail I wrote last night, I've moved my .fetchmailrc 
to /etc/fetchmailrc (which is not an obnoxious thing to do on this 
machine because I am the only habitual user) and enabled this run of 
fetchmail in /etc/default/fetchmail, and now systemctl start fetchmail 
and systemctl stop fetchmail start and stop fetchmail meaningfully.

(I also had to change the ownership of /etc/fetchmailrc to the user 
fetchmail: when I copied ~mark/.fetchmailrc to /etc/fetchmailrc as root, 
the result was initially owned by root, which, given the recommended 
permissions of 600, wouldn't have worked.)
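
For anyone repeating this, the copy, ownership and mode can be set in one step with install(1). This is a sketch; the helper name is made up, and the paths and the fetchmail user in the comment are the ones from this message.

```shell
#!/bin/bash
# Hypothetical helper: install_rc SRC DST [OWNER]
# Copies SRC to DST with mode 600. If OWNER is given, ownership is set
# as well, which normally requires root.
install_rc() {
    local src=$1 dst=$2 owner=${3:-}
    if [ -n "$owner" ]; then
        install -m 600 -o "$owner" "$src" "$dst"
    else
        install -m 600 "$src" "$dst"
    fi
}

# On the machine described here this would be, as root:
#   install_rc ~mark/.fetchmailrc /etc/fetchmailrc fetchmail
```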

I moved my refresh interval from the command line to the fetchmailrc 
file -- /etc/default/fetchmail provides a means to influence the 
command line but putting all the options in the fetchmailrc felt 
cleaner, now that I think on it.

And then in my original script sudo -u mark fetchmail -q can be replaced 
with systemctl stop fetchmail, and sudo -u mark fetchmail -d 900 can be 
replaced with systemctl start fetchmail
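
With all three services under systemd, the stop and start steps of the script could be collapsed into something like the sketch below. The function names and the SYSTEMCTL override are assumptions added for illustration; the service names are the ones used in this thread.

```shell
#!/bin/bash
# SYSTEMCTL can be overridden (e.g. SYSTEMCTL=echo) for a dry run.
SYSTEMCTL=${SYSTEMCTL:-systemctl}
SERVICES=(mysql svnserve fetchmail)

# Stop the services in the listed order.
stop_services() {
    local s
    for s in "${SERVICES[@]}"; do
        "$SYSTEMCTL" stop "$s"
    done
}

# Start them again in reverse order.
start_services() {
    local i
    for (( i = ${#SERVICES[@]} - 1; i >= 0; i-- )); do
        "$SYSTEMCTL" start "${SERVICES[i]}"
    done
}

# In the backup script:
#   stop_services
#   trap start_services EXIT    # restart even if the backup step fails
#   ... run the backup ...
```

Restarting from an EXIT trap is a design choice: the services come back even when the backup command fails halfway through.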

I made all these changes last night and got up this morning to find 
everything working as it should be.

So this issue is now downgraded from an actual issue to a research 
project of why it wasn't working the old way. Which I will still be 
investigating and will post if I find something interesting.

Thanks to all, especially in this case Nicolas, for your help and 
suggestions.

Mark



Re: Script vs command line behaviour

2016-10-13 Thread Nicolas George
On duodi 22 Vendémiaire, year CCXXV, Mark Fletcher wrote:
> > strace -f -e execve -s 1 -o /tmp/my_script.$$.$(date +%Y%m%d%H%M%S) &

> But there is no my_script. in /tmp... WhaFu?

My bad, I forgot the end of the command:

-p $$

(insert just before the final ampersand)

Command explained:

strace: traces all system calls and signals of a process;
-f: trace across forks and new processes;
-e execve: only trace the execve system call, i.e. executing a new command;
-s 1: truncate strings with a very high limit;
-o /tmp/...: dump output in a file with a unique name;
-p $$: trace the calling shell;
&: run in background.
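
Put together, the complete line could be wrapped as in the sketch below. The function names are invented, and since the value after -s looks cut short in the command above, TRACE_STRLEN is a stand-in with an arbitrary default of 1000.

```shell
#!/bin/bash
# Name of the trace file: /tmp/my_script.<shell PID>.<timestamp>
trace_file() {
    printf '/tmp/my_script.%s.%s\n' "$$" "$(date +%Y%m%d%H%M%S)"
}

# Attach strace to the calling shell and leave it running in the
# background. TRACE_STRLEN is an assumed value for -s.
start_trace() {
    strace -f -e trace=execve -s "${TRACE_STRLEN:-1000}" \
        -o "$(trace_file)" -p "$$" &
}

# Near the top of the script to be debugged:
#   start_trace
```

Note that $$ keeps the PID of the main shell even inside the function and the command substitution, which is exactly the process that -p should attach to.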

Regards,

-- 
  Nicolas George



Re: Script vs command line behaviour

2016-10-13 Thread Mark Fletcher
On Wed, Oct 12, 2016 at 05:17:20PM +0200, Nicolas George wrote:
> You can debug what happens to your script by adding this line near the
> beginning:
> 
> strace -f -e execve -s 1 -o /tmp/my_script.$$.$(date +%Y%m%d%H%M%S) &
> 
> Tomorrow, the file in /tmp will tell you what happened.
> 

OK, I added that line near the beginning of my script, after the initial 
comments and before it actually does anything.

I also commented out the line that actually invokes Amanda, for testing 
purposes. So I can do all the setup before a backup, and all the 
tear-down afterwards, but not actually do the backup.

I then ran the script by doing systemctl start homebackup (the service 
is called homebackup.service and the timer is called homebackup.timer)

The script definitely ran -- I can tell because one of my Windows VMs is 
running right now and the disklist for Amanda got updated to exclude the 
running VM, the logic to do which is only contained in this script. Both 
VMs were down overnight when the last real backup was performed, so both 
VMs were included in the last backup. Ergo, this run of the script 
changed the amanda disk list. Ergo, it ran.
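
For reference, the disklist logic mentioned here could be written as a loop, roughly as sketched below. The function names and the DISKLIST variable are made up; the vboxmanage check and the two disklist entries per VM are taken from the script posted at the start of this thread.

```shell
#!/bin/bash
DISKLIST=${DISKLIST:-/etc/amanda/RealBackup/disklist}

# True if VirtualBox reports the VM as running.
vm_is_running() {
    sudo -u mark vboxmanage showvminfo "$1" | grep -q "running (since"
}

# Append the two disklist entries for VM $1 unless it is running.
add_vm_to_disklist() {
    local vm=$1
    if vm_is_running "$vm"; then
        return 0
    fi
    {
        printf 'localhost /opt/vms/%s comp-user-tar\n' "$vm"
        printf 'localhost "/home/mark/VirtualBox VMs/%s" comp-user-tar\n' "$vm"
    } >> "$DISKLIST"
}

# In the backup script, after copying disklist.stem into place:
#   for vm in TRADER2 TRADER3; do add_vm_to_disklist "$vm"; done
```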

But there is no my_script. in /tmp... WhaFu? I copied the line 
you put in your email into my script exactly, and then spent a while 
with the man page so I could understand what it was doing, and I think 
it should have worked... but it didn't.

The script takes a couple of seconds to run (when the call to amdump is 
commented out, it would take a good couple of hours otherwise) so 
further evidence it is doing something...

In the meantime, I have had another idea which I suspect will work... 
I'm proposing to dump my user-local configuration of fetchmail, and copy 
my .fetchmailrc to /etc/fetchmailrc (said file does not currently exist 
on my machine), making sure permissions and ownership are appropriate of 
course, and then modify /etc/default/fetchmail to set START_DAEMON=yes 
instead of no, then add a "set daemon 900" line to the copied 
/etc/fetchmailrc, and let the systemwide fetchmail handle my mail 
fetching. This is already set up as a systemd service and runs 
automatically on boot, so I can then modify my script to do systemctl 
stop fetchmail to stop it and systemctl start fetchmail to start it 
again at the end of my script, just like I am doing with mysql.
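
The two file edits in that plan can be scripted, for instance as below. This is a sketch with an invented helper name; it assumes /etc/default/fetchmail contains a line starting with START_DAEMON= and that "set daemon" is not already present in some other form in the rc file.

```shell
#!/bin/bash
# Hypothetical helper: enable_fetchmail_daemon DEFAULTS_FILE RC_FILE [INTERVAL]
enable_fetchmail_daemon() {
    local defaults=$1 rc=$2 interval=${3:-900} tmp
    tmp=$(mktemp) || return 1
    # Set START_DAEMON to yes, leaving all other lines alone. Writing
    # back with cat keeps the file's owner and permissions.
    sed 's/^START_DAEMON=.*/START_DAEMON=yes/' "$defaults" > "$tmp" &&
        cat "$tmp" > "$defaults"
    rm -f "$tmp"
    # Add the polling interval only if no "set daemon" line exists yet.
    grep -q '^set daemon ' "$rc" || printf 'set daemon %s\n' "$interval" >> "$rc"
}

# As root:
#   enable_fetchmail_daemon /etc/default/fetchmail /etc/fetchmailrc 900
```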

Given that mysql is being correctly stopped and started by that approach 
already in the script, I have some confidence that fetchmail will too. 

systemctl stop fetchmail and systemctl start fetchmail correctly stop 
and start it from the command line.

I'd like to know if there is something wrong with that strace command, 
or if there is greater sensitivity than I gave it credit for to where it 
is put in the script, but I suspect this approach of using the 
systemwide fetchmail is going to nail it.

Mark



Re: Script vs command line behaviour

2016-10-12 Thread Mark Fletcher
On Wed, Oct 12, 2016 at 04:29:01PM +0200, Nicolas George wrote:
> On primidi 21 Vendémiaire, year CCXXV, Mark Fletcher wrote:
> > Fetchmail isn't set up as a service through systemd, although mysql and 
> > svnserve are. fetchmail is just started from this script (or supposed to 
> > be!) and launched by hand from the command line when that fails.
> > 
> > So at least systemd isn't complicating the issue.
> 
> Maybe it is. Unlike SysV init and the other legacy tools, systemd keeps
> track of the processes it starts, grouping them as "units" using cgroups.
> Your script tries to start fetchmail in background, using the -d option
> (which, by the way, is not present in the man page for the testing version,
> unless I have trouble reading); that would allow it to escape SysV init and
> cron, for example, but not systemd.
> 

I don't know about testing, but in Jessie, the description starts at 
line 1236 of the man page.

Also, I'm sorry, I suspect you may be right at the crux of the issue, 
but I don't completely understand what you mean. Are you saying that, 
even though the command to start fetchmail is not an invocation of a 
systemd unit, the fact that it is happening from inside a script that is 
run by a systemd unit somehow allows systemd to capture the PID for 
fetchmail, and that that in turn is having a bearing on my being able to 
restart it? If so, I don't understand the mechanism at work here, and 
I'm lost as to what to do about it.

My next step is to try your other suggestions re searching for log files 
and increasing logging in the script; will report back on what that 
finds.

Stupid question -- if I comment out the actual backup command from the 
script, can I then run the script at will in the same environment it 
would be kicked off by the timer by using systemctl start ? 
The _timer_ is enabled (linked to multi-user.wants) so it starts at 
boot, would starting the _service_ have the effect of running 
immediately and would that have any nasty side effects? I'm looking for 
a way to not have to wait until the next backup run to test the 
suggestions that have been made.

Mark



Re: Script vs command line behaviour

2016-10-12 Thread Mark Fletcher
On Wed, Oct 12, 2016 at 05:59:10PM +0200, Frank wrote:
> On 12-10-16 at 17:17, Mark Fletcher wrote:
> >I wonder if passing the --fetchmailrc option will work. The systemd
> >journal snippet I included in my original post shows that fetchmail is
> >getting started successfully -- but by the morning it's not running.
> >Now, clearly nothing gets past me, but that means it's terminating.
> >Which suggests it doesn't know what it is supposed to do once it is
> >started. Which suggests maybe it's not finding the fetchmailrc file?
> 
> Looking at question C6 in the fetchmail FAQ, I'd say that's quite likely...
> :)
> 

Likely though it may have been, I'm afraid it didn't work. I modified 
the last line of the script to: sudo -u mark fetchmail -d 900 
--fetchmailrc /home/mark/.fetchmailrc

Which is the correct location of my .fetchmailrc.

Then sudo journalctl -b | grep fetchmail shows:

Oct 12 23:59:04 kazuki systemd[1]: Starting LSB: init-Script for system 
wide fetchmail daemon... 

Oct 12 23:59:04 kazuki fetchmail[2801]: Not starting fetchmail daemon, 
disabled via /etc/default/fetchmail. 

Oct 12 23:59:04 kazuki systemd[1]: Started LSB: init-Script for system 
wide fetchmail daemon. 

Oct 13 01:30:06 kazuki sudo[4323]: root : TTY=unknown ; PWD=/ ; 
USER=mark ; COMMAND=/usr/bin/fetchmail -q 

Oct 13 01:30:06 kazuki homebackup.sh[4280]: fetchmail: background 
fetchmail at 3734 killed. 

Oct 13 03:48:33 kazuki sudo[5443]: root : TTY=unknown ; PWD=/ ; 
USER=mark ; COMMAND=/usr/bin/fetchmail -d 900 --fetchmailrc 
/home/mark/.fetchmailrc

[lines justified to make them easier to read]

I rebooted yesterday evening, as I was playing around with tails, but 
that is a different story. So the above is the entirety of the output of 
the command.

The first 3 lines are the system-wide fetchmail daemon getting kicked 
off at system boot and deciding not to do anything.

Subsequently, unknown to systemd, I started fetchmail as my unprivileged 
user mark by hand from the command line using fetchmail -d 900 . That 
invariably works correctly.

In line 4 you can see my backup script stopping fetchmail by running 
fetchmail -q as user mark. Again this command does not involve systemd 
(except for the fact that it is being executed in a script which is 
being executed by a systemd unit)

Line 5 reports the success of that command.

And line 6 _appears_ to be a successful execution of the (modified) last 
line of the script, sudo -u mark fetchmail -d 900 --fetchmailrc 
/home/mark/.fetchmailrc , except once again this morning I got up to 
find fetchmail was not running. And once again running fetchmail -d 900 
from the command line started it successfully.

So I see Nicolas and others suggested other approaches involving greater 
logging of what is going on overnight after I had gone to bed; my next 
step is to try some of those ideas.

Mark



Re: Script vs command line behaviour

2016-10-12 Thread Mark Fletcher
On Wed, Oct 12, 2016 at 04:51:40PM -0400, Jude DaShiell wrote:

> ># and restart the services we stopped
> >systemctl start svnserve
> >systemctl start mysql
> >sudo -u mark fetchmail -d 900
> >
> I think the issue revolves around unknown pwd.  Perhaps running fetchmail as
> the user rather than root will solve that problem.

sudo -u mark fetchmail -d 900   _IS_ running fetchmail as the user, no?

Mark



Re: Script vs command line behaviour

2016-10-12 Thread Jude DaShiell

On Wed, 12 Oct 2016, Mark Fletcher wrote:


Date: Wed, 12 Oct 2016 08:34:22
From: Mark Fletcher 
To: debian-user@lists.debian.org
Subject: Script vs command line behaviour
Resent-Date: Wed, 12 Oct 2016 12:34:43 +0000 (UTC)
Resent-From: debian-user@lists.debian.org

Greetings

I am observing a strange behaviour and I am wondering what stupid thing
I have done that is causing it. A shell command that is supposed to
start fetchmail running every 15 minutes works fine run from the command
line, but has no effect when run from inside a script. I am running
Jessie upgraded many times from an original install from etch, I think
it was. Or squeeze. Whichever of those two it was that came first.

I recently took the plunge and installed Amanda to get my backups
organised. At the moment I am only backing up the one machine but will
expand it later to other machines on my network.

Automating the Amanda backup once I was happy with the configuration
prompted me to write a small bash shell script, reproduced below. It
expects to run as root. It stops the svnserve instance running on my
machine and a mysql instance, and also stops the fetchmail daemon. Then
it starts with a basic amanda config and, if my Windows VMs are not
running, adds their disks and config directories to the stuff that is in
scope for backup. If they are running it refrains from including them.
Then, it kicks off the backup, running as the backup user. Finally, it
restarts the services it stopped.

Then I have created a systemd service to run this script, and set up a
systemd timer to kick off the process at 01:30 local time every day.

It runs fine, at the right time, except for one thing. When I get up in
the morning and come to my computer, I find that the backup report is
sitting in my email reporting success, SVNServe is running as it should
be, mysql is running as it should be -- but fetchmail has not been
restarted.

If I do sudo journalctl -b | grep fetchmail and look at this morning's
entries, I see the following:

Oct 12 01:30:02 kazuki sudo[2197]: root : TTY=unknown ; PWD=/ ;
USER=mark ; COMMAND=/usr/bin/fetchmail -q Oct 12 01:30:02 kazuki
homebackup.sh[2154]: fetchmail: background fetchmail at 31717 killed.

Oct 12 04:19:04 kazuki sudo[3582]: root : TTY=unknown ; PWD=/ ;
USER=mark ; COMMAND=/usr/bin/fetchmail -d 900

[lines justified to be easier to read]

The first of those lines indicates the script successfully stopped
fetchmail at 01:30 before starting the backup.

The second would appear to indicate at least that it correctly attempted
to start fetchmail again, as my non-privileged user mark, at 04:19 when
the backup finished. But, when I came to my computer at about 07:30 this
morning, fetchmail was not running and all the mails to this list from
you lovely people were backed up at my email provider waiting to be
downloaded.

I started fetchmail by hand from the command line, and came home this
evening to find it still running sweetly. And it'll continue to do so
until tonight's backup run kills it, at which point I'll get up tomorrow
morning to find it is not running again, if the experience of the last
few days is anything to go by.

If I copy and paste the line that is supposed to restart fetchmail from
the script (reproduced below) to a root shell, with PWD=/, and run the
command, it works, correctly. I am stumped why it is not working from
the script. Anyone see what I have missed? I wondered if it was paths
not being set right, but from the above journal log you can see it
correctly mapped the fetchmail instance to /usr/bin/fetchmail.


/etc/systemd/scripts/homebackup.sh:

#! /bin/bash

# THIS ASSUMES MYSQL AND SVNSERVE ARE RUNNING, AND WE WANT THEM RUNNING
# AGAIN ONCE DONE

# stop a couple of services we don't want running while doing backups
systemctl stop mysql
systemctl stop svnserve

# The systemctl stop for svnserve may not work as I haven't got around to
# making a stop script for it.
# So kill the process the old fashioned way
ps -ef | grep svnserve | grep -v grep | awk '{print $2}' | xargs kill -9

# And kill fetchmail so it doesn't update the mail while we are backing up
sudo -u mark fetchmail -q

# Start with the base disklist
cp /etc/amanda/RealBackup/disklist.stem /etc/amanda/RealBackup/disklist

# If the two VirtualBox VMs are NOT running, add them to the disk list
sudo -u mark vboxmanage showvminfo "TRADER2" | grep -q "running (since" || echo localhost /opt/vms/TRADER2 comp-user-tar >> /etc/amanda/RealBackup/disklist
sudo -u mark vboxmanage showvminfo "TRADER2" | grep -q "running (since" || echo "localhost "\""/home/mark/VirtualBox VMs/TRADER2"\"" comp-user-tar" >> /etc/amanda/RealBackup/disklist
sudo -u mark vboxmanage showvminfo "TRADER3" | grep -q "running (since" || echo localhost /opt/vms/TRADER3 comp-user-tar >> /etc/amanda/RealBackup/disklist
sudo -u mark vboxmanage showvminfo "TRADER3" | grep -q "running (since" || echo "localhost "\""/home/mark/VirtualBox VMs/TRADER3"\"" 

Re: Script vs command line behaviour

2016-10-12 Thread Frank

On 12-10-16 at 17:17, Mark Fletcher wrote:

> I wonder if passing the --fetchmailrc option will work. The systemd
> journal snippet I included in my original post shows that fetchmail is
> getting started successfully -- but by the morning it's not running.
> Now, clearly nothing gets past me, but that means it's terminating.
> Which suggests it doesn't know what it is supposed to do once it is
> started. Which suggests maybe it's not finding the fetchmailrc file?


Looking at question C6 in the fetchmail FAQ, I'd say that's quite 
likely... :)


Regards,
Frank

==
C6. Fetchmail works OK started up manually, but not from an init script.

Often, startup scripts have a different environment than an interactive 
login shell. For instance, $HOME might point to "/root" when you are 
logged in as root, but it might be either unset, or set to "/" when the 
startup scripts are running. That means fetchmail at startup can't find 
the .fetchmailrc.


Pick a location (such as /etc/fetchmailrc) and use fetchmail's -f option 
to point fetchmail at it. That should solve the problem.
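
A quick way to check whether the environment really differs is to dump it from both places and compare. The helper name and the file names in the comments below are only examples.

```shell
#!/bin/bash
# Hypothetical helper: write the current environment, sorted, to file $1.
dump_env() {
    env | sort > "$1"
}

# In the script started by systemd:   dump_env /tmp/homebackup.env
# In an interactive root shell:       dump_env /tmp/login.env
# Then compare the two:               diff /tmp/login.env /tmp/homebackup.env
```

Differences in HOME, USER or PATH are the usual suspects when a command works by hand but not from a script.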




Re: Script vs command line behaviour

2016-10-12 Thread Nicolas George
On duodi 22 Vendémiaire, year CCXXV, Mark Fletcher wrote:
> You bring up a good point, actually. I'm calling systemctl stop and 
> systemctl start to stop and start mysql -- and I'm doing that in a 
> script that is itself being called by a systemd unit (the one triggered 
> by the timer). I wonder if that is in some way naughty and contributing 
> to the problem? Hmmm, is there a better way to ensure these services are 
> stopped before the backup starts and started again afterwards?

This is exactly the correct way of proceeding: with systemctl start, you are
not directly starting the service (like you did with /etc/init.d/something
start, with all the drawbacks it implies, like user environment bleeding
into the service); instead, you are sending a message to systemd to tell it
to start the service.

From your point of view, it does not change anything, but as a design, it is
much cleaner.

> Although that said, MySQL and SVNServe (which are started by systemd) go 
> down and come back up fine, it is the one that ISN'T currently 
> controlled by systemd that I am having problems with.

It IS controlled by systemd, everything is. But you neglected to inform
systemd there was something special about it.

You can debug what happens to your script by adding this line near the
beginning:

strace -f -e execve -s 1 -o /tmp/my_script.$$.$(date +%Y%m%d%H%M%S) &

Tomorrow, the file in /tmp will tell you what happened.

Note that the idea of the stale PID file is worth checking still.
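
A sketch of such a check follows. It assumes the PID file holds the PID as the first word of its first line and that, for a non-root user, fetchmail keeps it in ~/.fetchmail.pid; please verify both against your man page. The function name is invented.

```shell
#!/bin/bash
# Hypothetical check: pidfile_is_stale FILE
# Returns 0 (stale) if FILE exists but its PID does not refer to a
# process we can signal; returns 1 if there is no file or the process
# is alive. Run it as the owner of the process or as root, because
# "kill -0" also fails when permission is denied.
pidfile_is_stale() {
    local file=$1 pid=
    [ -f "$file" ] || return 1
    read -r pid _ < "$file" || true
    case $pid in
        ''|*[!0-9]*) return 0 ;;    # empty or not a number: treat as stale
    esac
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}

# Example:
#   pidfile_is_stale /home/mark/.fetchmail.pid && echo "stale PID file"
```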

Regards,

-- 
  Nicolas George



Re: Script vs command line behaviour

2016-10-12 Thread Mark Fletcher
On Wed, Oct 12, 2016 at 10:36:40AM -0400, Gene Heskett wrote:
> On Wednesday 12 October 2016 09:40:57 Mark Fletcher wrote:
> 
> > On Wed, Oct 12, 2016 at 08:40:12AM -0400, Greg Wooledge wrote:
> > > On Wed, Oct 12, 2016 at 09:34:22PM +0900, Mark Fletcher wrote:
> > > > # The systemctl stop for svnserve may not work as I haven't got
> > > > around to # making a stop script for it.
> > > > # So kill the process the old fashioned way
> > > > ps -ef | grep svnserve | grep -v grep | awk '{print $2}' | xargs
> > > > kill -9
> > >
> > > Please consider replacing this with some variant of:
> > >
> > > pkill svnserve
> > >
> > > And stop using -9 (SIGKILL).  Forever.  Pretend it never existed.
> >
> > ...Any thoughts on what is preventing the restart of fetchmail from
> > working?
> >
> > Mark
> 
> My best guess is a stale lock file, leftover because you used the brute 
> force kill, so it did not exit gracefully, cleaning up after itself.  My 

Uh no I didn't, not for fetchmail. It was svnserve I brute force killed. 
And svnserve starts again just fine. Greg's led you down the garden path 
by zeroing in on what is evidently a pet peeve of his but actually has 
nothing to do with the problem.

I shut down fetchmail with fetchmail -q which is how the man page says 
to do it.

> 
> # and restore fetchmail but let the disks synch first
> sleep 6
> fetchmail -d 180 --fetchmailrc /home/gene/.fetchmailrc
> 
> This has not failed in many years.

I wonder if passing the --fetchmailrc option will work. The systemd 
journal snippet I included in my original post shows that fetchmail is 
getting started successfully -- but by the morning it's not running. 
Now, clearly nothing gets past me, but that means it's terminating. 
Which suggests it doesn't know what it is supposed to do once it is 
started. Which suggests maybe it's not finding the fetchmailrc file?

I'm going to try specifying where .fetchmailrc is for tonight's run and 
see what happens.

Will report back in the morning. If this nails it, it suggests that 
something is screwy about the environment when run from a systemd script 
(that's only ever been root) compared to running it from a root terminal 
(which was logged in as mark, then su'd to root)

Mark



Re: Script vs command line behaviour

2016-10-12 Thread Darac Marjal

On Thu, Oct 13, 2016 at 12:09:12AM +0900, Mark Fletcher wrote:
> On Wed, Oct 12, 2016 at 04:29:01PM +0200, Nicolas George wrote:
> > On primidi 21 Vendémiaire, year CCXXV, Mark Fletcher wrote:
> > > Fetchmail isn't set up as a service through systemd, although mysql and
> > > svnserve are. fetchmail is just started from this script (or supposed to
> > > be!) and launched by hand from the command line when that fails.
> > >
> > > So at least systemd isn't complicating the issue.
> >
> > Maybe it is. Unlike SysV init and the other legacy tools, systemd keeps
> > track of the processes it starts, grouping them as "units" using cgroups.
> > Your script tries to start fetchmail in background, using the -d option
> > (which, by the way, is not present in the man page for the testing version,
> > unless I have trouble reading); that would allow it to escape SysV init and
> > cron, for example, but not systemd.
> >
> > I do not know the exact rules systemd applies to the processes started by a
> > timer, but it is entirely possible this is the source of your problem.
> > Remember when all the systemd haters started shouting "systemd broke screen
> > and tmux" because the option to clean up the processes in finished user
> > sessions had been activated by default.
>
> You bring up a good point, actually. I'm calling systemctl stop and
> systemctl start to stop and start mysql -- and I'm doing that in a
> script that is itself being called by a systemd unit (the one triggered
> by the timer). I wonder if that is in some way naughty and contributing
> to the problem? Hmmm, is there a better way to ensure these services are
> stopped before the backup starts and started again afterwards?


https://www.freedesktop.org/software/systemd/man/systemd.unit.html says

Conflicts=
 A space-separated list of unit names. Configures negative requirement 
 dependencies. If a unit has a Conflicts= setting on another unit, 
 starting the former will stop the latter and vice versa. Note that 
 this setting is independent of and orthogonal to the After= and 
 Before= ordering dependencies.


 If a unit A that conflicts with a unit B is scheduled to be started at 
 the same time as B, the transaction will either fail (in case both are 
 required part of the transaction) or be modified to be fixed (in case 
 one or both jobs are not a required part of the transaction). In the 
 latter case, the job that is not the required will be removed, or in 
 case both are not required, the unit that conflicts will be started 
 and the unit that is conflicted is stopped.


So you should, in theory, be able to add "Conflicts=mysql" to your unit 
and systemd will arrange for it to be stopped before running your unit, 
and started thereafter.
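
As an illustration only, the setting could be added through a drop-in file rather than by editing the unit itself. The helper and the file name conflicts.conf are made up, and the drop-in directory in the comment assumes the service is called homebackup.service as elsewhere in this thread. Note that the documentation quoted above only promises that the conflicting unit gets stopped; it says nothing about starting it again, so the explicit systemctl start at the end of the script is probably still needed.

```shell
#!/bin/bash
# Hypothetical helper: write_conflicts_dropin DIR UNIT...
# Writes DIR/conflicts.conf with a Conflicts= line naming the given units.
write_conflicts_dropin() {
    local dir=$1
    shift
    mkdir -p "$dir" || return 1
    {
        echo '[Unit]'
        echo "Conflicts=$*"
    } > "$dir/conflicts.conf"
}

# As root, followed by "systemctl daemon-reload":
#   write_conflicts_dropin /etc/systemd/system/homebackup.service.d mysql.service
```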




> Although that said, MySQL and SVNServe (which are started by systemd) go
> down and come back up fine, it is the one that ISN'T currently
> controlled by systemd that I am having problems with.
>
> Hmmm, interesting, although I'm not sure quite what to do with that...
>
> Mark



--
For more information, please reread.



Re: Script vs command line behaviour

2016-10-12 Thread Mark Fletcher
On Wed, Oct 12, 2016 at 04:29:01PM +0200, Nicolas George wrote:
> On primidi 21 Vendémiaire, year CCXXV, Mark Fletcher wrote:
> > Fetchmail isn't set up as a service through systemd, although mysql and 
> > svnserve are. fetchmail is just started from this script (or supposed to 
> > be!) and launched by hand from the command line when that fails.
> > 
> > So at least systemd isn't complicating the issue.
> 
> Maybe it is. Unlike SysV init and the other legacy tools, systemd keeps
> track of the processes it starts, grouping them as "units" using cgroups.
> Your script tries to start fetchmail in background, using the -d option
> (which, by the way, is not present in the man page for the testing version,
> unless I have trouble reading); that would allow it to escape SysV init and
> cron, for example, but not systemd.
> 
> I do not know the exact rules systemd applies to the processes started by a
> timer, but it is entirely possible this is the source of your problem.
> Remember when all the systemd haters started shouting "systemd broke screen
> and tmux" because the option to clean up the processes in finished user
> sessions had been activated by default.
> 
You bring up a good point, actually. I'm calling systemctl stop and 
systemctl start to stop and start mysql -- and I'm doing that in a 
script that is itself being called by a systemd unit (the one triggered 
by the timer). I wonder if that is in some way naughty and contributing 
to the problem? Hmmm, is there a better way to ensure these services are 
stopped before the backup starts and started again afterwards?

Although that said, MySQL and SVNServe (which are started by systemd) go 
down and come back up fine, it is the one that ISN'T currently 
controlled by systemd that I am having problems with.

Hmmm, interesting, although I'm not sure quite what to do with that...

Mark



Re: Script vs command line behaviour

2016-10-12 Thread Gene Heskett
On Wednesday 12 October 2016 09:40:57 Mark Fletcher wrote:

> On Wed, Oct 12, 2016 at 08:40:12AM -0400, Greg Wooledge wrote:
> > On Wed, Oct 12, 2016 at 09:34:22PM +0900, Mark Fletcher wrote:
> > > # The systemctl stop for svnserve may not work as I haven't got
> > > around to # making a stop script for it.
> > > # So kill the process the old fashioned way
> > > ps -ef | grep svnserve | grep -v grep | awk '{print $2}' | xargs
> > > kill -9
> >
> > Please consider replacing this with some variant of:
> >
> > pkill svnserve
> >
> > And stop using -9 (SIGKILL).  Forever.  Pretend it never existed.
>
> ...Any thoughts on what is preventing the restart of fetchmail from
> working?
>
> Mark

My best guess is a stale lock file, leftover because you used the brute 
force kill, so it did not exit gracefully, cleaning up after itself.  My 
own scripts restart fetchmail on a nightly basis so fetchmail can't muck 
up an sa-train-bayes run.  It uses "killall fetchmail", then waits 20 
seconds for any mail in the spamd pipes to drain, and when the sa-learn 
bits are completed:

# and restore fetchmail but let the disks synch first
sleep 6
fetchmail -d 180 --fetchmailrc /home/gene/.fetchmailrc

This has not failed in many years.

Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page 



Re: Script vs command line behaviour

2016-10-12 Thread Nicolas George
On primidi 21 Vendémiaire, year CCXXV, Mark Fletcher wrote:
> Fetchmail isn't set up as a service through systemd, although mysql and 
> svnserve are. fetchmail is just started from this script (or supposed to 
> be!) and launched by hand from the command line when that fails.
> 
> So at least systemd isn't complicating the issue.

Maybe it is. Unlike SysV init and the other legacy tools, systemd keeps
track of the processes it starts, grouping them into "units" using cgroups.
Your script tries to start fetchmail in background, using the -d option
(which, by the way, is not present in the man page for the testing version,
unless I have trouble reading); that would allow it to escape SysV init and
cron, for example, but not systemd.

I do not know the exact rules systemd applies to the processes started by a
timer, but it is entirely possible this is the source of your problem.
Remember when all the systemd haters started shouting "systemd broke screen
and tmux" because the option to clean up the processes in finished user
sessions had been activated by default.
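
To see which group systemd has put a process in, Linux exposes the
information in /proc/<pid>/cgroup.  A minimal sketch, assuming a Linux
system; cgroup_of is a made-up helper name:

```shell
# cgroup_of is a hypothetical helper. /proc/<pid>/cgroup is a Linux kernel
# interface; on a systemd machine the path in its last field names the
# unit or session scope the process belongs to.
cgroup_of() {
    pid=${1:-$$}    # default to the calling shell
    if [ -r "/proc/$pid/cgroup" ]; then
        cat "/proc/$pid/cgroup"
    else
        echo "no cgroup information for PID $pid"
    fi
}
```

Logging the output of cgroup_of from inside the backup script, and running
it again from an interactive shell, shows whether the two runs live in
different places.  A process that stays in a service's cgroup can be killed
when that service finishes, depending on the unit's KillMode setting.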

Regards,

-- 
  Nicolas George




Re: Script vs command line behaviour

2016-10-12 Thread Mark Fletcher
On Wed, Oct 12, 2016 at 09:56:06AM -0400, Greg Wooledge wrote:
> On Wed, Oct 12, 2016 at 10:40:57PM +0900, Mark Fletcher wrote:
> > ...Any thoughts on what is preventing the restart of fetchmail from 
> > working?
> 
> Nothing in particular.  I haven't used fetchmail in many years, and
> never as a "service" at the system level.  So, just general thoughts:
> 
> 1) Use "systemctl status fetchmail" to see what the operating system
>    thinks is happening.  Are the service process(es) still running, or
>    did they terminate?  Are there informative messages?  How long did
>    they run before terminating, if they did terminate?
> 

Fetchmail isn't set up as a service through systemd, although mysql and 
svnserve are. fetchmail is just started from this script (or supposed to 
be!) and launched by hand from the command line when that fails.

So at least systemd isn't complicating the issue. I'll want to get a 
service wrapped around fetchmail _eventually_ so it starts automatically 
on boot, but I haven't gotten around to that yet. Once I do, then I'll 
have to replace the command in this script with a systemctl start 
command, but I don't want to just cut to that without understanding what 
is going on here.

> 2) Look for fetchmail-specific logs.  If you've defined a logfile location,
>    look there.  Otherwise, figure out how fetchmail normally logs, and
>    look where it does that.  Maybe it logs through syslog(), in which
>    case you'd look for some file in /var/log, unless you've changed
>    the syslog configuration to send those somewhere else.  Or maybe it
>    has its own default logging location outside of the syslog()
>    infrastructure.
> 
> 3) If the current logs are not detailed enough, look for fetchmail-specific
>    options to increase logging verbosity.
> 
> 4) Log what your backup script does.  Since it's a shell script, this is
>    generally done with something like:
> 
> #!/bin/sh
> exec >> /var/tmp/mylog 2>&1
> echo " New backup started: $(date)"
> set -x
> 
> Adjust to suit your needs.  This isn't *likely* to uncover the problem,
> unless you get something blazingly obvious like:
> 
> Failed to start ftechmail.service: Unit ftechmail.service failed to load: No 
> such file or directory.
> 
> But as I said, these are just general thoughts about how to approach
> this *kind* of problem.  And hey, you never know
> 

Yeah, I think I'll have to try some combination of these ideas. In a 
strange way I'm actually encouraged: I was half-cringing, expecting 
someone to go "Mark, you idiot, you've done XYZ stupid thing, duh!"

Mark



Re: Script vs command line behaviour

2016-10-12 Thread Mark Fletcher
On Wed, Oct 12, 2016 at 08:40:12AM -0400, Greg Wooledge wrote:
> On Wed, Oct 12, 2016 at 09:34:22PM +0900, Mark Fletcher wrote:
> > # The systemctl stop for svnserve may not work as I haven't got around to 
> > # making a stop script for it.
> > # So kill the process the old fashioned way
> > ps -ef | grep svnserve | grep -v grep | awk '{print $2}' | xargs kill -9
> 
> Please consider replacing this with some variant of:
> 
> pkill svnserve
> 
> And stop using -9 (SIGKILL).  Forever.  Pretend it never existed.
> 

...Any thoughts on what is preventing the restart of fetchmail from 
working?

Mark



Re: Script vs command line behaviour

2016-10-12 Thread Greg Wooledge
On Wed, Oct 12, 2016 at 10:40:57PM +0900, Mark Fletcher wrote:
> ...Any thoughts on what is preventing the restart of fetchmail from 
> working?

Nothing in particular.  I haven't used fetchmail in many years, and
never as a "service" at the system level.  So, just general thoughts:

1) Use "systemctl status fetchmail" to see what the operating system
   thinks is happening.  Are the service process(es) still running, or
   did they terminate?  Are there informative messages?  How long did
   they run before terminating, if they did terminate?

2) Look for fetchmail-specific logs.  If you've defined a logfile location,
   look there.  Otherwise, figure out how fetchmail normally logs, and
   look where it does that.  Maybe it logs through syslog(), in which
   case you'd look for some file in /var/log, unless you've changed
   the syslog configuration to send those somewhere else.  Or maybe it
   has its own default logging location outside of the syslog()
   infrastructure.

3) If the current logs are not detailed enough, look for fetchmail-specific
   options to increase logging verbosity.

4) Log what your backup script does.  Since it's a shell script, this is
   generally done with something like:

#!/bin/sh
exec >> /var/tmp/mylog 2>&1
echo " New backup started: $(date)"
set -x

Adjust to suit your needs.  This isn't *likely* to uncover the problem,
unless you get something blazingly obvious like:

Failed to start ftechmail.service: Unit ftechmail.service failed to load: No 
such file or directory.

But as I said, these are just general thoughts about how to approach
this *kind* of problem.  And hey, you never know



Re: Script vs command line behaviour

2016-10-12 Thread Frédéric Marchal
On Wednesday 12 October 2016 08:40:12 Greg Wooledge wrote:
> And stop using -9 (SIGKILL).  Forever.  Pretend it never existed.

That's a bit harsh. The tool exists for a good reason :-)

"Unix was not designed to stop you from doing stupid things, because that 
would also stop you from doing clever things."

Doug Gwyn, in Introducing Regular Expressions (2012) by Michael Fitzgerald

And for the curious, here is why kill -9 should only be used as a last resort 
when everything else has failed:

http://unix.stackexchange.com/questions/8916/when-should-i-not-kill-9-a-process
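
In script form, "last resort" could look like the following: send SIGTERM,
give the process some time, and escalate to SIGKILL only if it is still
there.  This is a sketch; stop_gracefully and its 10-second default are
illustrative choices, not something from this thread.

```shell
# stop_gracefully is a hypothetical helper name, not part of any package.
# Usage: stop_gracefully PID [TIMEOUT_SECONDS]
# Returns 0 if the process exited on its own, 1 if SIGKILL was needed.
stop_gracefully() {
    pid=$1
    timeout=${2:-10}    # seconds to wait before escalating

    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    kill -0 "$pid" 2>/dev/null || return 0

    # Ask politely first so the process can clean up after itself.
    kill -TERM "$pid" 2>/dev/null || return 0

    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            # Last resort: SIGTERM was ignored for the whole timeout.
            kill -KILL "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A caller can use the return status to record when the escalation happened,
for example: stop_gracefully "$pid" 15 || echo "had to use SIGKILL on $pid"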

Frederic



Re: Script vs command line behaviour

2016-10-12 Thread Greg Wooledge
On Wed, Oct 12, 2016 at 09:34:22PM +0900, Mark Fletcher wrote:
> # The systemctl stop for svnserve may not work as I haven't got around to 
> # making a stop script for it.
> # So kill the process the old fashioned way
> ps -ef | grep svnserve | grep -v grep | awk '{print $2}' | xargs kill -9

Please consider replacing this with some variant of:

pkill svnserve

And stop using -9 (SIGKILL).  Forever.  Pretend it never existed.
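
For reference, a sketch of what "some variant of pkill" might look like in
the backup script.  pkill sends SIGTERM by default and never matches itself,
so the "grep -v grep" step disappears.  stop_by_name is a made-up wrapper;
the -x flag and the exit statuses are those of procps pkill.

```shell
# stop_by_name is a hypothetical wrapper around pkill (from procps).
stop_by_name() {
    name=$1
    status=0

    # -x matches the process name exactly, so "svnserve" does not also
    # match a longer name that merely contains it.
    pkill -x "$name" || status=$?

    case $status in
        0) echo "sent SIGTERM to $name" ;;
        1) echo "no matching process signalled for $name" ;;
        *) echo "pkill failed for $name (status $status)" >&2
           return 2 ;;
    esac
}
```

In the script, the whole ps | grep | awk | xargs pipeline would then become:
stop_by_name svnserve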