bug#64008: shepherd respawns a service even when it's disabled

2023-06-11 Thread Attila Lendvai
i forgot to mention that the service is in the stopped state while this is 
happening, at least according to `herd status`.

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39





bug#63989: [PATCH] guix.texi typo

2023-06-11 Thread Ruijie Yu via Bug reports for GNU Guix
Hello,

I may have taken the wrong approach in sending a patch directly to
guix-patc...@gnu.org without posting it here as well: almost two weeks
have passed without a response, while a large volume of other patches
and issues was handled in the meantime.

If that mail was indeed overlooked or forgotten, I hope sending a
message to this address will let someone look at that one-line patch I
sent.  Thanks.

https://issues.guix.gnu.org/63753
https://lists.gnu.org/archive/html/guix-patches/2023-06/msg00146.html
msgid:20230604030158.4086935-2-rui...@netyu.xyz

-- 
Best,


RY





bug#64006: Installation frustration weekend

2023-06-11 Thread Denys Nykula
Guix on a Debian laptop host works okay, GuixSD in VirtualBox a couple
of weeks ago seemed fine too, but I've just spent a weekend trying to
install it on a dedicated laptop for real use, and I'm running into
many problems.

I should split this and file multiple more detailed issues later over
the next couple of weeks, but in case I don't have a chance to get back
to it soon I thought posting it here as is would be better than never
posting it anywhere.

Installation with a desktop environment (EXWM) started to "die
unexpectedly" every time while loading substitutes; please see
installer-dump-790b17b6.

(Nice thing) Didn't find a netcat tool in the installer image to copy
the dump to my tethering phone, but bash's feature enabling me to cat
to >/dev/tcp/192.168.x.x/PORT did what I want. So if the upload failed,
I can post the tarball manually if needed.
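
For reference, bash's /dev/tcp redirection only opens a client
connection, so something has to be listening on the receiving machine.
Below is a minimal receiver sketch in Python, assuming Python is
available on the receiving side. The name `receive_once` and its
parameters are made up for illustration and are not part of any Guix
tooling.

```python
import socket


def receive_once(dest_path, host="0.0.0.0", port=0, ready=None):
    """Accept one TCP connection and write everything it sends to dest_path.

    With port=0 the OS picks a free port; the optional `ready` callback
    (a hypothetical hook) is called with the port actually in use.
    Returns the number of bytes written.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen(1)
        if ready is not None:
            ready(server.getsockname()[1])
        conn, _peer = server.accept()
        total = 0
        with conn, open(dest_path, "wb") as out:
            while True:
                chunk = conn.recv(65536)
                if not chunk:  # sender closed the connection
                    break
                out.write(chunk)
                total += len(chunk)
    return total
```

The sender side stays as described above: cat the file to
>/dev/tcp/192.168.x.x/PORT, where PORT is the port the receiver
listens on.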

Nothing hints that the user has to switch their keyboard layout to
enter the encryption key when it is requested a second time during
boot. I guessed it by accident while playing with the Guile REPL that
appears after three decryption attempts.

Console fonts lose Cyrillic support after the first pull and upgrade.
Question marks appear in place of my letters instead.

(Probably impossible to fix by design, but worth mentioning) Pulling
and downloading substitutes wastes hours of time and traffic, without
estimates and with the warning split into many small messages without a
total sum. Also seems like it happens from scratch after the boot,
despite the installer having already pulled so much from the Internet.

That's with the standard installer image 1.4.0 x86_64. Retried using
the latest image. It partially fixes one of the problems (desktop now
installs), and introduces more.

Nothing hints that the combination for layout switching is now
Shift+Alt, not Alt+Shift.

Video (Intel HD Graphics 400) freezes after gdm shows the mouse, so I
can't use the environment or switch to a console to get a debug log.





bug#64008: shepherd respawns a service even when it's disabled

2023-06-11 Thread Attila Lendvai
the issue:

i'm in a situation where my service quits after a few seconds of CPU usage 
(i.e. the default-respawn-limit is not triggered). therefore shepherd keeps 
restarting it, practically in a busy loop.

suggested solution:

maybe respawn-service should check for the disabled state, so that the admin 
can intervene by `herd disable myservice`.

a longer term solution could be to add a respawn-delay field for services, and 
default it to something non-zero.
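
The two suggestions above can be illustrated with a toy model. This is
a sketch in Python, not Shepherd code: the `Service` class, the
`respawn_service` function and the `respawn_delay` field are
hypothetical names used only to show the control flow (check the
disabled state, wait, check again, then restart).

```python
import time
from dataclasses import dataclass


@dataclass
class Service:
    """Toy stand-in for a supervised service (not Shepherd's data structure)."""
    name: str
    enabled: bool = True
    respawn_delay: float = 1.0  # hypothetical field, in seconds
    respawn_count: int = 0


def respawn_service(service, start, sleep=time.sleep):
    """Restart `service` via `start` unless it is disabled.

    Returns True if the service was restarted, False if the respawn was
    skipped because the service is (or became) disabled.
    """
    if not service.enabled:
        return False
    # A non-zero delay keeps a quickly failing service from busy-looping.
    sleep(service.respawn_delay)
    # Re-check: the admin may have disabled the service during the delay.
    if not service.enabled:
        return False
    start(service)
    service.respawn_count += 1
    return True
```

In this model, disabling the service at any point before the restart
is enough to break the respawn cycle.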

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39






bug#63675: shepherd 0.10.0 test 2 fail on riscv64-linux

2023-06-11 Thread Z572 via Bug reports for GNU Guix

On QEMU, the forking-service.sh test always fails.


I tried to use strace, but it does not work:

```
+ /gnu/store/w7a3fxw00y4picvcrkvdxavpj5gqabbb-strace-6.2/bin/strace -f -t -s 80 -o /tmp/she.strace shepherd -I -s t-socket-25862 -c t-conf-25862 -l t-log-25862 --pid=t-pid-25862
+ sleep 0.3
/gnu/store/w7a3fxw00y4picvcrkvdxavpj5gqabbb-strace-6.2/bin/strace: test_ptrace_get_syscall_info: PTRACE_TRACEME: Function not implemented
/gnu/store/w7a3fxw00y4picvcrkvdxavpj5gqabbb-strace-6.2/bin/strace: ptrace(PTRACE_TRACEME, ...): Function not implemented
/gnu/store/w7a3fxw00y4picvcrkvdxavpj5gqabbb-strace-6.2/bin/strace: PTRACE_SETOPTIONS: Function not implemented
/gnu/store/w7a3fxw00y4picvcrkvdxavpj5gqabbb-strace-6.2/bin/strace: detach: waitpid(25922): No child processes
```

pid-file.sh fails in the check phase, but succeeds when rerun with 'make check 
TESTS=tests/pid-file.sh'.


I am temporarily unable to reproduce this on the riscv machine.
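
The trace in this report repeatedly runs `test -f` on the PID file
followed by `sleep 0.3`. For readers unfamiliar with the pattern, here
is a sketch of the same polling loop in Python with a deadline added.
It is not code from the Shepherd test suite; `wait_for_pid_file` and
its default timeout and interval are illustrative assumptions.

```python
import time


def wait_for_pid_file(path, timeout=5.0, interval=0.3):
    """Poll `path` until it contains a PID; return it, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path) as pid_file:
                text = pid_file.read().strip()
            if text:  # the file exists and has been written
                return int(text)
        except FileNotFoundError:
            pass  # not created yet, keep polling
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

On a slow machine such as an emulated one, whether a wait of this kind
succeeds depends on how long the service takes to write its PID file
relative to the deadline.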


+ shepherd --version
shepherd (GNU Shepherd) 0.10.1
Copyright (C) 2023 the Shepherd authors
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
+ herd --version
herd (GNU Shepherd) 0.10.1
Copyright (C) 2023 the Shepherd authors
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
+ socket=t-socket-3937
+ conf=t-conf-3937
+ log=t-log-3937
+ pid=t-pid-3937
+ service_pid=t-service-pid-3937
+ herd='herd -s t-socket-3937'
+ trap 'cat t-log-3937 || true; rm -f t-socket-3937 t-conf-3937 t-service-pid-3937 t-log-3937;
  test -f t-pid-3937 && kill `cat t-pid-3937` || true; rm -f t-pid-3937' EXIT
+ cat
+ rm -f t-pid-3937
+ test -f t-pid-3937
+ sleep 0.3
+ shepherd -I -s t-socket-3937 -c t-conf-3937 -l t-log-3937 --pid=t-pid-3937
+ test -f t-pid-3937
+ sleep 0.3
+ test -f t-pid-3937
+ sleep 0.3
+ test -f t-pid-3937
+ sleep 0.3
+ test -f t-pid-3937
+ sleep 0.3
Starting service root...
Service root started.
Service root running with value #t.
Service root has been started.
+ test -f t-pid-3937
+ sleep 0.3
Starting service test-works...
+ test -f t-pid-3937
+ sleep 0.3
+ test -f t-pid-3937
+ sleep 0.3
+ test -f t-pid-3937
+ sleep 0.3
+ test -f t-pid-3937
+ sleep 0.3
+ test -f t-pid-3937
+ sleep 0.3
+ test -f t-pid-3937
+ sleep 0.3
Service test-works has been started.
Service test-works started.
Service test-works running with value 5371.
+ test -f t-pid-3937
++ cat t-pid-3937
+ shepherd_pid=4685
+ grep running
+ herd -s t-socket-3937 status test-works
  It is running since 14:39:18 (0 seconds ago).
+ test -f t-service-pid-3937
++ cat t-service-pid-3937
+ kill -0 5371
+ herd -s t-socket-3937 stop test-works
Stopping service test-works...
Service test-works stopped.
Service test-works is now stopped.
+ rm t-service-pid-3937
+ herd -s t-socket-3937 start test
Service test could not be started.
herd: error: failed to start service test
+ true
+ herd -s t-socket-3937 status test
+ grep stopped
  It is stopped (failing).
+ test -f t-service-pid-3937
++ cat t-service-pid-3937
+ kill -0 7771
./tests/pid-file.sh: line 127: kill: (7771) - No such process
+ true
+ rm -f t-service-pid-3937
+ herd -s t-socket-3937 start test-daemonizes
Service test-daemonizes could not be started.
herd: error: failed to start service test-daemonizes
+ true
+ herd -s t-socket-3937 status test-daemonizes
+ grep stopped
  It is stopped (failing).
+ test -f t-service-pid-3937
++ cat t-service-pid-3937
+ kill -0 12006
+ false
+ cat t-log-3937
2023-06-11 14:39:15 Starting service root...
2023-06-11 14:39:15 Service root started.
2023-06-11 14:39:15 Service root running with value #t.
2023-06-11 14:39:15 Service root has been started.
2023-06-11 14:39:16 Starting service test-works...
2023-06-11 14:39:18 Service test-works has been started.
2023-06-11 14:39:18 Service test-works started.
2023-06-11 14:39:18 Service test-works running with value 5371.
2023-06-11 14:39:19 Stopping service test-works...
2023-06-11 14:39:19 Service test-works stopped.
2023-06-11 14:39:19 Service test-works is now stopped.
2023-06-11 14:39:19 Starting service test...
2023-06-11 14:39:25 Service test could not be started.
2023-06-11 14:39:25 Service test failed to start.
2023-06-11 14:39:26 Starting service test-daemonizes...
2023-06-11 14:39:32 Service test-daemonizes could not be started.
2023-06-11 14:39:32 Service test-daemonizes failed to start.
+ rm -f t-socket-3937 t-conf-3937 t-service-pid-3937 t-log-3937
+ test -f t-pid-3937
++ cat t-pid-3937
+ kill 4685
+ rm -f t-pid-3937
Stopping service root...
Exiting shepherd...
Exiting.

Some deprecated features have been used.  Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information.  Set it to "no" to suppress
this message.
FAIL tests/pid-file.sh (exit status: 1)
+ shepherd --version

bug#63868: [reconfigure, shepherd] error: remove: unbound variable

2023-06-11 Thread Ludovic Courtès
Hi,

Attila Lendvai wrote:

> services: Error in MODIFY-SERVICES when services don't exist
> https://git.savannah.gnu.org/cgit/guix.git/commit/?id=dbbc7e946131ba257728f1d05b96c4339b7ee88b
>
> one commit prior in the history works fine: 
> ae707b62e71b1fae054eb422412384bcc8d39fa9

There were problems with that commit that have been mostly fixed now:

  https://issues.guix.gnu.org/63921

How this could explain what you describe, I don’t know, not having a
clear picture of what happened.

I’m closing but please reopen with additional info if you think there’s
still something wrong.

Thanks,
Ludo’.





bug#58485: [shepherd] EADDRINUSE while restarting ssh-daemon

2023-06-11 Thread Ludovic Courtès
Hi Lars,

Ludovic Courtès wrote:

> Ludovic Courtès wrote:
>
>>> 1 14:12:15.117035 read(21, "(shepherd-command (version 0) (action 
>>> restart) (service ssh-daemon) (arguments ()) (directory \"/root\"))", 1024) 
>>> = 103
>>> 1 14:12:15.117254 close(27) = 0
>>> 1 14:12:15.117283 close(30) = 0
>>> 1 14:12:15.117416 newfstatat(AT_FDCWD, "/etc/localtime", 
>>> {st_dev=makedev(0x8, 0x2), st_ino=110100491, st_mode=S_IFREG|0444, 
>>> st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, 
>>> st_size=2298, st_atime=1676898665 /* 2023-02-20T14:11:05.338746772+0100 */, 
>>> st_atime_nsec=338746772, st_mtime=1676898664 /* 
>>> 2023-02-20T14:11:04.874743456+0100 */, st_mtime_nsec=874743456, 
>>> st_ctime=1676898664 /* 2023-02-20T14:11:04.874743456+0100 */, 
>>> st_ctime_nsec=874743456}, 0) = 0
>>> 1 14:12:15.117475 write(17, "shepherd[1]: Service ssh-daemon has been 
>>> stopped.\n", 50) = 50
>>> 1 14:12:15.117524 socket(AF_INET, 
>>> SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 26
>>> 1 14:12:15.117561 setsockopt(26, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>>> 1 14:12:15.117598 bind(26, {sa_family=AF_INET, sin_port=htons(), 
>>> sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EADDRINUSE (Address already in use)
>>> 1 14:12:15.117724 write(21, "(reply (version 0) (result #f) (error 
>>> (error (version 0) action-exception start ssh-daemon system-error (\"bind\" 
>>> \"~A\" (\"Address already in use\") (98 (messages (\"Service ssh-daemon 
>>> has been stopped.\")))", 204) = 204
>>> 1 14:12:15.117754 close(21) = 0
>
> [...]
>
>> Maybe we should just have shepherd retry upon EADDRINUSE (like nginx
>> does, as you wrote), though I’d like to understand under what conditions
>> we can get EADDRINUSE in the first place.
>
> Done:
>
>   
> https://git.savannah.gnu.org/cgit/shepherd.git/commit/?id=41789ee8d0e164967f9ca196db4e9601400a462e

I’m assuming that this is fixed in Shepherd 0.10.x.  Please reopen if
you stumble upon this issue again.

Ludo’.
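
The idea of retrying upon EADDRINUSE, as discussed in the quoted text,
can be sketched as follows. This is a generic Python illustration of
the pattern, not the implementation in the Shepherd commit referenced
above; `bind_with_retry` and its `attempts` and `delay` parameters are
hypothetical.

```python
import errno
import socket
import time


def bind_with_retry(address, attempts=5, delay=0.2, sleep=time.sleep):
    """Return a listening TCP socket bound to `address`.

    Retries only when bind/listen fails with EADDRINUSE; any other
    error, or EADDRINUSE on the last attempt, is raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock.bind(address)
            sock.listen(16)
            return sock
        except OSError as err:
            sock.close()
            if err.errno != errno.EADDRINUSE or attempt == attempts:
                raise
            # The previous owner may still be releasing the address.
            sleep(delay)
```

Retrying hides the symptom without explaining it, which is why the
quoted message also asks under what conditions EADDRINUSE can occur in
the first place.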





bug#53580: /var/run/shepherd/socket is missing on an otherwise functional system

2023-06-11 Thread Ludovic Courtès
Attila Lendvai wrote:

> (define (call-with-server-socket file-name proc)
>   "Call PROC, passing it a listening socket at FILE-NAME and deleting the
> socket file at FILE-NAME upon exit of PROC.  Return the values of PROC."
>   (let ((sock (open-server-socket file-name)))
>     (dynamic-wind
>       noop
>       (lambda () (proc sock))
>       (lambda ()
>         (close sock)
>         (catch-system-error (delete-file file-name))))))

For the record, ‘dynamic-wind’ here was replaced by ‘catch’ in
46790f9d924af2a9521adccb9e6db6afd9c1a2e7, which corresponds to the
introduction of Fibers in 0.9.x.

Ludo’.
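
For readers who do not read Scheme, the contract of the quoted
procedure (open a listening socket at a file name, call PROC with it,
then close the socket and delete the socket file when PROC exits) can
be sketched in Python with try/finally. This is an analogue written
for illustration, not a translation of either Shepherd version, and it
does not reproduce the difference between 'dynamic-wind' and 'catch'.

```python
import contextlib
import os
import socket


def call_with_server_socket(file_name, proc):
    """Call `proc` with a listening Unix socket bound at `file_name`.

    The socket is closed and the socket file removed when `proc`
    returns or raises. Returns whatever `proc` returns.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(file_name)
        sock.listen(10)
        return proc(sock)
    finally:
        sock.close()
        # Ignore errors such as the file already being gone.
        with contextlib.suppress(OSError):
            os.unlink(file_name)
```

Because the cleanup deletes the socket file, any path that runs it
while the server is still meant to be alive would leave a running
process without its socket file.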





bug#53580: /var/run/shepherd/socket is missing on an otherwise functional system

2023-06-11 Thread Ludovic Courtès
Hi,

Attila Lendvai wrote:

> when i'm working on my service code, which is `guix pull`ed in from my 
> channel, then after a reconfigure i seem to have to reboot for my new code to 
> get activated. a simple `herd restart` on the service didn't seem to be 
> enough. i.e. the guile modules that my service code is using did not get 
> reloaded into the PID 1 guile.

Guile modules do not get reloaded; there’s no mechanism in place to
reload previously-loaded Guile modules.

> keep in mind that this is a non-trivial service that e.g. spawns a long-lived 
> fiber to talk to the daemon through its stdio while the daemon is running. 
> IOW, its start GEXP is not just a simple forkexec, but something more complex 
> that uses functions from guile modules that should be reloaded into PID 1 
> when the new version of the service is to be started.

OK, got it.  There’s not enough info here to be concrete, but I’d
recommend making it a separate process if you need to reliably
reload/replace the module.  IOW, you’d make it a “regular” service
spawned with ‘make-forkexec-constructor’ or similar.

However this doesn’t have anything to do with the initial bug report and
the title of this message; for clarity, please move further discussion
to guix-devel.

Thanks,
Ludo’.
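
The point about already-loaded modules has a close analogue in other
runtimes. The Python sketch below is an analogy only and says nothing
about Guile's module system internals: a long-running process keeps
the module it imported, while a freshly spawned process picks up the
code currently on disk. The module name `svc_code` and the function
`demo` are made up for the example.

```python
import importlib
import os
import pathlib
import subprocess
import sys
import tempfile


def demo():
    """Return (version seen in this process, version seen by a new process)."""
    with tempfile.TemporaryDirectory() as tmp:
        module_file = pathlib.Path(tmp, "svc_code.py")
        module_file.write_text('VERSION = "old"\n')
        sys.path.insert(0, tmp)
        try:
            importlib.import_module("svc_code")  # first load, cached
            # "Upgrade" the code on disk while this process keeps running.
            module_file.write_text('VERSION = "new code"\n')
            # Importing again returns the cached module, not the new file.
            in_process = importlib.import_module("svc_code").VERSION
            # A separate process starts clean and loads the new file.
            child = subprocess.run(
                [sys.executable, "-c",
                 "import svc_code; print(svc_code.VERSION)"],
                env={**os.environ, "PYTHONPATH": tmp},
                capture_output=True, text=True, check=True,
            )
            return in_process, child.stdout.strip()
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("svc_code", None)
```

This mirrors the recommendation above: code that must be replaced
reliably is easier to handle in its own process than inside a
long-lived one such as PID 1.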