bug#67802: Shepherd is not able to run simple networked programs as services
Lars Rustand writes:

> Hello, I have created two very simple shepherd services for two
> different mail programs (offlineimap and davmail). Both of the programs
> are able to run without problem when ran from the commandline, but both
> of them fail with networking related errors when I try to run them as
> shepherd services.
...
> They both seem to fail when opening an *outgoing* socket, but davmail
> seems to be able to start *listening* on several ports just fine.

So, I figured this out. It had nothing to do with networking, even though it looked like it.

The problem was that I had cargo-cult-copied a #:pid-file parameter from another service, believing that it was just a path where Shepherd could create a PID file for the service. In fact, Shepherd expects the program itself to create the PID file. When the program did not create it, Shepherd killed the program.

So the original bug I reported is in fact not a bug at all and can be closed. However, the error handling in Shepherd could be improved to make it clearer what is happening.
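For readers hitting the same symptom, here is a minimal sketch of the difference, assuming the `service` procedure and `make-forkexec-constructor` from Shepherd 0.10. The service names, commands, and paths are hypothetical and are not taken from the original report.

```scheme
;; Hypothetical service definitions, for illustration only.
(use-modules (shepherd service))

;; Foreground program that never writes a PID file: leave #:pid-file out.
(define offlineimap
  (service '(offlineimap)
    #:documentation "Run offlineimap in the foreground."
    #:start (make-forkexec-constructor '("offlineimap"))
    #:stop (make-kill-destructor)))

;; Program that writes its own PID file: #:pid-file names the file that
;; the *program* creates.  Shepherd only reads it, and treats the start
;; as failed if the file does not show up in time.
(define some-daemon
  (service '(some-daemon)
    #:documentation "Hypothetical daemon that writes its own PID file."
    #:start (make-forkexec-constructor
             '("some-daemon" "--pid-file=/var/run/some-daemon.pid")
             #:pid-file "/var/run/some-daemon.pid")
    #:stop (make-kill-destructor)))

(register-services (list offlineimap some-daemon))
```

The `--pid-file` command-line flag is an assumption about the hypothetical daemon; the point is only that the path given to `#:pid-file` must match a file the program writes on its own.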
bug#67538: Shepherd stops responding during "guix system reconfigure"
> Thank you very much for this, Attila!

you're welcome! :)

> Are the patches in 67839 and/or your branch "attila" linked from there in a
> state that I could test them locally? Would it be valuable to you if I ran a
> patched Shepherd and sent logs and/or backtraces as I encountered them?

it's nice of you, but not really, now that we have a failing test case in shepherd's unit tests that reproduces it much more easily.

with #67839 you would only get an extra "Assertion failed" message over master, without much useful output.

as for my branch, it would emit a lot of useful logs, including backtraces, but i keep force-pushing into it. i'm running my servers with it, though, so if you feel really adventurous and want to join the debugging, then you can try... otherwise it's too much in flux.

what we need to focus on now is making shepherd's test suite run clean again, one way or another. then i can test it in a real-life environment, and report back with any possible findings.

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“Ignorance might be bliss for the ignorant, but for the rest of us it's a fucking pain in the ass.”
— Ricky Gervais
bug#65178: Shepherd stops responding during "guix system reconfigure"
On Fri Dec 15, 2023 at 8:47 PM CET, Attila Lendvai wrote:
> i think i have found the root cause of this, as documented here:
> https://issues.guix.gnu.org/67839
>
> that issue contains patches for shepherd to reproduce it in its test suite.

Thank you very much for this, Attila!

Are the patches in 67839 and/or your branch "attila" linked from there in a state that I could test them locally? Would it be valuable to you if I ran a patched Shepherd and sent logs and/or backtraces as I encountered them?
bug#65178: Shepherd stops responding during "guix system reconfigure"
i think i have found the root cause of this, as documented here:
https://issues.guix.gnu.org/67839

that issue contains patches for shepherd to reproduce it in its test suite.

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“What divides libertarians from everybody else is not a belief about rights or what rights people have, because the judgments libertarians make about the state are the same as the judgments almost everyone makes about private agents. So it's not that we believe in rights that other people don't believe in, or that other people believe in rights that we don't believe in. It's that other people think the state is exempt from the moral principles that apply to non-government agents.”
— Michael Huemer
bug#67839: [PATCH 2/2] service: Add two asserts that will make tests/replacement.sh fail.
* modules/shepherd/service.scm (spawn-service-controller): Add two asserts.

This is the bug that causes `guix system reconfigure ...` to sometimes hang,
and subsequently all shepherd commands, because a match-error flies out from
the service-controller of a replaced service, and thus its fiber dies.
---
 modules/shepherd/service.scm | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm
index c3bdf44..0ee6929 100644
--- a/modules/shepherd/service.scm
+++ b/modules/shepherd/service.scm
@@ -382,9 +382,11 @@ denoting what the service provides."
 
 (define (spawn-service-controller service)
   "Return a channel over which @var{service} may be controlled."
+  (assert (current-process-monitor))
   (let ((channel (make-channel)))
     (spawn-fiber
      (lambda ()
+       (assert (current-process-monitor))
        ;; The controller writes to its current output port via 'local-output'.
        ;; Make sure that goes to the right port.  If the controller got a
        ;; wrong output port, it could crash and stop responding just because a
-- 
2.41.0
bug#67839: [PATCH 1/2] shepherd: Move root-service start under with-process-monitor.
* modules/shepherd.scm (main): move the (start-service root-service) under the
dynamic extent of with-process-monitor, so that (current-process-monitor) is
valid for the root-service, too.
---
 modules/shepherd.scm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/modules/shepherd.scm b/modules/shepherd.scm
index efc5517..77c6d18 100644
--- a/modules/shepherd.scm
+++ b/modules/shepherd.scm
@@ -451,12 +451,12 @@ fork in the child process."
         (run-fibers
          (lambda ()
            (with-service-registry
+            (with-process-monitor
 
-             ;; Register and start the 'root' service.
-             (register-services (list root-service))
-             (start-service root-service)
+              ;; Register and start the 'root' service.
+              (register-services (list root-service))
+              (start-service root-service)
 
-             (with-process-monitor
                ;; Replace the default 'system*' binding with one that
                ;; cooperates instead of blocking on 'waitpid'.  Replace
                ;; 'primitive-load' (in C as of 3.0.9) with one that does
-- 
2.41.0
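The "dynamic extent" point in the commit message can be illustrated with a plain Guile parameter. This is only a sketch: `current-monitor` and `start-root-service` are stand-in names, and it assumes that `with-process-monitor` binds `current-process-monitor` in a parameterize-like way, which the patch suggests but does not show.

```scheme
;; Plain Guile; stand-in names, not Shepherd's actual definitions.
(define current-monitor (make-parameter #f))

(define (start-root-service)
  ;; Returns whatever monitor is visible at call time.
  (current-monitor))

;; Called outside the dynamic extent: no monitor is visible.
(display (start-root-service))   ; prints #f
(newline)

;; Called inside the dynamic extent: the monitor is visible.
(parameterize ((current-monitor 'process-monitor))
  (display (start-root-service)) ; prints process-monitor
  (newline))
```

A call made before the binding form is entered sees the default value, which is the situation the patch avoids by moving the root-service start inside `with-process-monitor`.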
bug#67839: shepherd: sometimes hangs on `guix system reconfigure`
my fellow hackers,

i'm going to attach two patches that essentially just add a couple of asserts that trigger a test failure (tests/replacement.sh).

my current codebase (https://codeberg.org/attila-lendvai-patches/shepherd/commits/branch/attila) logs a whole lot more information, and has more sophisticated error handling. triggering the same error on that codebase shows that the first assert is already failing (the one that is before spawning the new fiber for the controller of the replacement).

maybe the root cause is this: https://github.com/wingo/fibers/issues/29

HTH,

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“Angry people want you to see how powerful they are… loving people want you to see how powerful You are.”
— Chief Red Eagle
bug#67838: [gnome-team] epiphany pdf.js can't save pdf
Hi Guix,

I'm running GNOME Web née epiphany from the gnome-team branch. I've noticed that if I'm viewing a PDF with the built-in pdf.js viewer, pdf.js's download/save button doesn't work. Clicking it doesn't seem to do anything, and, if started from a terminal, no messages are printed there.

It did work for me prior to switching to the gnome-team branch. I'm not sure how to troubleshoot further.

Best,
Jack
bug#67622: libtorrent-rasterbar: Tests stuck forever
Hi Ilya,

On Fri, 15 Dec 2023 11:23:30 +0800, Ilya Chernyshov wrote:
>
> Hi
>
> Nothing's changed, the package build still hangs on check phase for ever
>
> guix build:
>
> starting phase `check'
> Test project /tmp/guix-build-libtorrent-rasterbar-1.2.18.drv-0/build

We have had libtorrent-rasterbar updated to 1.2.19 for a while [1], and a fix was added in that commit.

Thanks

---
[1]: https://git.savannah.gnu.org/cgit/guix.git/log/?id=fb2bbb0c3e9734006a4108a5c4f9901dbf15edae