bug#54447: cuirass: missing derivation error

2023-10-10 Thread Maxim Cournoyer
Hello,

宋文武  writes:

[...]

> Hello, this one for ddd: https://ci.guix.gnu.org/build/1372655/log/raw
>
>   cannot build missing derivation 
> ‘/gnu/store/anzz2p18b7r9x45y350avnk8br2yihi2-ddd-3.4.0.drv’
>
> Restarting it on CI still produced the same error.

Another example: https://ci.guix.gnu.org/build/1982454/details

--8<---cut here---start->8---
substitute: 
substitute: updating substitutes from 'http://10.0.0.1'...   0.0%
substitute: updating substitutes from 'http://10.0.0.1'... 100.0%
cannot build missing derivation 
‘/gnu/store/vwhgs9dkj9spryglb180j27dr5vidjxv-ecl-23.9.9.drv’
--8<---cut here---end--->8---

-- 
Thanks,
Maxim





bug#54447: cuirass: missing derivation error

2023-10-10 Thread Maxim Cournoyer
Hi Ludovic,

Ludovic Courtès  writes:

> Hello!
>
> Mathieu Othacehe  skribis:
>
>> A lot of builds, among them ~20 system tests[1], are failing with:
>> "cannot build missing derivation
>> ‘/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv’"
>> errors.
>
> I have a disappointingly simple hypothesis for this.  Remember that
> “missing derivation” errors happen primarily for system tests.
>
> Turns out that ‘cleanup-cuirass-roots’ in maintenance.git, used as an
> mcron job, explicitly removes GC roots for things like *-os-encrypted
> once they’re more than two days old, as well as GC roots for the
> corresponding .drv.
>
> I think this was increasing the likelihood that a .drv would be GC’d by
> the time we run the test: under high load¹, it’s plausible that a system
> test wouldn’t be built within two days after it’s been queued.
>
> I’m proposing the change below to address this; I don’t think we need
> ‘--gc-keep-outputs --gc-keep-derivations’ anymore now that we keep
> things in ‘guix publish’ cache first and foremost.
>
> Thoughts?

Ah, so that mcron job is kind of a hack to garbage collect only *some*
items sooner than the default policy of 30 days?  And we'd now avoid
deleting selected .drv files while still deleting their outputs, so if
something that needs them took more than 2 days to build, we could end
up having to rebuild the garbage-collected outputs?

I'm not sure if we need such a fancy hack with the 100 TiB of data we
now have, but your fix seems reasonable (LGTM!)

> In addition to the mcron job, Cuirass’s own ‘register-gc-roots’
> procedure periodically deletes GC roots older than ‘%gc-roots-ttl’ (30
> days in practice).  That’s okay, except that it would be safer to delete
> GC roots for a .drv if and only if it’s been built already.

Hm.  I wonder if this could explain the other cases we've seen.  It
could be that building a derivation was interrupted or canceled for some
reason, then 30 days elapsed and the .drv was garbage collected; since
it doesn't get recreated after that, we get the missing .drv error?

-- 
Thanks,
Maxim





bug#61882: emacs-next-pgtk does not find emacs-org-roam, other path issues

2023-10-10 Thread Maxim Cournoyer
tags 61882 = moreinfo unreproducible
quit

Hi,

Csepp  writes:

> Maxim Cournoyer  writes:
>
>> Hi,
>>
>> Csepp  writes:
>>
>>> Maxim Cournoyer  writes:
>>>
>>>> tags 61882 +notabug
>>>> quit
>>>
>>> I don't think notabug applies until we actually know the root cause.
>>
>> Sadly I don't think there's anything actionable here until you can
>> reproduce the problem and share the recipe with us, so I wanted to close
>> the issue without it being marked as "resolved".
>
> Neither "resolved" nor "notabug" are applicable.  If stalled incident
> reports / issues are a problem, they should probably be marked as
> stalled, or needinfo, for easy filtering.  Marking it as notabug is just
> going to make the job of the next person harder when they search for
> issues related to these symptoms.

I don't think a bug report as particular as 'my profile got corrupted'
without any way to recreate it has much value; it's also the first time
I've heard of such a report.  That's why I'd prefer to treat it as an
oddity and close it; if it is reproduced (by you or others), let's
reopen it with fresh and clear information.

> I appreciate all the work going into closing old issues, but I don't
> think chasing a low open issue count should be a goal unto itself.
> See https://fvsch.com/stale-bots .

To be clear, I wholly agree.  I've now tagged it as moreinfo and
unreproducible.

-- 
Thanks,
Maxim





bug#65858: mumi crashes

2023-10-10 Thread Maxim Cournoyer
Hi Arun,

Arun Isaac  writes:

> Hi Maxim,
>
> I have made a number of changes to mumi and reconfigured berlin with the
> latest mumi. Here is a quick summary of the main changes to mumi.
>
> - We now log the complete URI and the response code for every request to
>   mumi.
> - We now handle HEAD requests correctly. This should eliminate some of
>   the crashes we saw in the mumi log.

Thanks!  Let's keep an eye on things.

-- 
Thanks,
Maxim





bug#61882: emacs-next-pgtk does not find emacs-org-roam, other path issues

2023-10-10 Thread Csepp


Maxim Cournoyer  writes:

> Hi,
>
> Csepp  writes:
>
>> Maxim Cournoyer  writes:
>>
>>> tags 61882 +notabug
>>> quit
>>
>> I don't think notabug applies until we actually know the root cause.
>
> Sadly I don't think there's anything actionable here until you can
> reproduce the problem and share the recipe with us, so I wanted to close
> the issue without it being marked as "resolved".

Neither "resolved" nor "notabug" are applicable.  If stalled incident
reports / issues are a problem, they should probably be marked as
stalled, or needinfo, for easy filtering.  Marking it as notabug is just
going to make the job of the next person harder when they search for
issues related to these symptoms.

I appreciate all the work going into closing old issues, but I don't
think chasing a low open issue count should be a goal unto itself.
See https://fvsch.com/stale-bots .





bug#65858: mumi crashes

2023-10-10 Thread Arun Isaac


Hi Maxim,

I have made a number of changes to mumi and reconfigured berlin with the
latest mumi. Here is a quick summary of the main changes to mumi.

- We now log the complete URI and the response code for every request to
  mumi.
- We now handle HEAD requests correctly. This should eliminate some of
  the crashes we saw in the mumi log.

Regards,
Arun





bug#54447: cuirass: missing derivation error

2023-10-10 Thread Ludovic Courtès
Hello!

Mathieu Othacehe  skribis:

> A lot of builds, among them ~20 system tests[1], are failing with:
> "cannot build missing derivation
> ‘/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv’"
> errors.

I have a disappointingly simple hypothesis for this.  Remember that
“missing derivation” errors happen primarily for system tests.

Turns out that ‘cleanup-cuirass-roots’ in maintenance.git, used as an
mcron job, explicitly removes GC roots for things like *-os-encrypted
once they’re more than two days old, as well as GC roots for the
corresponding .drv.

I think this was increasing the likelihood that a .drv would be GC’d by
the time we run the test: under high load¹, it’s plausible that a system
test wouldn’t be built within two days after it’s been queued.
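
The window described above can be illustrated with a rough sketch.  It
is written in Python purely for illustration and is not code from
maintenance.git; the function name, the dates, and the assumption that
the GC root is created at queue time are all hypothetical.  It only says
when a .drv becomes *eligible* for collection, not that it was collected.

```python
from datetime import datetime, timedelta

# Two-day threshold of the cleanup job, as described in the message above.
ROOT_MAX_AGE = timedelta(days=2)


def drv_may_be_missing(queued_at, build_started_at, root_max_age=ROOT_MAX_AGE):
    """Return True if the .drv's GC root may be gone before the build starts.

    Simplifying assumption: the root is created when the build is queued,
    so it becomes removable once it is older than ROOT_MAX_AGE.
    """
    return build_started_at - queued_at > root_max_age


queued = datetime(2023, 10, 1, 12, 0)  # made-up queue time
print(drv_may_be_missing(queued, queued + timedelta(hours=30)))  # False
print(drv_may_be_missing(queued, queued + timedelta(days=3)))    # True
```

A build picked up 30 hours after being queued is still protected, while
one that waits three days under high load is not.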

I’m proposing the change below to address this; I don’t think we need
‘--gc-keep-outputs --gc-keep-derivations’ anymore now that we keep
things in ‘guix publish’ cache first and foremost.

Thoughts?

In addition to the mcron job, Cuirass’s own ‘register-gc-roots’
procedure periodically deletes GC roots older than ‘%gc-roots-ttl’ (30
days in practice).  That’s okay, except that it would be safer to delete
GC roots for a .drv if and only if it’s been built already.
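
One way to picture the "only if it's been built already" rule is the
sketch below.  It is in Python for illustration only and does not
reflect how Cuirass's 'register-gc-roots' is written; the 'Root' record,
its 'built' flag, and 'expired_roots' are hypothetical names, and how
Cuirass would actually learn that a build has completed is left open.

```python
import time
from dataclasses import dataclass

# 30 days in seconds, mirroring the '%gc-roots-ttl' value mentioned above.
GC_ROOTS_TTL = 30 * 24 * 3600


@dataclass
class Root:
    """Hypothetical record describing a GC root (not a Cuirass type)."""
    name: str     # base name of the rooted store item
    mtime: float  # last modification time, seconds since the epoch
    built: bool   # assumption: True once the corresponding build completed


def expired_roots(roots, now=None, ttl=GC_ROOTS_TTL):
    """Return the roots that are old enough and safe to delete.

    A '.drv' root is kept, whatever its age, until its build is done;
    any other root expires on age alone.
    """
    now = time.time() if now is None else now
    result = []
    for root in roots:
        too_old = now - root.mtime > ttl
        pending_drv = root.name.endswith(".drv") and not root.built
        if too_old and not pending_drv:
            result.append(root)
    return result
```

With this rule, a 40-day-old root for a .drv whose build was canceled
would be kept, while the same root would be dropped once the build had
succeeded.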

Thanks,
Ludo’.

¹ The queue was often processed slowly, with many workers remaining idle
  due to the bug fixed by
  
.

diff --git a/hydra/modules/sysadmin/services.scm b/hydra/modules/sysadmin/services.scm
index fecfdde..e6f2b44 100644
--- a/hydra/modules/sysadmin/services.scm
+++ b/hydra/modules/sysadmin/services.scm
@@ -110,9 +110,7 @@
   ((guix config) => ,(make-config.scm)))
#~(begin
(use-modules (ice-9 ftw)
-(srfi srfi-1)
-(guix store)
-(guix derivations))
+(srfi srfi-1))
 
(define %roots-directory
  "/var/guix/profiles/per-user/cuirass/cuirass")
@@ -157,28 +155,6 @@
  deleted))
  deleted))
 
-   (define (root-target root)
- ;; Return the store item ROOT refers to.
- (string-append (%store-prefix) "/" (basename root)))
-
-   (define (derivation-referrers store item)
- ;; Return the referrers of the derivers of ITEM.
- (let* ((derivers  (valid-derivers store item))
-(referrers (append-map (lambda (drv)
- (referrers store drv))
-   derivers)))
-   (delete-duplicates referrers)))
-
-   (define (delete-gc-root-for-derivation drv)
- ;; Delete the GC root for DRV, if any.
- (catch 'system-error
-   (lambda ()
- (let ((item (derivation-path->output-path drv)))
-   (delete-file
-(string-append %roots-directory
-   "/" (basename drv)
-   (const #f)))
-
;; Note: 'scandir' would introduce too much overhead due
;; to the large number of entries that it would sort.
(define deleted
@@ -197,17 +173,7 @@
(for-each (lambda (file)
(display file port)
(newline port))
- deleted)))
-
-   ;; Since we run 'guix-daemon --gc-keep-outputs
-   ;; --gc-keep-derivations', also remove GC roots for the outputs of
-   ;; derivations that refer to the derivers of DELETED.
-   (for-each delete-gc-root-for-derivation
- (with-store store
-   (append-map (lambda (root)
- (derivation-referrers
-  store (root-target root)))
-   deleted
+ deleted
 
 (define (gc-jobs threshold)
   "Return the garbage collection mcron jobs.  The garbage collection
@@ -251,8 +217,7 @@ collection instead."
 
(build-accounts (* build-accounts-to-max-jobs-ratio max-jobs))
(extra-options (list "--max-jobs" (number->string max-jobs)
-"--cores" (number->string cores)
-"--gc-keep-outputs" "--gc-keep-derivations"
+"--cores" (number->string cores)
 
 
 ;;;