Re: Organising Guix Days

2021-12-29 Thread Blake Shaw


Hi folks,

Regarding the presentation I'm putting together on the Guile
documentation: I have enough material right now to send out a basic
presentation covering what I believe to be the dominant, glaring issues
in the composition and organization of the docs, as well as how I would
restructure them if everyone agrees with the proposed edits.  However,
I'm also starting to uncover some more nuanced pedagogical problems
that appear throughout the text, so a few more weeks would probably
result in a more comprehensive study.

So I was thinking, in that case, should I just make the presentation for
Guix Days? That way it could also be presented at a time when other,
perhaps related projects and issues are coming to the fore.

-- 
“In girum imus nocte et consumimur igni”



Re: On raw strings in commit field

2021-12-29 Thread Mark H Weaver
Hi Liliana,

Liliana Marie Prikler writes:
> It should be noted, that in the case of moving or deleted tags, the
> assertion Guix "1.2.3" = upstream "v1.2.3" no longer holds.

Agreed, but I don't think that assertion should be our top priority.

For purposes of Guix's core goal of enabling software to be reliably
reproduced in the future, the most important property to preserve is
that 'Guix "1.2.3"' should remain forever immutable.

An obvious corollary is that if upstream mutates the meaning of
'upstream "v1.2.3"' over time, then the equation above will become
false.  That would be an unfortunate result of upstream's actions, but
it's exactly what _needs_ to happen to enable Guix to be reliably
reproducible.

If I perform an experiment with Guix "1.2.3" and publish the results,
and someone later wishes to reproduce those results, they will want
precisely the same 'Guix "1.2.3"' that was used to perform the original
experiment, and not whatever version of the software upstream is now
calling "v1.2.3".

The simple fact is that the way Ricardo wrote the 'guile-aiscm' package
is the right way to ensure that it can be reliably reproduced in the
future.

Guix packages that refer to git _tags_ may cease to be reproducible in
the future if upstream mutates or removes those tags, and it's simply
not feasible to transform our SHA256 hashes (of the NAR-encoded source
checkout) into something that we can use to fetch the archived source
from SWH.  There's simply no hope to make that work, unless we can
convince SWH to maintain a secondary index of their content based on
NAR-encoded source trees, which seems unlikely.

On the other hand, if we refer to git _commit hashes_, then it *is*
feasible for us to fetch the archived source from SWH, regardless of
what upstream has done to its tags in the meantime.
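
To make the difference concrete, here is a minimal toy sketch in Python.
It is only an illustration under stated assumptions: the dictionaries
`commits`, `tags` and `archive` are hypothetical stand-ins for an
upstream repository and for an archive in the role of SWH, and
`commit_id` hashes the raw content only, whereas a real Git commit hash
also covers the tree and commit metadata.

```python
import hashlib


def commit_id(content: bytes) -> str:
    """Toy stand-in for a commit hash: derived only from the content."""
    return hashlib.sha1(content).hexdigest()


old_source = b"release 1.2.3, original contents"
new_source = b"release 1.2.3, silently re-tagged contents"

# Hypothetical upstream repository: commits are keyed by their id,
# tags are mutable names that point at a commit id.
commits = {commit_id(old_source): old_source,
           commit_id(new_source): new_source}
tags = {"v1.2.3": commit_id(old_source)}

# Hypothetical archive, indexed by commit id only.
archive = dict(commits)

# Two ways a package definition could pin its source.
pinned_by_tag = "v1.2.3"
pinned_by_commit = commit_id(old_source)

# Upstream later moves the tag and drops the old commit.
tags["v1.2.3"] = commit_id(new_source)
del commits[pinned_by_commit]

# The tag now resolves to different content ...
via_tag = commits[tags[pinned_by_tag]]
# ... while the commit id still identifies the original in the archive.
via_commit = archive[pinned_by_commit]

print(via_tag == old_source)     # the tag-pinned source has changed
print(via_commit == old_source)  # the commit-pinned source has not
```

In this toy model the tag-pinned lookup silently returns the new
content, while the commit-pinned lookup still recovers the original
from the archive, whatever upstream did in the meantime.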

For that reason alone, I think that the way Ricardo wrote the
guile-aiscm package definition is clearly the right approach, given
Guix's longstanding goals.

> On the note
> of fallbacks, we do also have the issue that Guix fails on the first
> download that does not match the hash instead of e.g. continuing to SWH
> to fetch an archive of the old tag (as well as other fallback-related
> issues, also including the "Tricking Peer Review" thread).

That's a bug that can, and should, be fixed.  The existence of that bug
might temporarily prevent us from enjoying the benefits of Ricardo's
approach, but that's not an argument for adopting practices that push us
farther from our core goals.

What do you think?

  Regards,
Mark

-- 
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about .



Re: Formalizing teams

2021-12-29 Thread Lars-Dominik Braun
Hi Ricardo,

> FWIW, mumi also lets you search patches as all contents are indexed:
> 
> https://issues.guix.gnu.org/search?query=%22%28gnu+packages+python-web%29%22+is%3Aopen+tag%3Apatch
Thanks, I didn’t think about that. I tried searching
for python-build-system, but not all patches – especially
trivial ones – include enough context (for example
https://issues.guix.gnu.org/52595). Searching for files yields too many
packages, since Python packages are scattered across dozens of files
(e.g. gnu/packages/music.scm).

So unfortunately it doesn’t solve my problem. The only reasonable
thing to do right now seems to be setting up package name based filters,
because commit messages have a format known in advance.
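
A rough sketch of what such a filter could look like, in Python.  It
assumes the usual subject conventions ("gnu: NAME: ..." and "gnu: Add
NAME."); the function names and the sample subjects are made up for
illustration, and a prefix test like "python-" would of course miss
Python packages that are named differently.

```python
import re

# Subject conventions assumed here:
#   "gnu: Add NAME."
#   "gnu: NAME: Update to VERSION."  (or any other change to NAME)
ADD = re.compile(r"^gnu: Add (\S+)\.$")
CHANGE = re.compile(r"^gnu: ([^\s:]+): ")


def package_name(subject):
    """Return the package name found in a commit subject, or None."""
    for pattern in (ADD, CHANGE):
        match = pattern.match(subject)
        if match:
            return match.group(1)
    return None


def matches_prefix(subject, prefix):
    """True if the subject concerns a package whose name starts with prefix."""
    name = package_name(subject)
    return name is not None and name.startswith(prefix)


# Made-up sample subjects.
subjects = [
    "gnu: python-requests: Update to 2.26.0.",
    "gnu: Add python-foo.",
    "gnu: r-ggplot2: Update to 3.3.5.",
    "doc: Fix typo.",
]

python_subjects = [s for s in subjects if matches_prefix(s, "python-")]
print(python_subjects)
```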

Cheers,
Lars




Re: On raw strings in commit field

2021-12-29 Thread Liliana Marie Prikler
Hi,

On Wednesday, 29.12.2021 at 09:39 +0100, zimoun wrote:
> Hi,
> 
> On Tue, 28 Dec 2021 at 21:55, Liliana Marie Prikler wrote:
> 
> > Consider a package being added or updated in Guix.  At the time of
> > commit, we have the tag v1.2.3 pointing towards commit deadbeef.  We
> > therefore create a guix package with version "1.2.3" pointing to
> > said commit (either directly or indirectly).  At this point, one of
> > the following holds:
> >   (1) Guix "1.2.3" -> upstream "v1.2.3" -> upstream "deadbeef"
> >   (2) Guix "1.2.3" -> upstream "deadbeef" <- upstream "v1.2.3"
> > From either, we can follow that Guix "1.2.3" = upstream "v1.2.3".  If
> > upstream keeps their tags around, then both forms are equivalent, but
> > (1) is more convenient; it allows us to derive commit from version,
> > which is often done through an affine mapping.
> 
> No, tags and commit hashes are not equivalent.  A commit hash is
> intrinsic: it only depends on the content.  Tags, by contrast, are
> extrinsic: they depend on an external choice.
The notion of equivalence I am using here is the same as in the
statement "5 ≡ 2 mod 3", wherein the ≡ symbol is ironically called
IDENTICAL TO in Unicode despite being used very differently in
mathematics.  Perhaps there is a language barrier here; in German we
read that as "5 is equivalent to 2 modulo 3" and logical equivalence
functions similarly.

For the record, one could argue that I should have used that symbol for
comparing Guix "1.2.3" to upstream "v1.2.3" because they are in fact
not equal, only equivalent, but that's beside the point.  The point
is, with an upstream behaving as we want upstreams to behave (not just
git ones, url-fetch suffers from the same issue with moving tarballs
for instance), you can substitute one for the other without a change in
meaning; both will fetch the same commit.

> From the content to the hash, there are three keys: 1) how to
> serialize, 2) how to hash and 3) how to represent the hash.  For #1,
> Git uses its own serializer and Guix, inheriting from Nix, uses
> another (Nar), although the difference is minor.  For #2, Git uses
> SHA-1 by default as its hash function, whereas Guix uses SHA-256.
> And for #3, Git uses hexadecimal format and Guix uses nix-base32.
> 
> The subcommand “guix hash” with the options ’-S, -H’ and ’-f’ exposes
> these 3 keys.  For instance:
> 
>     $ cat /tmp/foo.txt | git hash-object --stdin
>     557db03de997c86a4a028e1ebd3a1ceb225be238
>     $ ./pre-inst-env guix hash -S git -H sha1 -f hex /tmp/foo.txt
>     557db03de997c86a4a028e1ebd3a1ceb225be238
> 
> To make it explicit, the checksum hash of ’git-reference’ could be
> removed because it is somewhat redundant with the commit hash.
> Obviously, it cannot be, for security reasons (SHA-1 is considered
> weak).
The other way also works.  If Git used a secure hashing function such
as SHA-256 (or SHA-512 or Keccak) and Guix supported that hash, we
could generate a git hash from the Guix hash (assuming also we allow
the origin serializer to be configured, which would be required either
way).

The weakness of SHA-1 also flies in the face of the robustness
argument.  One could maliciously push a commit that replaces an
existing one with the same hash, though it would also break the repo in
doing so.  At least in theory, as no such attack has been done yet. 
Note to self: theoretical attacks on Git are probably off-topic as
well.

> > Problems arise, when upstreams move or delete tags.  At this 
> > point, guix packages that use them break and are no longer able to
> > fetch their source code.  Raw commits are in principle resilient to
> > this kind of denial of service; instead upstreams would have to
> > actually delete the commits themselves, including also possible
> > backups such as SWH to break it.  There is certainly an argument
> > for robustness to be made here, particularly concerning `guix time-
> > machine', though as noted it is not infallible.  
> 
> SWH provides ’swh:id’ which is another triplet (really close to Git).
> Basically, content means data and metadata and, to make it short, SWH
> handles metadata in its own way for reasons of scale.  And SWH takes
> snapshots of Git repositories.
> 
> Therefore, to have something really robust, Guix has to rely on a map
> from package definition to SWH.
> 
> Using a Git commit hash instead of a tag provides this map.  For a
> tag, to have something robust, we need an external map from checksum
> hash to SWH hash via Git commit hash.  This “external” map is provided
> by Disarchive.
I don't know too much about Disarchive here, so please enlighten me. 
If it used a pair of origin file name + hash, whether or not the git-
reference uses tags would be irrelevant, no?  Do we have to take values
from the uri field?

> > Long-term, we might want to support having multiple <git-references>
> > in git-fetch -- if the first one fails due to a hash
> > mismatch, we would warn about that instead of producing an error
> > and thereafter continue 

Re: guix system reconfigure after more than a year

2021-12-29 Thread Blake Shaw
right on, i've had similar experiences where what was before dreaded
eases into comfortable relief. great work everyone!
-- 
“In girum imus nocte et consumimur igni”



Re: Formalizing teams

2021-12-29 Thread Efraim Flashner
On Tue, Dec 28, 2021 at 03:44:55PM +0100, Ricardo Wurmus wrote:
> 
> Maxim Cournoyer writes:
> 
> >> Guix is nowhere near the size of the Rust community (yet!), but I can
> >> already picture teams and members:
> >>
> >>   co-maintainers (“core team”)
> >>   community
> >>   infrastructure
> >>   internationalization
> >>   security response
> >>   release
> >>   Rust packaging
> >>   R packaging
> >>   Java packaging
> >
> > We'd have to include every language/system of importance to that list
> > (Python, Ruby, Emacs, LaTeX, Perl, etc.), no?
> 
> No, only those where we already have the people who could form a team.
> There is no need for any of this to be comprehensive.  It just needs to
> be an improvement over the status quo.
> 
> FWIW, I’ll gladly make it official that I could be the person to talk to
> when it comes to “R packaging”.  This is already the case, but only
> those people know it who don’t really need to know this.
> 
> Advertising this kind of information or recording it somewhere where our
> tools could redirect incoming requests would be an improvement.
> 
> > Are our problems really organizational?  I think before attempting to
> > come up with a solution, we must analyze and agree on what it is that
> > needs improvement to help us move forward more efficiently.
> 
> I do think it’s a lack of organization, yes.  Today I’m no longer
> following guix-commits, guix-patches, or bug-guix, and I’m overwhelmed
> by guix-devel and help-guix.  Whenever something catches my attention
> I’ll read a bit and maybe reply.  But by far the best way to get my
> attention for a review is to ask on #guix or #guix-hpc or to
> X-Debbugs-Cc (or Cc) me on emails.
> 
> Having some topic-specific streams I could tap into would allow me to be
> a little more proactive.
> 

Echoing Ricardo, I still check each commit as I git pull, but I'm not
able to keep up with all of guix-devel, help-guix and bug-guix anymore.
I currently have about 3000 unread emails in my combined
guix-devel/bug-guix folder, which I keep as a todo list of sorts of
patches to check or bugs to follow up on. I feel overwhelmed by the
sheer number of emails and I feel guilty about not reviewing more
patches and bugs.

-- 
Efraim Flashner  אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted




Re: git hook error

2021-12-29 Thread zimoun
Hi,

On Tue, 28 Dec 2021 at 18:46, Leo Famulari wrote:
> On Tue, Dec 28, 2021 at 11:31:10PM +0100, Ricardo Wurmus wrote:
>> The motivation for that is not found in just one big problem.  It’s a
>> small trickle of minor annoyances:
>> 
>> - Savannah’s uptime isn’t quite as high as we’d like
>
> Okay. I wonder if we could actually do a better job, or if anybody who
> hosts a comparable repo does.
>
> Our own record with the build farm and the record of major hosts like
> Github are both somewhat discouraging. And if we could only hope for an
> equivalent uptime to Savannah, it doesn't seem worth it to shoulder this
> work ourselves.

Well, I agree with Ricardo that the list of minor annoyances outweighs
the burden of maintenance.  I also agree that it is yet another load on
“our” shoulders, but at this point, is it really that much compared to
the gain?


> My opinion is that, in order to consider hosting our own Git server, we
> should wait until people are using declarative Guix configuration to
> operate reliable, performant, and public Git servers that would meet our
> needs. That is, the Guix project needs to grow this capability without
> the heroic effort of a single volunteer. Because that's what we have now
> with Savannah, more or less, and we don't have to work for it. Maybe
> this has already been achieved, I don't know.

Hmm, maybe I am missing something, but changing the Git server is
transparent for the users.  One commit pushed to Savannah changes the
URL, the user runs “guix pull”, then the next “guix pull” will pull from
the new location.

However, it is not clear what happens with “guix time-machine”.

Well, as it is possible to list several substitute servers, maybe the
first step could be to add another Git server in addition to Savannah,
mirror this Git server using Savannah, then flip and mirror Savannah
against this Git server.

Last, this server should be part of a CDN, IMHO.

Cheers,
simon



Re: On raw strings in commit field

2021-12-29 Thread zimoun
Hi,

On Tue, 28 Dec 2021 at 21:55, Liliana Marie Prikler wrote:

> Consider a package being added or updated in Guix.  At the time of
> commit, we have the tag v1.2.3 pointing towards commit deadbeef.  We
> therefore create a guix package with version "1.2.3" pointing to said
> commit (either directly or indirectly).  At this point, one of the
> following holds:
>   (1) Guix "1.2.3" -> upstream "v1.2.3" -> upstream "deadbeef"
>   (2) Guix "1.2.3" -> upstream "deadbeef" <- upstream "v1.2.3"
> From either, we can follow that Guix "1.2.3" = upstream "v1.2.3".  If
> upstream keeps their tags around, then both forms are equivalent, but
> (1) is more convenient; it allows us to derive commit from version,
> which is often done through an affine mapping.

No, tags and commit hashes are not equivalent.  A commit hash is
intrinsic: it only depends on the content.  Tags, by contrast, are
extrinsic: they depend on an external choice.

From the content to the hash, there are three keys: 1) how to serialize,
2) how to hash and 3) how to represent the hash.  For #1, Git uses its
own serializer and Guix, inheriting from Nix, uses another (Nar),
although the difference is minor.  For #2, Git uses SHA-1 by default as
its hash function, whereas Guix uses SHA-256.  And for #3, Git uses
hexadecimal format and Guix uses nix-base32.

The subcommand “guix hash” with the options ’-S, -H’ and ’-f’ exposes
these 3 keys.  For instance:

    $ cat /tmp/foo.txt | git hash-object --stdin
    557db03de997c86a4a028e1ebd3a1ceb225be238
    $ ./pre-inst-env guix hash -S git -H sha1 -f hex /tmp/foo.txt
    557db03de997c86a4a028e1ebd3a1ceb225be238


To make it explicit, the checksum hash of ’git-reference’ could be
removed because it is somewhat redundant with the commit hash.
Obviously, it cannot be, for security reasons (SHA-1 is considered
weak).
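
For the single-file case, the three keys can be sketched in a few lines
of Python.  This is only a sketch: the helper name `git_blob_hash` is
made up here, and it covers Git's blob serialization only (a directory
would need tree objects, and Nar is not reproduced at all).

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """Hash a file's content the way `git hash-object` does for a blob."""
    # Key 1, the serializer: Git prepends "blob <size>\0" to the content.
    serialized = b"blob %d\x00" % len(data) + data
    # Key 2, the hash function: SHA-1 by default.
    digest = hashlib.sha1(serialized)
    # Key 3, the representation: hexadecimal.
    return digest.hexdigest()


content = b""
print(git_blob_hash(content))               # Git's id for the empty blob
print(hashlib.sha1(content).hexdigest())    # same function, no serializer
print(hashlib.sha256(content).hexdigest())  # different function as well
```

The three printed values all differ although the content is the same,
because each one changes at least one of the keys.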


> Problems arise, when upstreams move or delete tags.  At this point,
> guix packages that use them break and are no longer able to fetch their
> source code.  Raw commits are in principle resilient to this kind of
> denial of service; instead upstreams would have to actually delete the
> commits themselves, including also possible backups such as SWH to
> break it.  There is certainly an argument for robustness to be made
> here, particularly concerning `guix time-machine', though as noted it
> is not infallible.  

SWH provides ’swh:id’ which is another triplet (really close to Git).
Basically, content means data and metadata and, to make it short, SWH
handles metadata in its own way for reasons of scale.  And SWH takes
snapshots of Git repositories.

Therefore, to have something really robust, Guix has to rely on a map
from package definition to SWH.

Using a Git commit hash instead of a tag provides this map.  For a tag,
to have something robust, we need an external map from checksum hash to
SWH hash via Git commit hash.  This “external” map is provided by
Disarchive.


> Long-term, we might want to support having multiple <git-references> in
> git-fetch -- if the first one fails due to a hash mismatch, we would
> warn about that instead of producing an error and thereafter continue
> with the second, third, etc. similar to how we currently have mirror://
> urls for some well-known mirrored repositories.  That way, we have a
> system to warn us about naughty upstreams while also providing
> robustness for the time machine.
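
For reference, the fallback behaviour described in the quoted paragraph
could be sketched roughly as follows in Python.  Everything here is
hypothetical: `fetch_with_fallback` and the source names are made up,
the fetchers are plain callables rather than real downloads, and the
check is a SHA-256 over the raw bytes, not over a Nar serialization as
Guix does.

```python
import hashlib
import warnings


def fetch_with_fallback(sources, expected_sha256):
    """Try each (name, fetch) pair in order until the hash matches.

    A failed download or a hash mismatch produces a warning and the
    next source is tried; an error is raised only if all sources fail.
    """
    for name, fetch in sources:
        try:
            data = fetch()
        except OSError as err:
            warnings.warn(f"{name}: download failed ({err}), trying next")
            continue
        actual = hashlib.sha256(data).hexdigest()
        if actual == expected_sha256:
            return name, data
        warnings.warn(f"{name}: hash mismatch (got {actual}), trying next")
    raise RuntimeError("no source matched the expected hash")


def deleted_tag():
    # Simulates an upstream reference that no longer exists.
    raise OSError("reference not found")


original = b"contents of the original checkout"
expected = hashlib.sha256(original).hexdigest()

sources = [
    ("upstream tag", lambda: b"contents after the tag was moved"),
    ("upstream mirror", deleted_tag),
    ("archive", lambda: original),
]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    used, data = fetch_with_fallback(sources, expected)

print(used)
print(len(caught))
```

In this sketch the first two sources only produce warnings, and the
content is finally taken from the last one, whose hash matches.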

I think the long-term goal is to remove tags completely and use only
commit hashes, as done for ’guile-aiscm’.  But it will not happen, for
convenience reasons, I guess.

What you are proposing is to mix extrinsic (tag, URL, etc.) with
intrinsic (commit hash, checksum hash, etc.).  Well, I do not know if
this proposed fallback mechanism would ease the maintenance and would
make Guix more robust.

To me, robustness means making a map from intrinsic values to content,
as Disarchive is doing for instance.


Cheers,
simon