Re: PyTorch with ROCm

2024-03-31 Thread David Elsing
Hi!

Ludovic Courtès  writes:

> I’m happy to merge your changes in the ‘guix-hpc’ channel for the time
> being (I can create you an account there if you wish so you can create
> merge requests etc.).  Let me know!

Ok, sure, that sounds good! I have only made the packages for ROCm 6.0.2
so far, though.

> I agree with Ricardo that this should be merged into Guix proper
> eventually.  This is still in flux and we’d need to check what Kjetil
> and Thomas at AMD think, in particular wrt. versions, so no ETA so far.

Yes I agree, the ROCm packages are not ready to be merged yet.

> Is PyTorch able to build code for several GPU architectures and pick the
> right one at run time?  If it does, that would seem like the better
> option for me, unless that is indeed so computationally expensive that
> it’s not affordable.

It works the same as for other HIP/ROCm libraries: the GPU architectures
chosen at build time are all available at runtime, and the right one is
picked automatically. For reference, the Arch Linux package for PyTorch
[1] enables 12 architectures. I think the architectures that can be
chosen at compile time also depend on the ROCm version.
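A simplified Python sketch of that runtime choice, assuming a library that bundles one code object per architecture, keyed by its gfx name. The real HIP runtime also matches target features such as xnack and sramecc; this sketch simply strips them, and the library contents are made up.

```python
from typing import Dict, Optional


def base_arch(target_id: str) -> str:
    """Strip feature flags from an LLVM AMDGPU target ID,
    e.g. 'gfx90a:sramecc+:xnack-' -> 'gfx90a'."""
    return target_id.split(":", 1)[0]


def pick_code_object(fat_binary: Dict[str, bytes],
                     device_target: str) -> Optional[bytes]:
    """Return the code object built for the device's architecture,
    or None if that architecture was not selected at build time."""
    return fat_binary.get(base_arch(device_target))


# Hypothetical library built for three architectures.
library = {
    "gfx90a": b"<gfx90a code>",
    "gfx1030": b"<gfx1030 code>",
    "gfx1100": b"<gfx1100 code>",
}
```

With this toy library, a `gfx90a:sramecc+:xnack-` device gets the gfx90a code object, while a gfx908 device gets `None`, i.e. no usable kernels, because that architecture was not chosen at build time.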

>> I'm not sure they can be combined however, as the GPU code is included
>> in the shared libraries. Thus all dependent packages like
>> python-pytorch-rocm would need to be built for each architecture as
>> well, which is a large duplication for the non-GPU parts.
>
> Yeah, but maybe that’s OK if we keep the number of supported GPU
> architectures to a minimum?

If it's no issue for the build farm, it would probably be good to include
a set of default architectures (the officially supported ones?), as you
suggested, and to make it easy to recompile all dependent packages for
other architectures. Maybe this can be done with a package
transformation like '--tune'? IIRC, building composable-kernel for
the default architectures with 16 threads exceeded 32 GB of memory
before I cancelled the build and restricted it to a single architecture.
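To make the idea of such a transformation concrete, here is a toy Python model of a dependency graph (not Guix's API; a real transformation would be written in Scheme). The `Package` record and `with_gpu_targets` function are hypothetical. Packages containing GPU code get the new target list, every package depending on them is rebuilt, and CPU-only subtrees are reused unchanged.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Package:
    name: str
    inputs: Tuple["Package", ...] = ()
    gpu_targets: Tuple[str, ...] = ()  # empty means "no GPU code"


def with_gpu_targets(pkg: Package, targets: Tuple[str, ...],
                     cache: Optional[Dict[str, Package]] = None) -> Package:
    """Return PKG rebuilt for TARGETS, rewriting its dependency graph."""
    if cache is None:
        cache = {}
    if pkg.name in cache:  # rewrite each package only once
        return cache[pkg.name]
    new_inputs = tuple(with_gpu_targets(i, targets, cache) for i in pkg.inputs)
    inputs_changed = any(new is not old
                         for new, old in zip(new_inputs, pkg.inputs))
    if pkg.gpu_targets or inputs_changed:
        # GPU package, or a dependent of one: needs a rebuild.
        result = replace(pkg, inputs=new_inputs,
                         gpu_targets=targets if pkg.gpu_targets else ())
    else:
        result = pkg  # CPU-only subtree: reuse as-is
    cache[pkg.name] = result
    return result


numpy = Package("python-numpy")
rocblas = Package("rocblas", gpu_targets=("gfx90a", "gfx1030"))
torch = Package("python-pytorch-rocm", inputs=(rocblas, numpy),
                gpu_targets=("gfx90a", "gfx1030"))
app = Package("my-app", inputs=(torch,))  # hypothetical CPU-only dependent

app_rdna3 = with_gpu_targets(app, ("gfx1100",))
```

Here `my-app` itself has no GPU code but is still rebuilt because one of its inputs changed, which is exactly the duplication of non-GPU work discussed above; `python-numpy` is shared between both graphs.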

>> - Many tests assume a GPU to be present, so they need to be disabled.
>
> Yes.  I/we’d like to eventually support that.  (There’d need to be some
> annotation in derivations or packages specifying what hardware is
> required, and ‘cuirass remote-worker’, ‘guix offload’, etc. would need
> to honor that.)

That sounds like a good idea, could this also include CPU ISA
extensions, such as AVX2 and AVX-512?
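One way to picture such annotations is as feature sets: each build declares the hardware features it requires, each worker advertises the features it has, and a build may only go to a worker whose features are a superset. This is a hypothetical Python sketch; the feature names and the matching function are illustrative, not an existing Guix or Cuirass mechanism.

```python
from typing import Dict, FrozenSet, List


def eligible_workers(required: FrozenSet[str],
                     workers: Dict[str, FrozenSet[str]]) -> List[str]:
    """Names of workers offering every required feature, sorted."""
    return sorted(name for name, features in workers.items()
                  if required <= features)


# Hypothetical build farm; both GPU and CPU ISA features are plain strings.
WORKERS = {
    "build-1": frozenset({"x86_64", "avx2"}),
    "build-2": frozenset({"x86_64", "avx2", "avx512f"}),
    "gpu-1": frozenset({"x86_64", "avx2", "amdgpu:gfx90a"}),
}
```

Treating GPU architectures and CPU ISA extensions uniformly as strings means a test needing a gfx90a GPU and a build tuned for AVX-512 go through the same subset check.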

>> - For several packages (e.g. rocfft), I had to disable the
>>   validate-runpath? phase, as there was an error when reading ELF
>>   files. It is however possible that I also disabled it for packages
>>   where it was not necessary, but it was the case for rocblas at
>>   least. Here, kernels generated are contained in ELF files, which are
>>   detected by elf-file? in guix/build/utils.scm, but rejected by
>>   has-elf-header? in guix/elf.scm, which leads to an error.
>
> Weird.  We’d need to look more closely into the errors you got.

I think the issue is simply that elf-file? only checks the magic bytes,
whereas has-elf-header? checks the entire header. If the former returns
#t and the latter #f, parse-elf in guix/elf.scm raises an error.
It seems some ROCm (or Tensile?) ELF files have a different header format.
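The failure mode can be illustrated with a Python analogue. This is not a transcription of guix/build/utils.scm or guix/elf.scm: the loose check looks only at the four magic bytes, and the strict check is a hypothetical one that validates several e_ident fields and, as an assumption for illustration, accepts only ABI version 0. A file passing the first but failing the second triggers an error.

```python
# Byte offsets inside e_ident, as defined by the ELF specification.
EI_CLASS, EI_DATA, EI_VERSION, EI_OSABI, EI_ABIVERSION = 4, 5, 6, 7, 8
ELF_MAGIC = b"\x7fELF"


def magic_only(data: bytes) -> bool:
    """Loose check: only the four magic bytes."""
    return data[:4] == ELF_MAGIC


def strict_header(data: bytes) -> bool:
    """Hypothetical stricter check of several e_ident fields."""
    return (len(data) >= 16
            and magic_only(data)
            and data[EI_CLASS] in (1, 2)     # ELFCLASS32 / ELFCLASS64
            and data[EI_DATA] in (1, 2)      # little / big endian
            and data[EI_VERSION] == 1        # EV_CURRENT
            and data[EI_ABIVERSION] == 0)    # assumption: only ABI version 0


def parse(data: bytes) -> str:
    """Loose check selects the file; strict check may then reject it."""
    if not magic_only(data):
        return "skipped (not ELF)"
    if not strict_header(data):
        raise ValueError("file has ELF magic but an unexpected header")
    return "parsed"


# Made-up 16-byte e_ident samples: ELFCLASS64, little endian, EV_CURRENT.
plain = ELF_MAGIC + bytes([2, 1, 1, 0, 0]) + bytes(7)
odd = ELF_MAGIC + bytes([2, 1, 1, 64, 1]) + bytes(7)  # nonzero ABI version
```

`parse(plain)` succeeds, while `parse(odd)` raises: both files are selected by the magic-bytes check, but only one survives the stricter one.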

> Oh, just noticed your patch bring a lot of things beyond PyTorch itself!
> I think there’s some overlap with
> , we
> should synchronize.

Ah, I had not seen this before; the overlap seems to be tensile,
roctracer and rocblas. For rocblas, I saw that they set
"-DAMDGPU_TARGETS=gfx1030;gfx90a", probably for testing?

Thank you!
David

[1] https://gitlab.archlinux.org/archlinux/packaging/packages/python-pytorch/-/blob/ae90c1e8bdb99af458ca0a545c5736950a747690/PKGBUILD



Coordinators for patch review session on Tuesday

2024-03-31 Thread Steve George
Hi all,

The next patch-review session is taking place on Tuesday 2nd of April [0], and
I'd love to try pair programming, where groups can actively work on some patch
reviews.

Is anyone willing to 'co-ordinate' a pair programming session?

Last time I set up a cloud host and installed Upterm (https://upterm.dev/) so
that everyone could SSH into a session. We could run 4-5 simultaneous sessions
where people could 'pair' to do patch reviews together.

To co-ordinate a session, I'll give you SSH access; there are instructions on
how to launch the Upterm session. We have written instructions on some basic
tools for doing the patch reviews, and as Guix is installed for each user, you
can add your own ;-)

Anyone up for it?

I'm also collecting simple patches for review onto a list so they can be 
assigned to each group:

https://debbugs.gnu.org/cgi-bin/pkgreport.cgi?tag=patch-review-hackers-list;users=guix

Feel free to add simple issues there!

Thanks,

Futurile / Steve

[0] https://libreplanet.org/wiki/Group:Guix/PatchReviewSessions2024#Patch_Review_Sessions_2024



Re: Backdoor in upstream xz-utils

2024-03-31 Thread Rostislav Svoboda
> >> Is there a way we can blacklist known bad versions?
>
> I'm not sure what you mean, but I don't think so.

For a start, what about adding a short comment:

diff --git a/gnu/packages/compression.scm b/gnu/packages/compression.scm
index 5de17b6b51..fd5ab7ba00 100644
--- a/gnu/packages/compression.scm
+++ b/gnu/packages/compression.scm
@@ -493,6 +493,8 @@ (define-public pbzip2
 (define-public xz
   (package
(name "xz")
+;;; Be reminded of the xz/liblzma backdoor in the versions 5.6.0 and 5.6.1!
+;;; See https://www.openwall.com/lists/oss-security/2024/03/29/4
(version "5.2.8")
(source (origin
 (method url-fetch)

as a single commit, with an appropriate commit message. That's a bang
for pretty much no money.
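For comparison, a "blacklist" of known bad versions could be as small as a lookup table consulted by some check. The Python sketch below is purely hypothetical (no such mechanism exists in Guix, as noted above); it only flags the two xz versions named in the comment of the patch.

```python
from typing import Dict, List, Set

# Hypothetical table of known-bad releases: package name -> versions.
KNOWN_BAD: Dict[str, Set[str]] = {"xz": {"5.6.0", "5.6.1"}}


def is_known_bad(name: str, version: str) -> bool:
    return version in KNOWN_BAD.get(name, set())


def flag_known_bad(packages: Dict[str, str]) -> List[str]:
    """Return 'name@version' for every listed package on the blacklist."""
    return sorted(f"{name}@{version}"
                  for name, version in packages.items()
                  if is_known_bad(name, version))
```

The hard part is not the lookup but where to run it: a check like this would only help `guix time-machine` users if it lived outside the old revision being travelled to.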

> The main danger is in guix time-machine to the past

Good point. So then a little note here, too:

diff --git a/doc/guix.texi b/doc/guix.texi
index 69a904473c..60909adf5f 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -5012,10 +5012,13 @@ Invoking guix time-machine
 @quotation Note
 The history of Guix is immutable and @command{guix time-machine}
 provides the exact same software as they are in a specific Guix
-revision.  Naturally, no security fixes are provided for old versions
-of Guix or its channels.  A careless use of @command{guix time-machine}
-opens the door to security vulnerabilities.  @xref{Invoking guix pull,
-@option{--allow-downgrades}}.
+revision.  Naturally, no security fixes are provided for old versions of
+Guix or its channels.  A careless use of @command{guix time-machine}
+opens the door to security vulnerabilities, or potentially even
+backdoors.  (Do you remember the
+@uref{https://www.openwall.com/lists/oss-security/2024/03/29/4, backdoor
+in upstream xz/liblzma leading to ssh server compromise}?)
+@xref{Invoking guix pull, @option{--allow-downgrades}}.
 @end quotation

Cheers Bost



GNU Shepherd 0.10.4 released

2024-03-31 Thread Ludovic Courtès
We are pleased to announce the GNU Shepherd version 0.10.4, a bug-fix
release of the new 0.10.x series, representing 7 commits over 3 months.

The 0.10.x series is a major overhaul towards 1.0, addressing shortcomings
and providing new features that help comprehend system state.


• About

  The GNU Shepherd is a service manager written in Guile that looks
  after the herd of daemons running on the system.  It can be used as an
  “init” system (PID 1) and also by unprivileged users to manage
  per-user daemons—e.g., tor, privoxy, mcron.  It supports several
  daemon startup mechanisms, including inetd and systemd-style socket
  activation.  The GNU Shepherd is configured in Guile Scheme and can be
  extended in the same language.  It builds on a simple memory-safe and
  callback-free programming model.

  The GNU Shepherd is developed jointly with the GNU Guix project; it is
  used as the init system of Guix, GNU’s advanced GNU/Linux distribution.

  https://www.gnu.org/software/shepherd/


• Download

  For a summary of changes and contributors, see:
https://git.sv.gnu.org/gitweb/?p=shepherd.git;a=shortlog;h=v0.10.4
  or run this command from a git-cloned shepherd directory:
git shortlog v0.10.3..v0.10.4

  Here are the compressed sources and a GPG detached signature:
https://ftp.gnu.org/gnu/shepherd/shepherd-0.10.4.tar.gz
https://ftp.gnu.org/gnu/shepherd/shepherd-0.10.4.tar.gz.sig

  Use a mirror for higher download bandwidth:
https://ftpmirror.gnu.org/shepherd/shepherd-0.10.4.tar.gz
https://ftpmirror.gnu.org/shepherd/shepherd-0.10.4.tar.gz.sig

  Here are the SHA1 and SHA256 checksums:

1a547efd9416b492b89d010cb10cfd1b5cd35945  shepherd-0.10.4.tar.gz
fiLRTcdckD42Ng5I5VAPj+GQT6E04tA+VBKTkKjIBgg=  shepherd-0.10.4.tar.gz

  Verify the base64 SHA256 checksum with cksum -a sha256 --check
  from coreutils-9.2 or OpenBSD's cksum since 2007.

  Use a .sig file to verify that the corresponding file (without the
  .sig suffix) is intact.  First, be sure to download both the .sig file
  and the corresponding tarball.  Then, run a command like this:

gpg --verify shepherd-0.10.4.tar.gz.sig

  If that command fails because you don't have the required public key,
  or that public key has expired, try the following commands to retrieve
  or refresh it, and then rerun the 'gpg --verify' command.

gpg --recv-keys 3CE464558A84FDC69DB40CFB090B11993D9AEBB5

  As a last resort to find the key, you can try the official GNU
  keyring:

wget -q https://ftp.gnu.org/gnu/gnu-keyring.gpg
gpg --keyring gnu-keyring.gpg --verify shepherd-0.10.4.tar.gz.sig

  This release was bootstrapped with the following tools:
Autoconf 2.71
Automake 1.16.5
Gettext 0.21
Makeinfo 7.1


• Changes since version 0.10.3 (excerpt from the NEWS file)

  ** ‘herd unload root all’ stops services before unregistering them

  Previously, since version 0.10.0, ‘herd unload root all’ would unregister all
  services without first stopping them, leaving the system in a bogus state.
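  The corrected ordering can be shown with a toy Python registry. This
  is only an analogy (the Shepherd is written in Guile, and this sketch
  ignores dependencies between services): each running service is
  stopped before it is unregistered, so nothing is left running untracked.

```python
from typing import Dict, List


class ServiceRegistry:
    """Toy registry mapping service names to a 'running' flag."""

    def __init__(self) -> None:
        self.running: Dict[str, bool] = {}
        self.log: List[str] = []

    def register(self, name: str) -> None:
        self.running[name] = False

    def start(self, name: str) -> None:
        self.running[name] = True
        self.log.append(f"start {name}")

    def stop(self, name: str) -> None:
        if self.running.get(name):
            self.running[name] = False
            self.log.append(f"stop {name}")

    def unload_all(self) -> None:
        # Stop each service first, then unregister it.
        for name in list(self.running):
            self.stop(name)
            del self.running[name]
            self.log.append(f"unregister {name}")
```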

  ** ‘shepherd’ no longer bails out when reboot(2) returns ENOSYS

  In runc environments (among others), reboot(RB_DISABLE_CAD) returns ENOSYS,
  which would lead shepherd to fail to start.  This would prevent the use of
  shepherd in some containerized environments such as those of GitLab-CI.
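  The general pattern, treating ENOSYS from an optional system call as
  non-fatal while still propagating other errors, looks like this in a
  Python sketch. Python's os module has no reboot wrapper, so the call
  is passed in as a hypothetical stand-in for reboot(RB_DISABLE_CAD).

```python
import errno
import logging
from typing import Callable


def disable_ctrl_alt_del(reboot_call: Callable[[], None]) -> bool:
    """Run REBOOT_CALL; return False (and keep going) if the system
    call is not implemented, e.g. in some container runtimes."""
    try:
        reboot_call()
        return True
    except OSError as e:
        if e.errno == errno.ENOSYS:
            logging.warning("reboot(2) not implemented here; continuing")
            return False
        raise  # other errors, such as EPERM, are still reported
```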

  ** REPL service no longer attempts to enter debugger upon error

  The REPL service would spawn a regular REPL that enters a debugger (or
  “recursive prompt”) by default.  While this is a great feature, it could
  easily render the shepherd REPL unusable because the continuation of the
  debugger prompt could not always be suspended—see the thread at
  https://lists.gnu.org/archive/html/guix-devel/2024-01/msg00064.html.  To avoid
  that, the REPL now simply displays a backtrace upon error.
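  As a rough Python analogy of the new behavior (not the Shepherd's
  Guile code): the evaluator catches the error and returns a formatted
  backtrace, so the caller goes straight back to its prompt instead of
  entering a nested debugger such as pdb.post_mortem().

```python
import traceback


def eval_line(line: str, env: dict) -> str:
    """Evaluate one expression; on error, return the backtrace text."""
    try:
        return repr(eval(line, env))
    except Exception:
        # Report and return to the prompt; no recursive debugger prompt.
        return traceback.format_exc()
```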


Please report bugs to bug-g...@gnu.org.
Join guix-devel@gnu.org for discussions.

Ludovic, on behalf of the Shepherd herd.

