Re: Concerns/questions around Software Heritage Archive

2024-05-01 Thread Ian Eure

Hello Guixers,

It’s been another week with no response or movement on this.  I’m 
disappointed that this situation seems to be getting treated so 
lightly.  Adhering to the terms of software licenses is 
fundamental to the operation of the free software ecosystem; there 
is no software freedom without it.  It’s surprising that a pretty 
clear-cut situation of creating derivative works of free software 
in violation of their licenses would be shrugged off so easily.


Whatever the Guix organization’s position is, I’m reaching my 
personal limit, and need to see some kind of positive movement on 
this[1].  If Guix is going to continue to facilitate license 
violations, I will have no choice but to remove my software from 
it to defend them.


 — Ian

[1]: Personally, I would be satisfied with a per-package setting 
which disables scheduling source for archiving by SWH.  Seeing 
this, or a commitment to build this within a reasonable 
timeframe, would allay my concerns.





Re: bug#67512: [PATCH v7 0/3] Add LibreWolf

2024-04-27 Thread Ian Eure

Ian Eure  writes:




> The version in Guix is the latest available.  I’ll send in a patch
> when the next release happens; I’m waiting on upstream for that.



Okay, I see that I’m incorrect about this -- LibreWolf is moving 
onto Codeberg, but I was looking at their GitLab project, which 
doesn’t have the recent releases.  I’ll get this updated.


Thanks,

 — Ian



Re: bug#67512: [PATCH v7 0/3] Add LibreWolf

2024-04-27 Thread Ian Eure



Clément Lassieur writes:

> On Fri, Apr 12 2024, Andrew Tropin via Guix-patches via wrote:
>
>> On 2024-04-06 08:04, Ian Eure wrote:
>>
>>> Moves nss update to nss-3.98 / nss-certs-3.98 to avoid
>>> rebuilding thousands of packages.
>>>
>>> Rebases.
>>>
>>> Ian Eure (3):
>>>   gnu: Add nss-3.98.
>>>   gnu: Add nss-certs-3.98.
>>>   gnu: Add librewolf.
>>>
>>>  gnu/packages/certs.scm     |  16 +
>>>  gnu/packages/librewolf.scm | 621 +
>>>  gnu/packages/nss.scm       |  45 +++
>>>  3 files changed, 682 insertions(+)
>>>  create mode 100644 gnu/packages/librewolf.scm
>>>
>>> base-commit: ade6845da6cec99f3bca46faac9b2bad6877817e
>>
>> Hi Ian,
>>
>> tested those patches, didn't notice any issues.
>>
>> Added pipewire to LD_LIBRARY_PATH to make screensharing on
>> wayland work.
>>
>> Added librewolf.scm to gnu/local.mk.
>>
>> Pushed as
>> https://git.savannah.gnu.org/cgit/guix.git/commit/?id=3dc26b4eae
>>
>> Thank you very much for your work!
>
> Thank you Andrew for reviewing.
>
> Now that this is pushed, is there anyone maintaining this
> "librewolf" package?  This is serious work, with security updates
> quite often.

Hi Clement,

I’m planning to continue sending patches for updates and the like. 
Getting a working updater is close to the top of my list.

> Right now the package is subject to
>
> CVE-2024-3852 (high)
> CVE-2024-3853 (high)
> CVE-2024-3854 (high)
> CVE-2024-3855 (high)
> CVE-2024-3856 (high)
> CVE-2024-3857 (high)
> CVE-2024-3858 (high)
> CVE-2024-3859 (moderate)
> CVE-2024-3860 (moderate)
> CVE-2024-3861 (moderate)
> CVE-2024-3862 (moderate)
> CVE-2024-3302 (low)
> CVE-2024-3864 (high)
> CVE-2024-3865 (high)

The version in Guix is the latest available.  I’ll send in a patch 
when the next release happens; I’m waiting on upstream for that.


Thanks,

 — Ian



Re: Fallout from recent nss-certs changes

2024-04-21 Thread Ian Eure
No, this is not a bug.  specification->package always returns the latest 
version of a package and has no way of knowing what variable(s) that package 
object is bound to.
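
As an illustration of that distinction, here is a toy Python model. It is not Guix code: the `Package` class, `REGISTRY`, and `specification_to_package` are names made up for this sketch, and the only facts taken from the thread are the two version numbers. A by-name lookup picks the newest entry with that name, while a variable simply holds whichever object it was bound to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    version: str


def version_key(version: str) -> tuple:
    """Turn "3.88.1" into (3, 88, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Two package objects share one name, as described in the thread.
nss_certs = Package("nss-certs", "3.88.1")      # what the plain variable holds
nss_certs_3_98 = Package("nss-certs", "3.98")   # bound to a separate variable

# Hypothetical stand-in for the set of all known packages.
REGISTRY = [nss_certs, nss_certs_3_98]


def specification_to_package(name: str) -> Package:
    """Model of a by-name lookup: return the newest package with that name."""
    candidates = [p for p in REGISTRY if p.name == name]
    if not candidates:
        raise LookupError(f"unknown package: {name}")
    return max(candidates, key=lambda p: version_key(p.version))


by_name = specification_to_package("nss-certs")
print(by_name.version)     # 3.98
print(nss_certs.version)   # 3.88.1
```

The lookup never consults variable names, so in this model it cannot know that `nss_certs` refers to the older object.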

On April 21, 2024 8:02:50 AM PDT, Felix Lechner  
wrote:
>Hi,
>
>On Sat, Apr 20 2024, Ian Eure wrote:
>
>> If an operating-system’s packages includes `(specification->package
>> "nss-certs")', this causes breakage, because that form selects version
>> 3.98, but %base-packages includes 3.88.1, which causes an error on the
>> next `guix system reconfigure' due to conflicting package versions in
>> the profile.
>
>Why does the unversioned stringy selector (specification->package
>"nss-certs") resolve to a version different from the unversioned
>variable nss-certs?  Is that a bug?
>
>Kind regards
>Felix
>
>P.S. I hoped to use the word "reified" but did not know how it fit in.

Thanks,

  — Ian

Re: Fallout from recent nss-certs changes

2024-04-21 Thread Ian Eure
The change is mentioned in the channel news, but it says nothing about needing 
to remove that part of the config.


On April 21, 2024 1:32:38 AM PDT, "pelzflorian (Florian Pelz)" 
 wrote:
>Hello Ian.  My understanding of the nss-certs etc/news.scm item had been
>that we should remove (specification->package "nss-certs"), which became
>unnecessary and clutters config.scm.  From what you write, this was
>actually not intended, but it is still not a bug IMHO.
>
>(I’m not involved with the change, though.)
>
>Regards,
>Florian

Thanks,

  — Ian

Re: Concerns/questions around Software Heritage Archive

2024-04-20 Thread Ian Eure

Hello,

I’m following up on this discussion since it’s been a month and I 
haven’t heard any updates.


Summarizing the situation:

- SHF has an opaque, difficult, and undocumented process for 
 handling name changes.  I’d like to stress again that this is 
 *not* strictly a transgender issue (though it likely affects 
 them more, or in worse/different ways) -- it is a human respect 
 issue.  Many, many more cisgender people change their name than 
 transgender people.


- SHF gave their archive to HuggingFace, an "AI" company which is 
 generating derived works with no attribution or provenance, in 
 ways which violate both the licenses of the projects used to 
 train their model and the SHF principles for LLMs.


- HuggingFace wasn’t respecting requests to opt out of their 
 model.



On the first point, it sounds like SHF has made concrete progress 
to improve[1], which is very good to hear.  If SHF continues on 
this course, I think the concern is resolved.


On the third point, HuggingFace has begun honoring opt-out 
requests, but is still very far behind.  Also, they don’t remove 
code from the older versions of their model -- it remains there 
forever.  This is progress, but still, not great.


On the second point, I have not seen any public statements 
indicating that either SHF or HuggingFace even acknowledges the 
problem.  SHF’s most recent newsletter[2], published in April 2024 
(after these concerns came to light), continues to tout that 
StarCoder2 is "the first AI model aligned with our principles," 
which appears to be false.  StarCoder2 includes both licensed and 
unlicensed code, and HuggingFace’s own StarChat2 playground 
produces works derivative of this code, with no attribution or 
licensing information.  There is also no statement or position on 
the SHF news blog.  Nor has HuggingFace either fixed their tools 
or made a statement.  This is still very much a live concern.


I have a few questions:

- Has Guix reached out to SHF to express these concerns / get a 
 response?
- Whether a public or private response, what would Guix consider 
 to be an acceptable response?  An unacceptable response?

- How long is Guix willing to wait for a response?

Thanks,

 — Ian

[1]: 
https://cohost.org/arborelia/post/5273879-they-are-fixing-some
[2]: 
https://www.softwareheritage.org/wp-content/uploads/2024/04/Software-Heritage-2024-Vision-Milestones-Newsletter.pdf







Fallout from recent nss-certs changes

2024-04-20 Thread Ian Eure
Some recent nss-certs changes have negative side effects which 
need to be fixed.


A patch of mine was pushed recently (commit 
0920693381d9f6b7923e69fe00be5de8621ddb6f), which adds nss-certs 
3.98 to (gnu packages certs), under the nss-certs-3.98 variable.


Then, commit fdfd7667c66cf9ce746330f39bcd366e124460e1 was pushed, 
which adds nss-certs to %base-packages-networking.  This 
references the nss-certs variable, which is version 3.88.1.


If an operating-system’s packages include 
`(specification->package "nss-certs")', this causes breakage, 
because that form selects version 3.98, but %base-packages 
includes 3.88.1, which causes an error on the next `guix system 
reconfigure' due to conflicting package versions in the profile. 
Prior to commit 65e8472a4b6fc6f66871ba0dad518b7d4c63595e, the 
graphical installer would ask users if they wanted to install 
nss-certs, and put this form into the operating-system’s packages, 
so there are likely many users affected -- it bit me, and I’ve 
seen a couple in IRC as well.


I think the options to fix this are:

1. Removing (specification->package "nss-certs") from one’s 
operating-system.

2. Grafting nss-certs 3.98 onto nss-certs 3.88.1.
3. Replacing nss-certs 3.88.1 with 3.98.

The most expedient option is 1, as it can be applied by users -- 
but there’s probably not a good way to communicate that this needs 
to happen.
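
The conflict, and the effect of option 1, can be sketched with a small Python model. This is only an illustration and not how Guix builds profiles: `find_conflicts` and the `some-base-package` entry are hypothetical, and the version numbers are the ones given above.

```python
def find_conflicts(entries):
    """Return the names that appear with more than one version.

    `entries` is a list of (name, version) pairs standing in for the
    packages that end up in a system profile.
    """
    versions_by_name = {}
    for name, version in entries:
        versions_by_name.setdefault(name, set()).add(version)
    return sorted(n for n, vs in versions_by_name.items() if len(vs) > 1)


# Stand-in for %base-packages once nss-certs was added to it.
# "some-base-package" is a placeholder, not a real package.
base_packages = [("some-base-package", "1.0"), ("nss-certs", "3.88.1")]

# What the installer-generated by-name form resolves to, per this message.
from_config = [("nss-certs", "3.98")]

# Both versions land in the same profile, so the name is flagged.
print(find_conflicts(from_config + base_packages))   # ['nss-certs']

# Option 1: drop the explicit entry from the configuration.
print(find_conflicts(base_packages))                 # []
```

In this model, options 2 and 3 would correspond to making both entries carry the same version rather than removing one of them.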


There was some talk in IRC about grafting nss/nss-certs, but it 
looks like this didn’t happen.  An upgrade is the best path, but 
would probably need to happen in core-updates, since this rebuilds 
a large number of packages.


Thoughts on this?

Thanks,

 — Ian



Re: Concerns/questions around Software Heritage Archive

2024-03-19 Thread Ian Eure



Simon Tournier writes:

> Hi,
>
> On lun., 18 mars 2024 at 12:38, Ian Eure wrote:
>
>> They appear to be violating free software licenses on large
>> scale.  They are in violation of SWH’s own positions.
>
> [...]
>
>> [1]: https://arxiv.org/html/2402.19173v1
>> [2]: https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground
>> [3]: https://huggingface.co/datasets/bigcode/the-stack-v2
>> [4]: https://github.com/bigcode-project/opt-out-v2/issues
>
> Please note that Software Heritage folks are not co-author of all
> that; or I misread.  Do not take me wrong, this is not an attempt
> to escape but a query for waiting the feedback of SWH.



Shit rolls downhill.  It’s the least surprising thing in the world 
to find that an "AI" company is violating licenses, because the 
entire technology is based on infringement at a massive scale. 
SWH’s partnership with, and promotion of, both the company and its 
license-violating model, in violation of their *own stated 
principles*, raises very legitimate questions.


There are multiple overlapping concerns here: personal, 
organizational, legal, ethical, and technical.


From a personal, legal standpoint, HuggingFace is almost certainly 
in violation of my code’s licenses.  I will, therefore, work to 
remove my code from their models.  From a personal, ethical 
standpoint, I believe that SWH has proven themselves untrustworthy 
by enabling *and promoting* this infringement in violation of 
their own stated policies, and will work to remove my code from 
their archive.  Personally, I cannot extend them the benefit of 
the doubt on this.  They blew it.


From an organizational ethical standpoint, Guix is IMO on the 
right track by waiting on SWH (and perhaps pressuring them to fix 
things).  From an organizational, technical perspective, I would 
like to see concrete measures to support my (and hundreds of 
others’) personal, ethical desires to exclude software from SWH, 
and by extension, HuggingFace’s models.



> As Ludo said, SWH folks are, by the way, also long time Free
> Software activists.



In my view, this is not to their credit.  I’d expect people 
familiar with Free Software to be *more* sensitive to licensing 
concerns, thus less likely to partner with a company likely to 
violate them.



> PS: Thanks for the detailed explanations.  I will provide my
> reading later, after some concerns will be separated, eventually.


You’re very welcome.

Thanks,

 — Ian



Re: Concerns/questions around Software Heritage Archive

2024-03-18 Thread Ian Eure



Simon Tournier writes:

> Hi,
>
> On sam., 16 mars 2024 at 08:52, Ian Eure wrote:
>
>> They appear to be using the archive to build LLMs:
>> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>
> About LLM, Software Heritage made a clear statement:
>
> https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code
>
> Quoting:
>
>     We feel that the question is no longer whether LLMs for code
>     should be built. They are already being built, independently
>     of what we do, and there is no turning back.  The real
>     question is how they should be built and whom they should
>     benefit.
>
>     Principles:
>
>     1. Knowledge derived from the Software Heritage archive must
>     be given back to humanity, rather than monopolized for private
>     gain. The resulting machine learning models must be made
>     available under a suitable open license, together with the
>     documentation and toolings needed to use them.
>
>     2. The initial training data extracted from the Software
>     Heritage archive must be fully and precisely identified by,
>     for example, publishing the corresponding SWHID identifiers
>     (note that, in the context of Software Heritage, public
>     availability of the initial training data is a given: anyone
>     can obtain it from the archive). This will enable use cases
>     such as: studying biases (fairness), verifying if a code of
>     interest was present in the training data (transparency), and
>     providing appropriate attribution when generated code bears
>     resemblance to training data (credit), among others.
>
>     3. Mechanisms should be established, where possible, for
>     authors to exclude their archived code from the training
>     inputs before model training begins.
>
> I hope it clarifies your concerns to some extent.



It doesn’t clarify them, but it does illustrate them.

HuggingFace and the StarCoder2 model are in violation of principle 
2.  By their own admission, they are including code without clear 
licensing[1]:


   The main difference between the Stack v2 and the Stack v1 is 
   that we include both permissively licensed and unlicensed files.

HuggingFace’s StarChat2 Playground[2] also violates this 
principle, as it outputs code without any license or provenance 
information; I know, because I tried it.  While their own terms of 
use for StarCoder2 state:


   Any use of all or part of the code gathered in The Stack v2 
   must abide by the terms of the original licenses...

...their own playground makes this impossible.

HuggingFace is also in violation of the third principle, because 
they haven’t established a functioning opt-out model[3].  Opting 
out requires using non-free software; requests have been sitting 
for nearly a year with no action or response; and out of every 
request submitted, only a single one has *ever* been honored.


They appear to be violating free software licenses on a large 
scale.  They are in violation of SWH’s own positions.



> Moreover, you wrote: « I want absolutely nothing to do with
> them. »
>
> Maybe there is a misunderstanding on your side about what “free
> software” and GPL means because once “free software”, you cannot
> prevent people to use “your” free software for any purposes you
> dislike.
>
> If you want to bound the use cases of the software you create,
> you need to explicitly specify that in the license.  And if you
> do, your software will not be considered as “free software”.
>
> That’s the double sword of “free software”. :-)



I am crystal clear on the meaning of free software.  I wish to 
remove it from these models *in order to* keep it free.


Thanks,

 — Ian

[1]: https://arxiv.org/html/2402.19173v1
[2]: 
https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground

[3]: https://huggingface.co/datasets/bigcode/the-stack-v2
[4]: https://github.com/bigcode-project/opt-out-v2/issues



Re: Concerns/questions around Software Heritage Archive

2024-03-17 Thread Ian Eure



MSavoritias writes:

> On 3/17/24 13:53, paul wrote:
>
>> Hi all,
>>
>> thank you MSavoritias for bringing up points that many of us
>> share. It's clearly a tradeoff what to do about the past. For
>> the future, as Christopher already stated, we need a serious
>> solution that we can uphold as a free software project that does
>> not alienate users or contributors.
>>
>> My opinion is that names are just wrong to be included, not only
>> because of deadnames, but in general having a database with a
>> column first_name and a column second_name is something only a
>> 35 yrs old white cis boy could have thought was a good idea to
>> model the spectrum of names humans use all over the world:
>>
>> https://web.archive.org/web/20240317114846/https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
>>
>> If we'd really need to identify contributors, and obviously Guix
>> doesn't, we could use an UUID/machine readable identifier which
>> can then be mapped to a displayed name. I believe git can
>> already be configured to do so.
>>
>> giacomo
>
> The uuid sounds like a very interesting solution indeed.
>
> I wonder how easy it could be to add it to git.


This also seems like interesting territory to explore.  The 
concerns raised around rewriting history have valid points; I 
think it’s impractical to rewrite history any time a change needs 
to happen, as that would be an ongoing source of disruption.  But 
rewriting history *once*, to switch to a more general mechanism, 
seems like a reasonable trade to me.  This also presents an 
opportunity: we could combine this with a default branch switch 
from master to main.  A news entry left as the final commit in 
master could inform people of whatever steps may be needed to 
update (if that can’t be automated), and the main branch would 
contain the rewritten history.
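
For context on why even a single rewrite touches everything after it: a git commit id is a hash that covers the parent commit's id, so changing the author recorded on one commit changes the id of every descendant. The Python sketch below is a toy model of that chaining. It does not reproduce git's real object format, and `commit_id`, `build_history`, and the sample names are invented for illustration.

```python
import hashlib


def commit_id(parent_id: str, author: str, message: str) -> str:
    """Hash the parent id together with this commit's own data."""
    payload = f"{parent_id}\n{author}\n{message}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def build_history(commits):
    """Return the ids of a linear history given (author, message) pairs."""
    ids = []
    parent = ""
    for author, message in commits:
        parent = commit_id(parent, author, message)
        ids.append(parent)
    return ids


original = [
    ("Old Name", "first"),
    ("Someone Else", "second"),
    ("Someone Else", "third"),
]
# Rewrite only the author of the first commit.
rewritten = [("New Name", "first")] + original[1:]

old_ids = build_history(original)
new_ids = build_history(rewritten)

# Only one commit's author changed, yet no id survives the rewrite.
print(all(a != b for a, b in zip(old_ids, new_ids)))   # True
```

Anything that pins an old id would stop matching after the rewrite, which is why doing this once, on a fresh branch, limits the disruption to a single transition.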


It’s certainly not a perfect solution, but it seems pragmatic.

 — Ian



Re: Concerns/questions around Software Heritage Archive

2024-03-17 Thread Ian Eure



MSavoritias writes:

> On 3/17/24 11:39, Lars-Dominik Braun wrote:
>
>> Hey,
>>
>>> I have heard folks in the Guix maintenance sphere claim that we
>>> never rewrite git history in Guix, as a matter of policy. I
>>> believe we should revisit that policy (is it actually written
>>> anywhere?) with an eye towards possible exceptions, and develop
>>> a mechanism for securely maintaining continuity of Guix
>>> installations after history has been rewritten so that we
>>> maintain this as a technical possibility in the future, even if
>>> we should choose to use it sparingly.
>>
>> the fallout of rewriting Guix’ git history would be devastating.
>> It would break every single Guix installation, because
>>
>> a) `guix pull` authenticates commits and we might lose our trust
>>    anchor if we rewrite history earlier than the introduction of
>>    this feature,
>> b) `guix pull` outright rejects changes to the commit history to
>>    prevent downgrade attacks.
>>
>> Additionally it would break every single existing usage of the
>> time machine and thereby completely defeat the goal of providing
>> reproducible software environments since the commit hash is used
>> to identify the point in time to jump to.
>>
>> I doubt developing “mechanisms” – whatever they look like – would
>> be worth the effort. Our contributors matter, but so do our
>> users. Never ever rewriting our git history is a tradeoff we
>> should make for our users.
>>
>> Lars
>
> That's a good point, in the sense that it's a tradeoff here and I
> absolutely agree.
>
> But let me add some food for thought here:
>
> 1. Were the social aspects considered when the system came into
>    place?
>
> 2. Is it more important for the system to stay as is than to
>    welcome new contributors?
>
> 3. You mention "its a tradeoff we should make for our users". How
>    many trans people were involved in that decision and how much
>    did their opinion matter in this?
>
> I am saying this because giving power to people (what is called
> users) is not only handing them code or make sure everything is
> free software.
>
> It's also the hard part of making sure the voices of people that
> can not code is heard and is participating and taking in mind.



Just want to say that I appreciate and agree with your thoughtful 
words.


I’d also note that name changes aren’t a concern limited to trans 
people, and framing this as "we have to upend everything Because 
Transgender" is both wrong and feels pretty bad to me.  Anyone can 
change their name at any time for any reason, or no reason at all, 
and may wish to update historical references to their previous 
names.  Having a mechanism to support this is, in my view, a 
matter of basic decency and respect for all humans.


Thanks,

 — Ian



Re: Concerns/questions around Software Heritage Archive

2024-03-16 Thread Ian Eure



Christopher Baines writes:

> Ian Eure writes:
>
>> Hi Guixy people,
>>
>> I’d never heard of SWH before I started hacking on Guix last
>> fall, and it struck me as rather a good idea.  However, I’ve
>> seen some things lately which have soured me on them.
>>
>> They appear to be using the archive to build LLMs:
>> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>>
>> I was also distressed to see how poorly they treated a developer
>> who wished to update their name:
>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>>
>> GPL’d software I’ve created has been packaged for Guix, which I
>> assume means it’s been included in SWH.  While I’m dealing with
>> their (IMO: unethical) opt-out process, I likely also need to
>> stop new copies from being uploaded again in the future.
>>
>> Is there a way to indicate, in a Guix package, that it should
>> *never* be included in SWH?
>
> Not currently, and I don't really see the point in such a
> mechanism. If you really never want them to store your code, then
> you need to license it accordingly (and not make it free
> software).



I don’t want my code in SWH *because* it’s free.  A primary use of 
LLMs is laundering freely licensed software into proprietary, 
commercial projects through "AI" code completion and generation. 
Any Free software in an LLM training set can and will be used in 
violation of its license, without a clear path for the author to 
seek recourse.  I deleted my code off Github and abandoned it 
completely for this exact reason, and am deeply irked to be going 
through this nonsense again.


A more salient question may be: Is there a process within Guix 
(either the program or the organization) which uploads source to 
SWH?  Or does it rely on SWH independently?


If the latter, my problem is likely solved by blocking SWH at my 
network edge and opting out of their archive (or trying to) and 
the downstream training models they’ve already put it in.  If the 
former, the only control I currently have to protect my license is 
removing packages from Guix which contain it.  I don’t want that 
outcome.


Noting also that the path here seems to be 
SWH->huggingface->bigcode training set, and the opt-out process 
for the training set appears to be a complete sham.  To opt-out, 
you must create a Github Issue; only one opt-out has *ever* been 
processed, and there are 200+ sitting there, many with no response 
for nearly a year[1].  I want no part of any of this.




>> Is there a way to tell Guix to never download source from SWH?
>
> Also no, and it's probably best to do this at the network level on
> your systems/network if you want this to be the case.



I’ll investigate this, though I’d prefer if there was a way to 
configure source mirrors in the Guix daemon.




> Skipping back to this though:
>
>> I was also distressed to see how poorly they treated a developer
>> who wished to update their name:
>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>
> This is probably worth thinking about as Guix is in a similar
> situation regarding publishing source code, and people potentially
> wanting to change historical source code both in things Guix
> packages and Guix itself.
>
> Like Software Heritage, there's cryptographical implications for
> rewriting the Git history and modifying source tarballs or nars
> that contain source code.
>
> We have 17TiB of compressed source code and built software stored
> for bordeaux.guix.gnu.org now and we should probably work out how
> to handle people asking for things to be removed or changed (for
> any and all reasons).
>
> It's probably worth working out our position on this in advance of
> someone asking.



Yes, I agree that Guix needs a better solution for this.

Thanks,

 — Ian

[1]: https://github.com/bigcode-project/opt-out-v2/issues



Concerns/questions around Software Heritage Archive

2024-03-16 Thread Ian Eure

Hi Guixy people,

I’d never heard of SWH before I started hacking on Guix last fall, 
and it struck me as rather a good idea.  However, I’ve seen some 
things lately which have soured me on them.


They appear to be using the archive to build LLMs: 
https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/


I was also distressed to see how poorly they treated a developer 
who wished to update their name: 
https://cohost.org/arborelia/post/4968198-the-software-heritag 
https://cohost.org/arborelia/post/5052044-the-software-heritag


GPL’d software I’ve created has been packaged for Guix, which I 
assume means it’s been included in SWH.  While I’m dealing with 
their (IMO: unethical) opt-out process, I likely also need to stop 
new copies from being uploaded again in the future.


Is there a way to indicate, in a Guix package, that it should 
*never* be included in SWH?


Is there a way to tell Guix to never download source from SWH?

I want absolutely nothing to do with them.

Thanks,

 — Ian



Re: Proposal to turn off AOT in clojure-build-system

2024-03-09 Thread Ian Eure

Hello,

I’ve been following along with this discussion, as well as a 
discussion on Clojureverse, and thought it might be helpful to 
pull together some threads and design decisions around Clojure’s 
behavior.


Clojure is designed to ship libraries as source artifacts, not 
bytecode ("pretty much all other Clojure libraries ... are all 
source code by design[1]."; "Clojure is ... a source-first 
language[2]"), and the view of the community is that shipping AOT 
artifacts "is an anti-pattern[1]."   Clojure library JARs are more 
akin to source tarballs than binaries.  The original design and 
intent of Clojure’s AOT compiler is to compile "just a few 
things... for the interop case" or "Everything... For the 
'Application delivery', 'Syntax check', and 'reflection warnings' 
cases[3]."


Clojure’s compiler is transitive and "does not support separate 
compilation"[3], meaning when a namespace is compiled, anything it 
uses is compiled and emitted with it.  This is the crux of why 
mixing AOT and non-AOT code is troublesome: it causes dependency 
diamonds, where the AOT’d library contains a duplicate, older 
version of code used elsewhere in the project.
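
To make the dependency diamond concrete, here is a minimal Python sketch that models transitive compilation. It is not Clojure tooling; the namespace names, version strings, and helper functions are all hypothetical and exist only to illustrate how an AOT'd library can end up carrying a stale copy of a shared dependency.

```python
# Toy model of transitive AOT compilation. All names and versions are
# hypothetical; this only illustrates the packaging problem.

def transitive_closure(ns, requires):
    """Return ns plus everything it requires, directly or indirectly."""
    seen = []
    stack = [ns]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(requires.get(current, []))
    return seen


def aot_compile(ns, requires, versions):
    """Emit 'classes' for ns and for every namespace it pulls in."""
    return {name: versions[name] for name in transitive_closure(ns, requires)}


# A library is AOT-compiled against util.core 1.0 and shipped as a JAR.
lib_jar = aot_compile(
    "lib.core",
    requires={"lib.core": ["util.core"]},
    versions={"lib.core": "2.0", "util.core": "1.0"},
)

# The application depends on util.core 1.1 as source.
app_sources = {"app.main": "0.1", "util.core": "1.1"}

# Namespaces present in both places at different versions.
conflicts = {
    name: (lib_jar[name], app_sources[name])
    for name in lib_jar
    if name in app_sources and lib_jar[name] != app_sources[name]
}
print(conflicts)  # {'util.core': ('1.0', '1.1')}
```

The JAR contains util.core even though only lib.core was asked for, which is the duplicate described above.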


The Clojure reference on compiling[4] gives some reasons you might 
want to AOT: "To deliver your application without source," "To 
speed up application startup," "To generate named classes for use 
by Java," "To create an application that does not need runtime 
bytecode generation and custom classloaders."  Note that there’s 
no mention of compiling libraries for any reason; only 
applications.


When AOT is used "for the interop case," it’s typical to AOT only 
those namespaces[5], not the entire library.


Shipping AOT-compiled Clojure libraries has caused real and very 
weird and hard-to-debug problems in the past:


   https://clojure.atlassian.net/browse/CLJ-1886?focusedCommentId=15290
   https://github.com/clj-commons/byte-streams/issues/68 and 
   https://clojure.atlassian.net/browse/CLJ-1741


Clojure doesn’t have guarantees around ABI stability[6][7].  To 
date, most ABI changes have been additive, but there are no 
guarantees that the ABI will be compatible from any one version of 
Clojure to any other.  The understanding of the Clojure community 
is that the design of the current compiler can’t offer a stable 
ABI[8] at all.  Because nobody in the Clojure community AOTs 
intermediate (that is, library) code, this hasn’t been a problem 
and is unlikely to change.


"Clojure tries very hard to provide source compatibility but not 
bytecode compatibility across versions[9]."


Correctly handling the ABI concerns — which Guix currently does 
not do — would result in a combinatorial explosion of Clojure 
packages should multiple versions of Clojure ever be available in 
Guix at the same time.  For example, if someone wanted to package 
Clojure 1.12.0-alpha9, you’d need to duplicate every package 
taking Clojure as an input so they use the correct version.  While 
ABI breakage has been rare thus far, it seems likely that it’ll 
occur at some point; perhaps if Clojure reaches version 2.0.0.  If 
Guix disables AOT for Clojure libraries, we have source 
compatibility, and the AOT/ABI problems are moot.


Clojure’s compiler is non-deterministic[10]: the same compiler can 
produce different bytecode for the same input across multiple 
runs.  I’m not sure if this is a problem for Guix at this point in 
time, but it seems out of line with Guix expectations for 
compilation generally.



Opinions follow:

If we’re taking votes, mine is to *not* AOT Clojure libraries, 
both for the technical reasons laid out above, and also for the 
social reason of not violating the principle of least surprise.  I 
understand that Guix and Clojure have very different approaches, 
and some balance must be struck.  However, the lack of ABI 
guarantees, the compiler’s behavior, the promise of source 
compatibility, and matching the expectation of the audience these 
tools are meant for all convince me that disabling AOT is the 
right course here.


AOT’ing Clojure applications (which means, more or less, "the 
Clojure tooling") is desirable, and should be maintained.


 — Ian

[1]: https://clojureverse.org/t/should-linux-distributions-ship-clojure-byte-compiled-aot-or-not/10595/8
[2]: https://clojureverse.org/t/should-linux-distributions-ship-clojure-byte-compiled-aot-or-not/10595/30
[3]: https://archive.clojure.org/design-wiki/display/design/Transitive%2BAOT%2BCompilation.html
[4]: https://clojure.org/reference/compilation
[5]: https://clojure.org/guides/deps_and_cli#aot_compilation
[6]: https://clojureverse.org/t/should-linux-distributions-ship-clojure-byte-compiled-aot-or-not/10595/30
[7]: https://gist.github.com/hiredman/c5710ad9247c6da12a99ff6c26dd442e
[8]: https://clojureverse.org/t/should-linux-distributions-ship-clojure-byte-compiled-aot-or-not/10595/4
[9]: https://clojureverse.org/t/should-linux-distributions-ship-clojure-byte-compiled-aot-or-not/10595/18

Re: Guix System automated installation

2024-02-27 Thread Ian Eure

Hi Giovanni,

Giovanni Biscuolo  writes:


Hello Ian,

I'm a little late to this discussion, sorry.

I'm adding guix-devel since it would be nice if some Guix developers have
something to add on this matter, for this reason I'm leaving all
previous messages intact

Csepp  writes:


Ian Eure  writes:


Hello,

On Debian, you can create a preseed file containing answers to all the
questions you’re prompted for during installation, and build a new
install image which includes it.  When booted, this installer skips any
steps which have been preconfigured, which allows for either fully
automated installation, or partly automated (prompt for hostname and
root password, but otherwise automatic).

Does Guix have a way to do something like this?  The declarative config
is more or less the equivalent of the Debian preseed file, but I don’t
see anything that lets you build an image that’ll install a
configuration.


When using the guided installation (info "(guix) Guided Graphical
Installation"), right before the actual installation on target (guix
system init...) you can edit the operating-system configuration file:
isn't it something similar to what you are looking for?

Please consider that a preseed file is very limited compared to a
full-fledged operating-system declaration since the latter contains the
declaration for *all* OS configuration, not just the installed packages.




I appreciate where you’re coming from, I also like the one-file 
system configuration, but this is inaccurate.  Guix’s 
operating-system doesn’t encompass the full scope of configuration 
necessary to install and run an OS; Debian’s preseed has 
significantly more functionality than just specifying the 
installed packages.  Right now, Debian’s system allows you to do 
things which Guix does not.


Preseed files contain values that get set in debconf, Debian’s 
system-wide configuration mechanism, so they can both configure 
the resulting system as well as the install process itself.  This 
means you can use a preseed file to tell the installer to 
partition disks, set up LUKS-encrypted volumes (and specify one or 
more passwords for them), format those with filesystems, install 
the set of packages you want, and configure them -- though 
debconf’s package configuration is more limited, generally, than 
Guix provides[1].  With Debian, I can create a custom installer 
image with a preseed file, boot it, and without touching a single 
other thing, it’ll install and configure the target machine, and 
reboot into it.  That boot-and-it-just-works experience is what I 
want from Guix.


For things that can’t be declared in operating-system, like disk 
partitioning and filesystem layout, the installer performs those 
tasks imperatively, then generates a system config with those 
device files and/or UUIDs populated, then initializes the system. 
There’s no facility for specifying disk partitioning or *creating* 
filesystems in the system config -- it can only be pointed at ones 
which have been created already.



guix system image is maybe closer, but it doesn’t automate everything
that the installer does.
But the installer can be used as a Scheme library, at least in theory.
The way I would approach the problem is by creating a Shepherd service
that runs at boot from the live booted ISO.


I would really Love So Much™ to avoid writing imperative bash scripts
and just write Scheme code to be able to do a "full automatic" Guix
System install, using a workflow like this one:

1. guix system prepare --include preseed.scm disk-layout.scm /mnt

where disk-layout.scm is a declarative gexp used to partition, format
and mount all needed filesystems

the resulting config.scm would be an operating-system declaration with
included the contents of preseed.scm (packages and services
declarations)

2. guix system init config.scm /mnt (already working now)

...unfortunately I'm (still?!?) not able to contribute such code :-(




I don’t think there’s any need for a preseed.scm file, and I’m not 
sure what would be in that, but I think this is close to the right 
track.  Either operating-system should be extended to support 
things like disk partitioning, and effect those changes at 
reconfigure time (with suitable safeguards to avoid wrecking 
existing installs), or the operating-system config could get 
embedded in another struct which contains that, similar to the 
(image ...) config for `guix system image'.  I think there are 
some interesting possibilities here: you could change your 
partition layout and have Guix resize them / create new ones for 
you.


 — Ian

[1]: A workaround for this is to create packages which configure 
the system how you want, then include them on the installer image 
/ list them in the packages to be installed.  Not ideal, but you 
can.




Re: QA is back, who wants to review patches?

2024-02-11 Thread Ian Eure



Christopher Baines  writes:


Hey!

After substitute availability taking a bit of a dive recently, the
bordeaux build farm has finally caught back up and QA is back submitting
builds for packages changed by patches.

QA also has a feature to allow easily tagging patches (issues) as having
been reviewed and ready to merge (reviewed-looks-good). You can do this
via sending an email and QA has a form ("Mark patches as reviewed") on
the page for each issue to help you do this.

I'd encourage anyone and everyone to review patches, there's no burden
on you to spot every problem and you don't need any special
knowledge. You just need to not be involved (so you can't review your
own patches) and take a good look at the changes, mentioning any
questions that you have or problems that you spot. If you think the
changes look good to be merged, you can tag the issue accordingly.

When issues are tagged as reviewed-looks-good, QA will display them in
dark green at the top of the list of patches, so it's on those with
commit access to prioritise looking at these issues and merging the
patches if indeed they are ready.

Let me know if you have any comments or questions!



Wanted to check things out, but it’s giving the same error message 
on every page:


   An error occurred

   Sorry about that!
   misc-error

   #fvector->list: expected vector, got ~S#f#f

Also, the certificate for issues.guix.gnu.org expired today.

Is there a plan to improve the reliability of Guix infrastructure? 
It seems like major things break with alarming regularity.


 — Ian



Re: Guix CLI, thoughts and suggestions

2024-01-20 Thread Ian Eure

Hi Carlo,

Thank you for the thoughtful reply.

Carlo Zancanaro  writes:


Hi Ian,

Much of what you've written is fair, and I'm sure that Guix's commands
could be better organised. I'm not really involved in Guix development,
but I think there are two "inconsistencies" that you've mentioned which
can be explained.

On Mon, Jan 15 2024, Ian Eure wrote:
Some examples of where I think Guix could do better.  This is an
illustrative list, not an exhaustive one.

Inconsistent organization
=========================

Most package-related commands are under `guix package', but many are
sibling commands.  Examples are `guix size', `guix lint', `guix hash',
etc.

I think the real inconsistency here is that `guix package' is poorly
named. This command really operates on profiles, and performs operations
(install, remove, list, etc.) on those profiles. Packages are given as
arguments to this command.

The other commands operate on, and show the properties of, packages.
Similarly with `guix build'.



Yes, I agree the behavior makes a bit more sense from that 
viewpoint.  However, it does have non-profile-related things in 
it, such as `--show' and `--search'.  This is getting into another 
thing I’ve seen a bit of, which is overloaded commands -- ones 
that do multiple things that are unrelated or tangentially related. 
But, I didn’t have a good example, and my message was long enough 
already.





Inconsistency between verbs and options
=======================================

... For example, installing a package is `guix package -i foo' rather
than `guix package install foo', removing is `guix package -r foo'
rather than `guix package remove foo', and listing installed packages
is `guix package -I' rather than `guix package installed' (or
similar).

The specific example of `guix package' might be explained by considering
it as a single transaction to update the profile. The command `guix
package' really says "perform a transaction on the profile", and the
options are the commands in the transaction. Since there can be multiple
commands, and the command names look like package names, they are
provided as options.

This doesn't fully explain the behaviour. In particular the example you
give:

This means that users can express commands which *seem* like they
should work, but do not.  For example `guix package -i emacs -r
emacs-pgtk -I' represents a command to 1) install emacs 2) remove
emacs-pgtk 3) list installed packages (which would verify the previous
two operations occurred). ...

seems reasonable to have working within the view of `guix package' as a
transactional operation.



I agree that this would make sense, but my understanding is that 
`guix package' doesn’t work like that -- it only performs the 
final operation in the list.  IMO, it should either do 
*everything* the commands specify, or print an error and take no 
action.
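
As a rough illustration of the "do everything, or print an error and take no action" behavior, here is a small Python sketch. It is not how Guix is implemented; the `apply_transaction` function and the set-of-names profile representation are assumptions made up for this example.

```python
# Hypothetical all-or-nothing profile transaction. Operations are staged
# on a copy, so a failure partway through leaves the original untouched.

def apply_transaction(profile, operations):
    """Apply every (action, package) pair, or raise without side effects."""
    staged = set(profile)  # work on a copy
    for action, package in operations:
        if action == "install":
            staged.add(package)
        elif action == "remove":
            if package not in staged:
                raise ValueError(f"cannot remove {package}: not installed")
            staged.remove(package)
        else:
            raise ValueError(f"unknown action: {action}")
    return staged  # the caller commits this as the new profile


profile = {"emacs-pgtk"}
new_profile = apply_transaction(
    profile, [("install", "emacs"), ("remove", "emacs-pgtk")]
)
print(sorted(new_profile))  # ['emacs']
```

If any step is invalid, the function raises before returning, so the caller never sees a half-applied profile.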



It's also worth noting that there are convenience shortcuts in `guix
install' and `guix remove'.

It seems like a lot of work to change, and backwards compatibility
also is an issue.

I see backwards compatibility as the main issue here. There was a lot of
discussion preceding the inclusion of `guix shell', because of the
prospect of breaking existing tutorials/documentation floating around on
the internet. This is an even bigger concern for a more drastic
reorganisation of the CLI.



I agree, I don’t think the situation can be improved without 
finding a solution to preserve BC.  But, I didn’t think it was 
worth making detailed plans for any of this before gauging whether 
the problem was one broadly considered to be worth solving.


 — Ian



Guix CLI, thoughts and suggestions

2024-01-15 Thread Ian Eure

Greetings,

As I’ve been learning Guix, one of the things I’ve found somewhat 
unpleasant is the lack of consistency within the guix CLI tool. 
It feels a bit Git-like, with not much consistency, commands that 
non-obviously perform more than one operation, related commands in 
different places in the tree, etc.


Just so you know where I’m coming from: I’ve found that complex 
CLI tooling benefits from organization and consistency.  The Linux 
ip(8) command is a good example of this kind of organization: to 
add an IP address, you use `ip address add'.  To show addresses, `ip 
address show', and to remove one, `ip address del'.  When options 
are needed, they get added after the verb or branch in the verb 
tree; the final verb may take positional arguments as well as 
--long or -s (short)-form options.


Some examples of where I think Guix could do better.  This is an 
illustrative list, not an exhaustive one.


Inconsistent organization
=========================

Most package-related commands are under `guix package', but many 
are sibling commands.  Examples are `guix size', `guix lint', 
`guix hash', etc.



Inconsistency between verbs and options
=======================================

Some verbs are bare-word positional arguments, and others are 
flags to related verbs.  IMO, this is the biggest problem, and 
makes it very difficult to find all the things the CLI can do. 
`guix package' is a major offender in this area, as it mixes verbs 
and verb-specific options into the same level.  For example, 
installing a package is `guix package -i foo' rather than `guix 
package install foo', removing is `guix package -r foo' rather 
than `guix package remove foo', and listing installed packages is 
`guix package -I' rather than `guix package installed' (or 
similar).


This means that users can express commands which *seem* like they 
should work, but do not.  For example `guix package -i emacs -r 
emacs-pgtk -I' represents a command to 1) install emacs 2) remove 
emacs-pgtk 3) list installed packages (which would verify the 
previous two operations occurred).  This is a valid command within 
the accepted organization of `guix package', and doesn’t cause an 
error, but doesn’t work: the install and remove steps are ignored. 
A thing I’ve found throughout my career is that designing systems 
so it’s *impossible* to represent unsupported, nonsensical, or 
undefined things is an extremely valuable technique to avoid 
errors and pitfalls.  I think Guix could get a lot of mileage out 
of adopting something similar.


This causes a related problem of making it impossible to know what 
options are valid for what verbs.  Will `guix package --cores=8 -r 
emacs' remove the package while using eight cores of my system? 
Will `guix system -s i686 switch-generation 5' switch me to a 
32-bit version of generation 5?  If verbs are organized better, 
and have their own options, this ambiguity vanishes.
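
One way to picture verbs that own their options is a parser built from nested subcommands. The sketch below uses Python's argparse purely as an illustration; the `guix-sketch` program name, the verb names, and the choice of which options belong to which verb are assumptions, not an actual or proposed Guix interface.

```python
import argparse

# Hypothetical verb tree: each verb declares only the options it supports,
# so an option that makes no sense for a verb is rejected by the parser.

def build_parser():
    parser = argparse.ArgumentParser(prog="guix-sketch")
    # Global option, visible to every verb below it.
    parser.add_argument("-L", "--load-path", action="append", default=[])
    verbs = parser.add_subparsers(dest="verb", required=True)

    package = verbs.add_parser("package")
    actions = package.add_subparsers(dest="action", required=True)

    install = actions.add_parser("install")
    install.add_argument("packages", nargs="+")
    install.add_argument("--cores", type=int, default=1)  # builds may happen

    remove = actions.add_parser("remove", aliases=["rm"])
    remove.add_argument("packages", nargs="+")  # no --cores: nothing is built
    return parser


parser = build_parser()
args = parser.parse_args(
    ["-L", "~/src/my-channel", "package", "install", "--cores", "8", "emacs"]
)
print(args.verb, args.action, args.cores, args.packages)
# package install 8 ['emacs']
```

With this layout, `package remove --cores 8 emacs` exits with an "unrecognized arguments" error instead of silently accepting an option that has no meaning for removal.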



More inconsistency
==================

Other parts of guix have the opposite problem: `guix system 
docker-image' probably ought to be an option to `guix system 
image' rather than a separate verb.



Inconsistency between similar commands
======================================

There are generations of both the system (for GuixSD) and the user 
profile, however, they work differently.  For the system, there’s 
`guix system list-generations' and `guix system 
switch-generation', but for the user profile, you need `guix 
package --list-generations' and `guix package 
--switch-generation=PATTERN'.  Additionally, no help is available 
for either of the system commands: `guix system switch-generation 
--help' gives the same output as `guix system --help' -- no 
description of the supported ways of expressing a generation is 
available.



Flattened verbs
===============

Related, the generation-related commands under `guix system' ought 
to be one level deeper: `guix system generation list', `guix 
system generation switch' etc.



Repeated options
================

Many commands (`guix package', `guix system', `guix build', `guix 
shell') take -L options, to add Guile source to their load-path. 
This probably ought to be an option to guix itself, so you can do 
`guix -L~/src/my-channel build ...'.



Suggestions
===========

All commands should be organized into a tree of verbs.

Verbs should have common aliases (`rm' for `remove', etc).

Verbs should be selected by specifying the minimum unambiguous 
substring.  For example `guix sys gen sw' could refer to `guix 
system generation switch'.
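
Resolving verbs by minimum unambiguous substring could work roughly like the Python sketch below. The verb tree here is hypothetical and only loosely follows the reorganization suggested in this message; `resolve` is an illustrative helper, not an existing Guix function.

```python
# Hypothetical verb tree; leaves are empty dicts.
VERB_TREE = {
    "package": {"install": {}, "installed": {}, "remove": {}},
    "system": {"generation": {"list": {}, "switch": {}}, "image": {}},
    "shell": {},
}


def resolve(words, tree=VERB_TREE):
    """Expand each abbreviated word to the unique verb it prefixes."""
    resolved = []
    for word in words:
        if word in tree:
            matches = [word]  # an exact match always wins
        else:
            matches = sorted(verb for verb in tree if verb.startswith(word))
        if not matches:
            raise ValueError(f"unknown verb: {word}")
        if len(matches) > 1:
            raise ValueError(f"ambiguous verb {word}: {', '.join(matches)}")
        resolved.append(matches[0])
        tree = tree[matches[0]]  # descend one level
    return resolved


print(resolve(["sys", "gen", "sw"]))  # ['system', 'generation', 'switch']
```

Letting an exact match win keeps `install` usable even though it is also a prefix of `installed`; a bare `s` at the top level would be rejected as ambiguous between `shell` and `system`.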


Options should be applicable to each level of the tree, e.g. `guix 
-L~/src/my-channel' would add that load-path, which would be 
visible to any command.  

Requesting help is a verb.  Appending "help" to any level of the 
verb tree should show both options applicable to that verb, and 
its child verbs.  `guix help' would show global options and all 
top-level verbs (package, system, generation, etc); `guix package 
help' would show 

Anyone working on more recent glib/gtk4 packages?

2023-12-17 Thread Ian Eure

Hello,

I wanted to package Fractal, which is a native GNOME client for 
Matrix chat.  It requires newer versions of glib and gtk than are 
currently in Guix.  I believe I’ve seen in IRC that some folks are 
working on getting GNOME 43/44 packages done, which probably needs 
the glib/gtk updates to happen.


If there’s work in this direction, could someone point me to it?

Thanks,

 — Ian