Re: Guidelines for pre-trained ML model weight binaries (Was re: Where should we put machine learning model parameters?)

2023-07-04 Thread Vagrant Cascadian
On 2023-07-04, zamfofex wrote:
>> On 07/03/2023 6:39 AM -03 Simon Tournier  wrote:
>> 
>> Well, I do not see any difference between pre-trained weights and icons
>> or sounds or well-fitted parameters (e.g., the package
>> python-scikit-learn has a lot ;-)).  As I said elsewhere, I do not see
>> the difference between pre-trained neural network weights and genomic
>> references (e.g., the package r-bsgenome-hsapiens-1000genomes-hs37d5).
>
> I feel like, although this might (arguably) not be the case for
> leela-zero or Lc0 specifically, for certain machine learning
> projects, a pretrained network can affect the program’s behavior so
> deeply that it might be considered a program in itself! Such networks
> usually approximate an arbitrary function. The more complex the model
> is, the more complex the behavior of this function can be, and thus
> the closer it is to being an arbitrary program.
>
> But this “program” has no source code; it is effectively created
> directly in a binary form that is difficult to analyse.
>
> In any case, I feel like the issue of “user autonomy” that Ludovic
> was talking about is fairly relevant (as I understand it). For icons,
> images, and other similar kinds of assets, it is easy enough for the
> user to replace them, or create their own if they want. But for
> pretrained networks, even if they are under a free license, the user
> might not be able to easily create their own network that suits their
> purposes.
>
> For example, for image recognition software, the maintainers of the
> program might provide data able to recognise a specific set of
> objects in input images, but the user might want to use it to
> recognise a different kind of object. If it is too costly for the
> user to train a new network for their purposes (in terms of hardware
> and time required), the user is effectively entirely bound by the
> decisions of the maintainers of the software, and they can’t change
> it to suit their purposes.

For a more concrete example, with facial recognition in particular, many
models are quite good at recognizing the faces of people of predominantly
white European descent, and not very good with people of other
backgrounds, in particular people with darker skin. The models frequently
reflect the blatant and subtle biases of the society in which they are
created, and of the creators who develop the models. This can have
disastrous consequences when these models are used without that
understanding... (or even if you do understand the general biases!)

This seems like a significant issue for user freedom; with source code,
you can at least in theory examine the biases of the software you are
using.


live well,
  vagrant




Re: Guix meetup at FOSSY?

2023-07-04 Thread Vagrant Cascadian
On 2023-07-04, Timothy Sample wrote:
> Vagrant Cascadian  writes:
>> On 2023-06-29, Timothy Sample wrote:
>>> The first FOSSY (Free and Open Source [Software] Yearly) conference
>>> is coming up in two weeks!  It’s being hosted in Portland, OR by the
>>> Software Freedom Conservancy.
>>>
>>> Why don’t we plan a little Guix meetup?
>>
>> Sounds great!
>
> Well, I was waiting to hear from more folks, but maybe not that many of
> us are going to FOSSY.  Either way, I still think we should plan
> something.
>
> What about having a Guix lunch on Friday?  I don’t really have strong
> feelings, but I thought I’d propose something concrete to get things
> going.  I’ve never been to Portland, so I don’t have thoughts about
> where to meet.  Do you know the city at all?

I have a couple decades of experience in Portland... :)

There are not a lot of things near the venue, but I will look for
options that are nearby and/or quick to get to by public transit ... and
ideally with outdoor seating or takeaway options.

live well,
  vagrant




Re: Guix meetup at FOSSY?

2023-07-04 Thread Timothy Sample
Vagrant Cascadian  writes:

> On 2023-06-29, Timothy Sample wrote:
>> The first FOSSY (Free and Open Source [Software] Yearly) conference
>> is coming up in two weeks!  It’s being hosted in Portland, OR by the
>> Software Freedom Conservancy.
>>
>> Why don’t we plan a little Guix meetup?
>
> Sounds great!

Well, I was waiting to hear from more folks, but maybe not that many of
us are going to FOSSY.  Either way, I still think we should plan
something.

What about having a Guix lunch on Friday?  I don’t really have strong
feelings, but I thought I’d propose something concrete to get things
going.  I’ve never been to Portland, so I don’t have thoughts about
where to meet.  Do you know the city at all?


-- Tim



Re: Guidelines for pre-trained ML model weight binaries (Was re: Where should we put machine learning model parameters?)

2023-07-04 Thread zamfofex
> On 07/03/2023 6:39 AM -03 Simon Tournier  wrote:
> 
> Well, I do not see any difference between pre-trained weights and icons
> or sounds or well-fitted parameters (e.g., the package
> python-scikit-learn has a lot ;-)).  As I said elsewhere, I do not see
> the difference between pre-trained neural network weights and genomic
> references (e.g., the package r-bsgenome-hsapiens-1000genomes-hs37d5).

I feel like, although this might (arguably) not be the case for leela-zero or 
Lc0 specifically, for certain machine learning projects, a pretrained network 
can affect the program’s behavior so deeply that it might be considered a 
program in itself! Such networks usually approximate an arbitrary function. The 
more complex the model is, the more complex the behavior of this function can 
be, and thus the closer it is to being an arbitrary program.

But this “program” has no source code; it is effectively created directly in a 
binary form that is difficult to analyse.
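
To make the point concrete, here is a minimal sketch in Python (my own
illustration, with made-up weights, not code from leela-zero, Lc0, or any
other project mentioned): the entire behaviour of the “program” lives in a
few arrays of numbers, and nothing about those arrays is readable the way
source code is. Swapping in a different set of trained weights yields a
completely different function, yet both versions are equally opaque.

    import numpy as np

    rng = np.random.default_rng(0)

    # In practice these arrays come from an expensive training run and are
    # shipped as an opaque binary file; here they are random placeholders.
    W1 = rng.standard_normal((2, 8))
    b1 = rng.standard_normal(8)
    W2 = rng.standard_normal((8, 1))
    b2 = rng.standard_normal(1)

    def network(x):
        """The 'program': all of its logic lives in W1, b1, W2, b2."""
        hidden = np.tanh(x @ W1 + b1)
        return hidden @ W2 + b2

    # The behaviour cannot be understood by reading the weights alone.
    print(network(np.array([0.5, -0.5])))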

In any case, I feel like the issue of “user autonomy” that Ludovic was talking 
about is fairly relevant (as I understand it). For icons, images, and other 
similar kinds of assets, it is easy enough for the user to replace them, or 
create their own if they want. But for pretrained networks, even if they are 
under a free license, the user might not be able to easily create their own 
network that suits their purposes.

For example, for image recognition software, the maintainers of the program 
might provide data able to recognise a specific set of objects in input images, 
but the user might want to use it to recognise a different kind of object. If 
it is too costly for the user to train a new network for their purposes (in 
terms of hardware and time required), the user is effectively entirely bound by 
the decisions of the maintainers of the software, and they can’t change it to 
suit their purposes.

In that sense, there *might* be room for the maintainers to intentionally and 
maliciously bind the user to the kinds of data they want to provide. Perhaps 
even more likely (and even more dangerous), when the data is opaque enough, 
there is room for the maintainers to bias the networks in obscure ways without 
telling the user. You can imagine this being used in the context of, say, text 
generation or translation, where the developers embed a certain opinion of 
theirs into the network in order to bias people towards it.

But even when not done maliciously, this can still be limiting to the user if 
they are unable to easily train their own networks as a replacement.



gnu: Add gmt.

2023-07-04 Thread Giovanni Biscuolo
Hello Ricardo,

checking commits done recently [1] I see you pushed ac86174e22, but I
cannot find the related patch sent to guix-patches: did I miss the
relevant messages?

Thanks! Gio'

[1] I use git log to get a quick overview of the latest changes and additions


P.S. Generic Mapping Tools is a great package, kudos!

-- 
Giovanni Biscuolo

Xelera IT Infrastructures

