On Wed, 5 Jul 2023 at 17:00, Christopher Barker <python...@gmail.com> wrote:
> I'm noting this, because I think it's part of the problem to be solved, but 
> maybe not the main one (to me anyway). I've been focused more on "these 
> packages are worthwhile" (by some definition of worthwhile), while I think 
> Chris A is more focused on "which of these seemingly similar packages should 
> I use?" -- not unrelated, but not the same question either.
>

Indeed, not the same question; but "some definition of worthwhile" is
the crucial point here. If there is one single curated package index
of "worthwhile" packages, who decides what's on it and what's not? If
not everyone can agree, will there have to be multiple such listings?

> Technically, conda is similar to pip -- it has a default "channel" (a channel 
> is an indexed repository of packages) it points to, and you can point it to a 
> different one, or any number of others, or install a single package from a 
> particular channel.
>
> Socially, it's pretty different
> - There is no channel like PyPI that anyone can put anything on willy-nilly.
> - The default channel is operated by Anaconda.com -- and no one else can put 
> anything on there. (They take suggestions, but it's a pretty big lift to get 
> them to add a package.)
> - The protocol for a channel is pretty simple -- all you really need is an 
> http server, but in practice, most folks host their channels on the 
> Anaconda.org server -- it's a free service that anyone can create a channel 
> on -- there are a LOT -- folks use them for their personal projects, etc.
>

So, high barrier to entry. Good to know. That's neither good nor bad
inherently, but it is a point of note.

> - Then there is conda-forge:
> It grew out of an effort to collaborate among a number of folks operating 
> channels focused on particular fields -- met/ocean science, astronomy, 
> computational biology, ... we all had different needs, but they overlapped -- 
> why not share resources? Thanks to the heroic efforts of a few folks, it grew 
> to what it is now: a GitHub- and CI-based conda package build system that 
> publishes a conda channel on anaconda.org with over 22,000 (wow! I think I'm 
> reading that right) packages.
>
> (https://anaconda.org/conda-forge/repo)
>
> They are curated -- anyone can propose a new package (via PR) -- but it only 
> gets added once it's been reviewed and approved by the core team. Curation 
> wasn't the goal, but it's necessary in order to have any hope that they will 
> all work together. The review process is really of the package, not the code 
> in the package (is it built correctly? is it compatible with the rest of 
> conda-forge? Does it include the license file? Is there a maintainer? ...) 
> But the end result is a fair bit of curation -- users can be assured that:
> 1 - The package works
> 2 - The package is useful enough that someone took the time to get it up 
> there.
> 3 - It's very unlikely to be malware (I don't think the conda-forge policy 
> really looks hard for that, but typosquatting and that sort of thing are 
> pretty much impossible).
>

Cool. The trouble is, point 1 is nearly impossible to assure except in
the very narrowest of definitions, and point 2's value correlates with
the height of the barrier to entry, so it's a fairly strict tradeoff.
And unless that barrier is extremely high, there will always be the
possibility that someone puts in the effort to get malware pushed,
although it does become vanishingly improbable.

>> What about OS package managers like the Debian repositories?
>
> I have no idea, other than that the majors, at least, put a LOT of work into 
> having a pretty comprehensive base repository of "vetted" packages

Right; hence the question of how a "vetted Python package collection"
would compare. I can type "sudo apt install python-" and add the name
of a package, and I get some assurance that:

1) The package works
2) The package is useful enough
3) It's not malware
4) The specific *version* of the package works along with the versions
of everything else.

This is a very strong set of criteria, much stronger than we'd be
looking for here, as they come with correspondingly higher barriers to
entry (getting a package update into the Debian repositories becomes
harder and harder as the release date approaches).

> conda-forge has about 22,121 -- that's enough to be very useful, but a lot of 
> use-cases are not well covered, and I know I still have to contribute one 
> once in a while.
>
> Looking now -- PyPI has 465,295 projects, more than 20 times as many -- I 
> wonder how many of those are "useful"?

Contrariwise, the Debian repository has under a thousand "python-*"
packages, but with a much stronger requirement that they be useful.

It's interesting that there are only twenty on PyPI for every one on
conda-forge. I would have expected a greater ratio. It seems that
conda-forge is able to be incomplete AND dauntingly large; how
successful would you be at guessing a package name based on a desired
goal?
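
As a quick sanity check on that ratio, here is a minimal sketch using only
the two counts quoted earlier in this thread. They are snapshots from the
time of writing, not live figures, and the variable names are just
illustrative:

```python
# Counts as quoted in this thread (snapshots, not live data).
pypi_projects = 465_295
conda_forge_packages = 22_121

# How many PyPI projects exist for every conda-forge package?
ratio = pypi_projects / conda_forge_packages
print(f"PyPI projects per conda-forge package: {ratio:.1f}")
```

That comes out at roughly 21 to 1, consistent with "more than 20 times as
many".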

ChrisA
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CVPM7MPQGXWDX5B4Z64L25ZDHQMC4LYJ/