Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-15 Thread Paul Moore
On 14 May 2015 at 19:01, Chris Barker chris.bar...@noaa.gov wrote:
 Ah -- here is the issue -- but I think we HAVE pretty much got what we need
 here -- at least for Windows and OS-X. It depends on what you mean by
 curated, but it seems we have a (de facto?) policy for PyPI: binary wheels
 should be compatible with the python.org builds. So while each package wheel
 is supplied by the package maintainer one way or another, rather than by a
 central entity, it is more or less curated -- or at least standardized. And
 if you are going to put a binary wheel up, you need to make sure it matches
 -- and that is less than trivial for packages that require a third-party
 dependency -- but building the lib statically and then linking it in is not
 inherently easier than doing a dynamic link.

I think the issue is that, if we have 5 different packages that depend
on (say) libpng, and we're using dynamic builds, then how do those
packages declare that they need access to libpng.dll? And on Windows,
where does the user put libpng.dll so that it gets picked up? And how
does a non-expert user do this ("put it in $DIRECTORY, update your
PATH, blah blah blah" doesn't work for the average user)?

In particular, on Windows, note that the shared DLL must either be in
the directory where the executable is located (which is fun when you
have virtualenvs, embedded interpreters, etc), or on PATH (which has
other implications - suppose I have an incompatible version of
libpng.dll, from mingw, say, somewhere earlier on PATH).

The problem isn't so much defining a standard ABI that shared DLLs
need - as you say, that's a more or less solved problem on Windows -
it's managing how those shared DLLs are made available to Python
extensions. And *that* is what Unix package managers do for you, and
Windows doesn't have a good solution for (other than bundle all the
dependent DLLs with the app, or suffer DLL hell).
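
To see why that bites, here is a minimal sketch (Python with only ctypes;
the DLL name and directory are illustrative) of the difference between
loading by bare name - which walks the search order above - and pinning an
explicit path before anything else loads the library:

    import ctypes
    import os

    # Loading by bare name defers to the Windows search order: the
    # executable's directory, system directories, then each PATH entry
    # in turn - whichever libpng turns up first wins.
    # png = ctypes.CDLL("libpng16.dll")

    # Loading by full path pins the exact file; once that module is
    # mapped into the process, later loads by the same base name reuse
    # it rather than searching again.
    dll_path = os.path.join(r"C:\KnownGood", "libpng16.dll")  # illustrative
    png = ctypes.CDLL(dll_path)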

Paul

PS For a fun exercise, it might be interesting to try breaking conda -
find a Python extension which uses a shared DLL, and check that it
works. Then grab an incompatible copy of that DLL (say a 32-bit
version on a 64-bit system) and try hacking around with PATH: put
the incompatible DLL in a directory earlier on PATH than the correct
one, or in the Windows directory, or use an embedded interpreter like
mod_wsgi - tricks like that. If conda survives that, then the solution
that they use might be something worth documenting, and might offer an
approach to solving the issue I described above. If it *doesn't*
survive, then that probably implies that the general environment pip
has to work in is less forgiving than the curated environment conda
manages (which is, of course, the whole point of using conda - to get
that curated environment :-))
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


[Distutils] PyPI and Uploading Documentation

2015-05-15 Thread Donald Stufft
Hey!

First, for anyone who isn't aware, we recently migrated PyPI and TestPyPI so
that instead of storing files and documentation locally (really in a glusterfs
cluster) they are stored in S3. This reduces the maintenance overhead
of running PyPI by two servers, since we'll no longer need to run our own
glusterfs cluster, and improves the reliability and scalability of the
PyPI service as a whole, since we've had nothing but problems from glusterfs
in this regard.

One of the things that this brought to light was that the documentation
upload ability in PyPI is something that is not often used*; however, it
represents one of our slowest routes. It's not a well
supported feature, and I feel that it's outside of the core competency of
PyPI itself; PyPI should instead be focused on the files themselves. In
addition, since the time this was added to PyPI, a number of free or
cheap services have come about that allow people to sanely upload raw
documents without a reliance on any particular documentation system, and
we've also had the rise of ReadTheDocs for when someone is using Sphinx as
their documentation system.

I think that it's time to retire this aspect of PyPI, which has never been
well supported, and instead focus on just the things that are core to PyPI.
I don't have a fully concrete proposal for doing this, but I wanted to
reach out here and figure out if anyone had any ideas. The rough idea I
have currently is to simply disable new documentation uploads and add two
new small features. One would allow users to delete their existing
documentation from PyPI, and the other would allow them to register a
redirect which would take visitors from the current location to wherever
they move their documentation to. In order to prevent breaking
documentation for projects which are defunct or not actively maintained,
we would keep the archived documentation (sans what anyone has deleted)
indefinitely.

Ideally I hope people start to use ReadTheDocs instead of PyPI itself. I
think that ReadTheDocs is a great service with heavy ties to the Python
community. They will do a better job at hosting documentation than PyPI
ever could, since that is their core goal. In addition, there is a dialog
between ReadTheDocs and PyPI where there is an opportunity to add
integration between the two sites, as well as features to ReadTheDocs that
it currently lacks which people feel are a requirement before we move
PyPI's documentation to read-only.
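
To be concrete about how small that redirect feature could be, here's a
rough sketch (a toy WSGI app; the project name, target, and URL layout are
made up for illustration and are not a design):

    # Toy sketch of the proposed docs redirect: a lookup table consulted
    # before falling back to the archived documentation.
    REDIRECTS = {
        "someproject": "https://someproject.readthedocs.org/",  # hypothetical
    }

    def docs_app(environ, start_response):
        # Assume URLs of the form /<project>/...
        project = environ.get("PATH_INFO", "/").lstrip("/").split("/", 1)[0]
        target = REDIRECTS.get(project)
        if target is not None:
            start_response("301 Moved Permanently", [("Location", target)])
            return [b""]
        # A real implementation would serve the archived docs here.
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"No documentation hosted for this project."]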

Thoughts?

* Out of ~60k projects, only ~2.8k have ever uploaded documentation. It's
  not easy to tell whether all of them are still using it as their primary
  source of documentation, though, or if it's old documentation that they
  just can't delete.


---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI and Uploading Documentation

2015-05-15 Thread Jim Fulton
On Fri, May 15, 2015 at 9:48 AM, Donald Stufft don...@stufft.io wrote:
 [... full quote of Donald's proposal snipped ...]

 Thoughts?

+1

 * Out of ~60k projects only ~2.8k have ever uploaded documentation. It's not
   easy to tell if all of them are still using it as their primary source of
   documentation though or if it's old documentation that they just can't
   delete.

I know I have documentation for at least one project hosted this way.
I don't remember how I set that up. :) I assume there will be some way
to notify owners of affected documentation.

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI and Uploading Documentation

2015-05-15 Thread Ian Cordasco
On Fri, May 15, 2015 at 8:48 AM, Donald Stufft don...@stufft.io wrote:
 [... full quote of Donald's proposal snipped ...]


I'm +1 on reducing the responsibilities of PyPI so it can act as an
index/repository in a much more efficient manner. I'm also +1 on
recommending people use ReadTheDocs. It supports more than just Sphinx
so it's a rather flexible option. It's also open source, which means
that anyone can contribute to it.

I'm curious to hear more about integrations between PyPI and
ReadTheDocs but I fully understand if they're not concrete enough to
be worthy of discussion.

--
Ian
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI and Uploading Documentation

2015-05-15 Thread Barry Warsaw
On May 15, 2015, at 09:48 AM, Donald Stufft wrote:

One of the things that this brought to light was that the documentation
upload ability in PyPI is something that is not often used* however it
represents something which is one of our slowest routes.

I use it for all my packages, mostly because it's easy for my upload
workflow: `python setup.py upload_docs`.

That said, with the rise of RTD, I have wondered about the usefulness of
pythonhosted documentation. And because twine supports secure uploads of
code, but not documentation, that unease has grown.

So even while I use it, I agree it's time to consider retiring the service.

One thing I definitely want to retain though is the link to Package
Documentation from the project's PyPI page.  Please do give us a way to
specify that link.

The PSF is a supporter of RTD, but let's all make sure they stay in business!
https://readthedocs.org/sustainability/#about

Cheers,
-Barry


___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Marcus Smith
Why not start with pip at least being a simple fail-on-conflict resolver
(vs the 1st-found-wins resolver it is now)...

You'd backtrack for the sake of re-walking when new constraints are
found, but not for the purpose of solving conflicts.

I know you're motivated to solve OpenStack build issues, but many of the
issues I've seen in the pip tracker would, I think, be solved without the
backtracking resolver you're trying to build.
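
For illustration, a fail-on-conflict check can be as small as remembering
what has already been pinned and erroring on disagreement - a rough sketch
with toy data structures (nothing like pip's internals):

    def resolve(requirements, candidates):
        # requirements: (name, predicate) pairs in the order walked;
        # candidates: dict of name -> versions, most preferred first.
        pinned = {}
        for name, predicate in requirements:
            if name in pinned:
                # Today pip silently keeps its first pick; here we fail.
                if not predicate(pinned[name]):
                    raise RuntimeError(
                        "conflict: %s already pinned to %s" % (name, pinned[name]))
                continue
            for version in candidates[name]:
                if predicate(version):
                    pinned[name] = version  # first found still wins
                    break
            else:
                raise RuntimeError("nothing satisfies %s" % name)
        return pinned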

On Fri, May 15, 2015 at 11:57 AM, Robert Collins robe...@robertcollins.net
wrote:

 So, I am working on pip issue 988: pip doesn't resolve packages at all.

 This is O(packages^alternatives_per_package): if you are resolving 10
 packages with 10 versions each, there are approximately 10^10 or 10G
 combinations. 10 packages with 100 versions each - 10^100.

 So - its going to depend pretty heavily on some good heuristics in
 whatever final algorithm makes its way in, but the problem is
 exacerbated by PyPI's nature.

 Most Linux (all that i'm aware of) distributions have at most 5
 versions of a package to consider at any time - installed(might be
 None), current release, current release security updates, new release
 being upgraded to, new release being upgraded to's security updates.
 And their common worst case is actually 2 versions: installed==current
 release and one new release present. They map alternatives out into
 separate packages (e.g. when an older soname is deliberately kept
 across an ABI incompatibility, you end up with 2 packages, not 2
 versions of one package). So when comparing pip's challenge to apt's:
 apt has ~20-30K packages, with alternatives ~= 2, or
 pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)

 Scaling the number of packages is relatively easy; scaling the number
 of alternatives is harder. Even 300 packages (the dependency tree for
 openstack) is ~2.4T combinations to probe.

 I wonder if it makes sense to give some back-pressure to people, or at
 the very least encourage them to remove distributions that:
  - they don't support anymore
  - have security holes

 If folk consider PyPI a sort of historical archive then perhaps we
 could have a feature to select 'supported' versions by the author, and
 allow a query parameter to ask for all the versions.

 -Rob

 --
 Robert Collins rbtcoll...@hp.com
 Distinguished Technologist
 HP Converged Cloud
 ___
 Distutils-SIG maillist  -  Distutils-SIG@python.org
 https://mail.python.org/mailman/listinfo/distutils-sig

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI and Uploading Documentation

2015-05-15 Thread Daniel Holth
I'm using PyPI's documentation hosting for pysdl2-cffi because I
thought it would be too difficult to run the documentation generator
(which parses documentation comments out of the wrapped C code) on the
ReadTheDocs server. Perhaps there is a different way to do it that I'm
not familiar with.

On Fri, May 15, 2015 at 9:55 AM, Ian Cordasco
graffatcolmin...@gmail.com wrote:
 [... full quote of Donald's proposal and Ian's reply snipped ...]
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 11:08, Marcus Smith qwc...@gmail.com wrote:
 Why not start with pip at least being a simple fail-on-conflict resolver
 (vs the 1st found wins resolver it is now)...

 You'd backtrack for the sake of re-walking when new constraints are found,
 but not for the purpose of solving conflicts.

 I know you're motivated to solve Openstack build issues, but many of the
 issues I've seen in the pip tracker, I think would be solved without the
 backtracking resolver you're trying to build.

Well, I'm scratching the itch I have. If it's too hard to get something
decent, sure, I might back off on my goals, but I see no point aiming
for something less than what all the other language-specific packaging
systems out there have.

-Rob


-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Donald Stufft

 On May 15, 2015, at 9:22 PM, Robert Collins robe...@robertcollins.net wrote:
 
 On 16 May 2015 at 11:08, Marcus Smith qwc...@gmail.com wrote:
 Why not start with pip at least being a simple fail-on-conflict resolver
 (vs the 1st found wins resolver it is now)...
 
 You'd backtrack for the sake of re-walking when new constraints are found,
 but not for the purpose of solving conflicts.
 
 I know you're motivated to solve Openstack build issues, but many of the
 issues I've seen in the pip tracker, I think would be solved without the
 backtracking resolver you're trying to build.
 
 Well, I'm scratching the itch I have. If its too hard to get something
 decent, sure I might back off in my goals, but I see no point aiming
 for something less than all the other language specific packaging
 systems out there have.


So what makes the other language-specific packaging systems different? As far
as I know all of them have complete archives (e.g. they are like PyPI in that
they have a lot of versions, unlike Linux distros). What can we learn from how
they solved this?

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 13:45, Donald Stufft don...@stufft.io wrote:

 On May 15, 2015, at 9:22 PM, Robert Collins robe...@robertcollins.net wrote:

 On 16 May 2015 at 11:08, Marcus Smith qwc...@gmail.com wrote:
 Why not start with pip at least being a simple fail-on-conflict resolver
 (vs the 1st found wins resolver it is now)...

 You'd backtrack for the sake of re-walking when new constraints are found,
 but not for the purpose of solving conflicts.

 I know you're motivated to solve Openstack build issues, but many of the
 issues I've seen in the pip tracker, I think would be solved without the
 backtracking resolver you're trying to build.

 Well, I'm scratching the itch I have. If its too hard to get something
 decent, sure I might back off in my goals, but I see no point aiming
 for something less than all the other language specific packaging
 systems out there have.


 So what makes the other language specific packaging systems different? As far
 as I know all of them have complete archives (e.g. they are like PyPI where they
 have a lot of versions, not like Linux Distros). What can we learn from how they
 solved this?

NB: I have by no means finished the low-hanging heuristics and
space-trimming stuff :). I have some simple things in mind and am sure I'll
end up with something 'good enough' for day-to-day use. The thing I'm
worried about is the long-term health of the approach.

Good questions. Some of it is structural, I suspect. A quick rundown:
- cabal (Haskell) has a backtracking solver that accepts various
parameters to tell it to try harder.
- JavaScript effectively vendors every dep ever, so you end up with many
copies of the same library at different versions in the same process.
- Rust's cargo system currently solves everything in a single project
only - it has no binary packaging, only vendor-into-a-binary-build
packaging.
- The gem behaviour I'm not yet familiar with.
- Perl I used to know, but time has eroded it :/.

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread David Cournapeau
On Sat, May 16, 2015 at 10:52 AM, Robert Collins robe...@robertcollins.net
wrote:

 On 16 May 2015 at 13:45, Donald Stufft don...@stufft.io wrote:
 
  On May 15, 2015, at 9:22 PM, Robert Collins robe...@robertcollins.net wrote:
 
  On 16 May 2015 at 11:08, Marcus Smith qwc...@gmail.com wrote:
  Why not start with pip at least being a simple fail-on-conflict resolver
  (vs the 1st found wins resolver it is now)...
 
  You'd backtrack for the sake of re-walking when new constraints are found,
  but not for the purpose of solving conflicts.
 
  I know you're motivated to solve Openstack build issues, but many of the
  issues I've seen in the pip tracker, I think would be solved without the
  backtracking resolver you're trying to build.
 
  Well, I'm scratching the itch I have. If its too hard to get something
  decent, sure I might back off in my goals, but I see no point aiming
  for something less than all the other language specific packaging
  systems out there have.
 
 
  So what makes the other language specific packaging systems different? As far
  as I know all of them have complete archives (e.g. they are like PyPI where they
  have a lot of versions, not like Linux Distros). What can we learn from how they
  solved this?

 NB; I have by no means finished low hanging heuristics and space
 trimming stuff :). I have some simple things in mind and am sure I'll
 end up with something 'good enough' for day to day use. The thing I'm
 worried about is the long term health of the approach.

 Good questions. Some of it is structural I suspect. A quick rundown.
 cabal (haskell) has a backtracking solver that accepts various
 parameters to tell it to try harder.
 javascript effectively vendors every dep ever, so you end up with many
 copies of the same library at different versions in the same process.
 rust's cargo system currently solves everything in a single project
 only - it has no binary packaging, only vendor-into-a-binary-build
 packaging.
 The gem behaviour I'm not yet familiar with.
 perl I used to know but time has eroded it :/.


FWIW, PHP uses a SAT-based solver in Composer, which started as a port of
libsolv (the SAT solver used by openSUSE and soon Fedora).

I am no expert, but I don't understand why backtracking algorithms would
be faster than SAT, since they both potentially need to walk over the full
set of possible solutions. It is hard to reason about the cost because the
worst case in theory grows exponentially in both cases.

With a SAT-based algorithm for dependency resolution, it is relatively
simple to apply heuristics which massively prune the search space. For
example, when considering package A with say 10 potential versions A_1,
etc., in theory you need to generate the pairwise rules:

# "-" means do not install, "+" means install
-A_1 | -A_2
-A_1 | -A_3
...

and those constitute most of the rules in common cases. But it is possible
to tweak the SAT implementation to replace those rules with a single
"AtMost one of" rule per *package*, which means the number of rules does
not grow much with the number of versions.
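
To make the size difference concrete, here is a tiny sketch (plain Python,
clauses as tuples of signed integers - a toy representation, not any
particular solver's API) of the pairwise encoding above; n versions of one
package cost n*(n-1)/2 binary clauses, where a native AtMost-one
constraint costs a single rule:

    from itertools import combinations

    def at_most_one_pairwise(version_vars):
        # One clause (-x | -y) per pair: no two versions of the
        # same package may be installed together.
        return [(-x, -y) for x, y in combinations(version_vars, 2)]

    # Ten versions of package A, encoded as SAT variables 1..10.
    clauses = at_most_one_pairwise(range(1, 11))
    print(len(clauses))  # 45 binary clauses for one package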

The real difficulty of SAT-based solvers is the optimization part: many
actually valid solutions are not acceptable, and that's where the
heuristics get more complicated.

David
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-15 Thread Chris Barker
On Fri, May 15, 2015 at 1:44 PM, Paul Moore p.f.mo...@gmail.com wrote:

  Is there any point? or is the current approach of statically linking all
  third party libs the way to go?

 If someone can make it work, that would be good. But (a) nobody is
 actually offering to develop and maintain such a solution,


well, it's on my list -- but it has been for a while, so I'm trying to
gauge whether it's worth putting at the top of my things-to-do-for-Python
list. It's not at the top now ;-)


 and (b)
 it's not particularly clear how *much* of a benefit there would be
 (space savings aren't that important, ease of upgrade is fine as long
 as everything can be upgraded at once, etc...)


hmm -- that may be a trick, though not an uncommon one in python package
dependencies -- it may be hard to have more than one version of a given lib
installed

 If so, then is there any chance of getting folks to conform to this standard
 for PyPI-hosted binary packages anyway? i.e. the curation problem.

 If it exists, and if there's a benefit, people will use it.


OK -- that's encouraging...


  Personally, I'm on the fence here -- I really want newbies to be able to
  simply pip install as many packages as possible and get a good result
 when
  they do it.

 Static linking gives that on Windows FWIW. (And maybe also on OSX?)
 This is a key point, though - the goal shouldn't be use dynamic
 linking but rather make the user experience as easy as possible. It
 may even be that the best approach (dynamic or static) differs
 depending on platform.


true -- though we also have another problem -- the static linking solution
is actually a big pain for package maintainers -- building and linking the
dependencies the right way is a pain -- and now everyone that uses a given
lib has to figure out how to do it. Giving folks a dynamic lib they can use
would make it easier for them to build their packages -- a nice benefit
there.

Though it's a lot harder to provide a build environment than just the lib
to link to... I'm going to have to think more about that...


  On the other hand, I've found that conda better supports this right now,
 so
  it's easier for me to simply use that for my tools.

 And that's an entirely reasonable position. The only problem (if
 indeed it is a problem) is that by having two different solutions
 (pip/wheel and conda) splits the developer resource, which means that
 neither approach moves forward as fast as a combined approach does.


That's not the only problem -- the current split between the (more than
one) scientific python distributions, and the community of folks using
python.org and PyPI, creates a bit of a mess for newbies.

I'm reviving this conversation because I just spent a class lecture in a
python class on numpy/scipy -- these students have been using a python
install for months, using virtualenv, pip installing whatever they need,
etc.

And now, to use another lib, they have to go through machinations, maybe
even installing an entire additional python. This is not good. And I've
had to help more than one student untangle a mess of Apple Python,
python.org Python, Homebrew, and/or Anaconda -- for someone that doesn't
really get python packaging, never mind PATHs, and .bashrc vs
.bash_profile, etc., it's an unholy mess.

There should be one-- and preferably only one --obvious way to do it. --
HA!


But that's OK if the two solutions are addressing different needs


The needs aren't really that different, however. Oh well.

Anyway, it seems like if I can find some time to prototype what I have in
mind, there may be some room to make it official if it works out. If anyone
else wants to help -- let me know!

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Noah Kantrowitz

On May 15, 2015, at 9:19 PM, Donald Stufft don...@stufft.io wrote:

 
  On May 15, 2015, at 2:57 PM, Robert Collins robe...@robertcollins.net wrote:
  [... Robert's message snipped ...]
 
 There have been a handful of projects which would only keep the latest N
 versions uploaded to PyPI. I know this primarily because it has caused
  people a decent amount of pain over time. It's common for people's deployments
  to use a requirements.txt file like ``foo==1.0`` and to just continue
  to pull from PyPI. Deleting the old files breaks anyone doing that, so it would
 require either having people bundle their deps in their repositories or
 some way to get at those old versions. Personally I think that we shouldn’t
 go deleting the old versions or encouraging people to do that.

+1 for this. While I appreciate why Linux distros purge old versions, it is
absolutely hellish for reproducibility. If you are looking for prior art,
check out the Molinillo project (https://github.com/CocoaPods/Molinillo)
used by Bundler and CocoaPods. It is not as complex as the Solve gem used
in Chef, but it offers a good balance between performance in satisfying
constraints and false negatives on solution failures.

--Noah



___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI and Uploading Documentation

2015-05-15 Thread Erik Bray
On Fri, May 15, 2015 at 9:48 AM, Donald Stufft don...@stufft.io wrote:
 [... full quote of Donald's proposal snipped ...]

 Thoughts?

 * Out of ~60k projects only ~2.8k have ever uploaded documentation. It's not
   easy to tell if all of them are still using it as their primary source of
   documentation though or if it's old documentation that they just can't
   delete.

+1 for all the stated reasons.

I have a few docs hosted on pythonhosted.org, but it's become a
nuisance to maintain since it does not support multiple doc versions
like ReadTheDocs, so now I've wound up with documentation for the same
projects on both sites.  The nuisance comes not so much in the process
(like Barry wrote, I've enjoyed the simplicity of `setup.py
upload_docs`), but because more often than not I've had to redirect
users to the ReadTheDocs docs to make sure they're using the correct
version of the docs.  So I wish I were not locked into updating the
pythonhosted.org docs and would be happy to retire them altogether
(much as I appreciated the service).

One question is how this would be handled at the tooling end.
setup.py upload_docs would have to be retired somehow.  Though it
might also be nice if some simple tools were added to make it just as
easy to add docs to ReadTheDocs.  I know something like upload_docs
doesn't really make sense, since RTD handles the checkout and build of
the docs.  But there's still a manual step of enabling new versions of
the docs that it would be nice to make as effortless as `setup.py
upload_docs`. I guess that's off-topic for the PyPI end of things
though.

Erik
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 07:18, Justin Cappos jcap...@nyu.edu wrote:
 One thing to consider is that if conflicts do not exist (or are very rare),
 the number of possible combinations is a moot point.  A greedy algorithm for
 installation (which just chooses the most favored package to resolve each
 dependency) will run in linear time with the number of packages it would
 install, if no conflicts exist.

 So, what you are saying about state exploration may be true for a resolver
 that uses something like a SAT solver, but doesn't apply to backtracking
 dependency resolution (unless a huge number of conflicts occur) or simple
 dependency resolution (at all).  SAT solvers do have heuristics to avoid
 this blow up, except in pathological cases.  However, simple / backtracking
 dependency resolution systems have the further advantage of not needing to
 request unneeded metadata in the first place...

Your intuition here is misleading, sorry :(.

You're right that 'hey, if everything fits, it's linear' -- but we have
this bug open precisely because conflicts aren't rare: people add a new
example of it every week or so (as someone who didn't realise that pip
doesn't resolve finds out, adds it to pip's issue tracker, only to have it
made into a dupe).

Backtracking recursive resolvers have exactly the same big-O as SAT solvers.

Example: say I have an ecosystem of 10 packages, A-J. And they do a
release every 6 months that is guaranteed to work together, but every
time some issue occurs which ends up clamping the group together - e.g.
an external release breaks API and so A1's deps are disjoint with A2's,
and then the same between A2 and A3. Even though A1's API is
compatible with B2's: it's not internal bad code, it's just taking *one*
external dep breaking its API.

After 2 releases you have 10^2 combinations, but only 4 are valid at
all. That's 4%. 8 releases gets you 10^8, 8 valid combinations, or
0.000008%.

Now there are two things to examine here: how likely is this to happen
to PyPI users, and can a backtracker (which, btw, is what my code is)
handle this better than a SAT solver?

In terms of likelihood - OpenStack hits this every release. It's not
that our libraries are incompatible with each other, it's that given
250 packages (the 200 in the error I quoted just shows that the resolver
hadn't obtained version data for everything), *something* breaks API
compat in each 6-month release cycle, and so you end up with the whole
set effectively locking together. In fact, it has happened so
consistently to OpenStack that we now release our libraries with
closed specifiers: >=min_version, <next_version.

Secondly, backtrackers. Assume nothing is installed, and you want the
latest release. Then sure,

    for a_version in A:
        for b_version in B:
            for c_version in C:
                ...

etc. will hit the most recent release first time, and you're golden.

Assume you have the prior release installed, you ran pip without -U, but
you asked for something that pulls in the latest release of one lib (which
then nabs everything), e.g. pip install "J>3.0" (and we have releases
1,2,3,4 of A through J).

Now, for a_version in A: is going to have the installed version of A
in its first step. B likewise. So we'll end up with a trace like:
A==3
B==3
C==3
D==3
E==3
F==3
G==3
H==3
I==3
J==3 error, backtrack (user specifier)
J==4 error, backtracks once the external deps are considered and the
conflict is found
J==2 error, backtrack (user specifier)
J==1 error, backtrack (user specifier)
I==4
J==3 error, backtrack (user specifier)
J==4 error, backtracks somewhere in the external deps
J==2, error, backtrack (user specifier)

and so on, until we finally tick over to A==4.

More generally, any already installed version (without -U) can cause a
backtracking resolver to try *all* possible other combinations before
realising that that installed version is the problem and bumping it. A
heuristic to look for those and bump them first then hits molasses as
soon as one of the installed versions needs to be kept as-is.

Anyhow, my goal here is to start the conversation; pip will need some
knobs because, no matter how good the heuristics, users will need escape
hatches. (One of which is to fully specify their needs.)
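
For anyone following along, this is roughly the shape of the backtracker
under discussion (toy data structures, not pip's code) - note how putting
the installed version first in each candidate list reproduces the
pathological trace above:

    def backtrack(names, candidates, ok, pinned=None):
        # names: packages still to pin; candidates: name -> versions in
        # preference order (installed first mimics running without -U);
        # ok: predicate saying whether a partial assignment is consistent.
        pinned = pinned or {}
        if not names:
            return pinned
        name, rest = names[0], names[1:]
        for version in candidates[name]:
            trial = dict(pinned, **{name: version})
            if ok(trial):
                solution = backtrack(rest, candidates, ok, trial)
                if solution is not None:
                    return solution
        return None  # branch exhausted; caller tries its next version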

-Rob


-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-15 Thread Chris Barker
On Fri, May 15, 2015 at 1:49 AM, Paul Moore p.f.mo...@gmail.com wrote:

 On 14 May 2015 at 19:01, Chris Barker chris.bar...@noaa.gov wrote:
  Ah -- here is the issue -- but I think we HAVE pretty much got what we need
  here -- at least for Windows and OS-X. It depends on what you mean by
  curated, but it seems we have a (de facto?) policy for PyPI: binary wheels
  should be compatible with the python.org builds. So while each package wheel
  is supplied by the package maintainer one way or another, rather than by a
  central entity, it is more or less curated -- or at least standardized. And
  if you are going to put a binary wheel up, you need to make sure it matches
  -- and that is less than trivial for packages that require a third-party
  dependency -- but building the lib statically and then linking it in is not
  inherently easier than doing a dynamic link.

 I think the issue is that, if we have 5 different packages that depend
 on (say) libpng, and we're using dynamic builds, then how do those
 packages declare that they need access to libpng.dll?


this is the missing link -- it is a binary build dependency, not a package
dependency -- so not so much that matplotlib-1.4.3 depends on libpng.x.y,
but that:


matplotlib-1.4.3-cp27-none-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl

depends on:

libpng-x.y

(all those binary parts will come from the platform)

That's what's missing now.
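
Purely to illustrate the kind of declaration that's missing - there is no
such field in the wheel spec today, this is hypothetical - it would be
something like:

    Name: matplotlib
    Version: 1.4.3
    # hypothetical, not a real metadata field:
    Binary-Requires-External: libpng (>=1.6, <1.7)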

And on Windows,
 where does the user put libpng.dll so that it gets picked up?


Well, here is the rub -- Windows dll hell really is hell -- but I think
that if the lib goes into the python dll search path (sorry, not on a
Windows box where I can really check this out right now), it can work -- I
have an in-house product that has multiple python modules sharing a
single dll somehow



 And how
 does a non-expert user do this (put it in $DIRECTORY, update your
 PATH, blah blah blah doesn't work for the average user)?


That's why we may need to update the tooling to handle this -- I'm not
totally sure if the current wheel format can support this on Windows --
though it can on OS-X.

In particular, on Windows, note that the shared DLL must either be in
 the directory where the executable is located (which is fun when you
 have virtualenvs, embedded interpreters, etc), or on PATH (which has
 other implications - suppose I have an incompatible version of
 libpng.dll, from mingw, say, somewhere earlier on PATH).


that would be dll hell, yes.


 The problem isn't so much defining a standard ABI that shared DLLs
 need - as you say, that's a more or less solved problem on Windows -
 it's managing how those shared DLLs are made available to Python
 extensions. And *that* is what Unix package managers do for you, and
 Windows doesn't have a good solution for (other than bundle all the
 dependent DLLs with the app, or suffer DLL hell).


exactly -- but if we consider the python install to be the app, rather
than an individual python bundle, then we _may_ be OK.

PS For a fun exercise, it might be interesting to try breaking conda -


Windows really is simply broken [1] in this regard -- so I'm quite sure you
could break conda -- but it does seem to do a pretty good job of not being
broken easily by common uses -- I can't say I know enough about Windows dll
finding or conda to know how...

Oh, and conda is actually broken in this regard on OS-X at this point -- if
you compile your own extension in an anaconda environment, it will find a
shared lib at compile time that it won't find at run time -- the conda
install process fixes these, but that's a pain when under development --
i.e. you don't want to have to actually install the package with conda to
run a test each time you re-build the dll.. (or even change a bit of python
code...)

But in short -- I'm pretty sure there is a way, on all systems, to have a
standard way to build extension modules, combined with a standard way to
install shared libs, so that a lib can be shared among multiple packages.
So the question remains:

Is there any point? or is the current approach of statically linking all
third party libs the way to go?

If so, then is there any chance of getting folks to conform to this
standard for PyPi hosted binary packages anyway? i.e. the curation problem.

Personally, I'm on the fence here -- I really want newbies to be able to
simply pip install as many packages as possible and get a good result
when they do it.

On the other hand, I've found that conda better supports this right now, so
it's easier for me to simply use that for my tools.


-Chris


[1] My take on dll hell:

a) it's inherently difficult -- which is why Linux provides a system
package manager.

b) however, Windows really does make it MORE difficult than it has to be:
  i) it looks first next to the executable
  ii) it also looks on the PATH (rather than a separate DLL_PATH)
  Combine these two, and you have some folks dropping dlls next 

Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Jim Fulton
On Fri, May 15, 2015 at 2:57 PM, Robert Collins
robe...@robertcollins.net wrote:
 So, I am working on pip issue 988: pip doesn't resolve packages at all.

 This is O(packages^alternatives_per_package): if you are resolving 10
 packages with 10 versions each, there are approximately 10^10 or 10G
 combinations. 10 packages with 100 versions each - 10^100.

 So - its going to depend pretty heavily on some good heuristics in
 whatever final algorithm makes its way in, but the problem is
 exacerbated by PyPI's nature.

 Most Linux (all that i'm aware of) distributions have at most 5
 versions of a package to consider at any time - installed(might be
 None), current release, current release security updates, new release
 being upgraded to, new release being upgraded to's security updates.
 And their common worst case is actually 2 versions: installed==current
 release and one new release present. They map alternatives out into
 separate packages (e.g. when an older soname is deliberately kept
 across an ABI incompatibility, you end up with 2 packages, not 2
 versions of one package). So when comparing pip's challenge to apt's:
 apt has ~20-30K packages, with alternatives ~= 2, or
 pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)

 Scaling the number of packages is relatively easy; scaling the number
 of alternatives is harder. Even 300 packages (the dependency tree for
 openstack) is ~2.4T combinations to probe.

 I wonder if it makes sense to give some back-pressure to people, or at
 the very least encourage them to remove distributions that:
  - they don't support anymore
  - have security holes

 If folk consider PyPI a sort of historical archive then perhaps we
 could have a feature to select 'supported' versions by the author, and
 allow a query parameter to ask for all the versions.

You could simply limit the number of versions from PyPI
you consider.
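
A sketch of that heuristic (assuming the `packaging` library for version
ordering):

    from packaging.version import parse

    def newest(versions, n=5):
        # Only consider the n most recent releases of a project.
        return sorted(versions, key=parse, reverse=True)[:n]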

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 08:27, Jim Fulton j...@zope.com wrote:

 If folk consider PyPI a sort of historical archive then perhaps we
 could have a feature to select 'supported' versions by the author, and
 allow a query parameter to ask for all the versions.

 You could simply limit the number of versions from PyPI
 you consider.

Yes - it would be nice IMO to give package authors some influence over
that though.

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-15 Thread Paul Moore
On 15 May 2015 at 20:56, Chris Barker chris.bar...@noaa.gov wrote:
 But in short -- I'm pretty sure there is a way, on all systems, to have a
 standard way to build extension modules, combined with a standard way to
 install shared libs, so that a lib can be shared among multiple packages. So
 the question remains:

 Is there any point? or is the current approach of statically linking all
 third party libs the way to go?

If someone can make it work, that would be good. But (a) nobody is
actually offering to develop and maintain such a solution, and (b)
it's not particularly clear how *much* of a benefit there would be
(space savings aren't that important, ease of upgrade is fine as long
as everything can be upgraded at once, etc...)

 If so, then is there any chance of getting folks to conform to this standard
 for PyPi hosted binary packages anyway? i.e. the curation problem.

If it exists, and if there's a benefit, people will use it.

 Personally, I'm on the fence here -- I really want newbies to be able to
 simply pip install as many packages as possible and get a good result when
 they do it.

Static linking gives that on Windows FWIW. (And maybe also on OSX?)
This is a key point, though - the goal shouldn't be "use dynamic
linking" but rather "make the user experience as easy as possible". It
may even be that the best approach (dynamic or static) differs
depending on platform.

 On the other hand, I've found that conda better supports this right now, so
 it's easier for me to simply use that for my tools.

And that's an entirely reasonable position. The only problem (if
indeed it is a problem) is that having two different solutions
(pip/wheel and conda) splits the developer resource, which means that
neither approach moves forward as fast as a combined approach would.
But that's OK if the two solutions are addressing different needs
(which seems to be the case for the moment).

Paul
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Justin Cappos

 Example: say I have an ecosystem of 10 packages. A-J. And they do a
 release every 6 months that is guaranteed to work together, but every
 time some issue occurs which ends up clamping the group together- e.g.
 an external release breaks API and so A1s deps are disjoint with A2s,
 and then the same between A2 and A3. Even though A1's API is
 compatible with B2's: its not internal bad code, its just taking *one*
 external dep breaking its API.

 After 2 releases you have 10^2 combinations, but only 4 are valid at
 all. Thats 4%. 8 releases gets you 10^8, 8 valid combinations, or
 0.008%.


Yes, so this would not be a situation where conflicts do not exist (or are
very rare), as my post mentioned. Is this rate of conflicts something you
measured, or is it a value you made up?


I don't hear anyone arguing that the status quo makes sense.  I think we're
mostly just chatting about the right thing to optimize the solution for and
what sorts of short cuts may be useful (or even necessary).  Since we can
measure the actual conflict rates and other values in practice, data seems like
it may be a good path toward grounding the discussion...

Thanks,
Justin
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


[Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
So, I am working on pip issue 988: pip doesn't resolve packages at all.

This is O(alternatives_per_package^packages): if you are resolving 10
packages with 10 versions each, there are approximately 10^10 or 10G
combinations. 10 packages with 100 versions each - 100^10, or 10^20.
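
To make that arithmetic concrete: every package can independently sit
at any of its versions, so the search space is versions_per_package **
packages:

    >>> 10 ** 10     # 10 packages, 10 versions each
    10000000000
    >>> 100 ** 10    # 10 packages, 100 versions each
    100000000000000000000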

So - it's going to depend pretty heavily on some good heuristics in
whatever final algorithm makes its way in, but the problem is
exacerbated by PyPI's nature.

Most Linux distributions (all that I'm aware of) have at most 5
versions of a package to consider at any time: installed (might be
None), current release, current release security updates, new release
being upgraded to, and that new release's security updates.
And their common worst case is actually 2 versions: installed == current
release and one new release present. They map alternatives out into
separate packages (e.g. when an older soname is deliberately kept
across an ABI incompatibility, you end up with 2 packages, not 2
versions of one package). So when comparing pip's challenge to apt's:
apt has ~20-30K packages, with alternatives ~= 2, or
pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)

Scaling the number of packages is relatively easy; scaling the number
of alternatives is harder. Even 300 packages (the dependency tree for
openstack) is ~2.4T combinations to probe.

I wonder if it makes sense to give some back-pressure to people, or at
the very least encourage them to remove distributions that:
 - they don't support anymore
 - have security holes

If folk consider PyPI a sort of historical archive then perhaps we
could have a feature to select 'supported' versions by the author, and
allow a query parameter to ask for all the versions.
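
Purely as an illustration (the parameter name is hypothetical - nothing
like it exists today), the simple index could default to the supported
set and let mirrors and archivists opt in to everything:

    GET /simple/foo/                 - author-supported releases only
    GET /simple/foo/?versions=all    - every release ever uploaded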

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI and Uploading Documentation

2015-05-15 Thread Éric Araujo
Le 2015-05-15 14:34, Donald Stufft a écrit :
 As far as retiring upload_docs, the sanest thing to do I think would be
 to just have PyPI return an error code that upload_docs would display
 to the end user. The command itself is in use by a few other systems I think
 so we might not want to remove it wholesale from Python itself (or maybe
 we do? It’s a hard question since it’s tied to an external service unlike
 most of the stdlib).

upload_docs is implemented by setuptools, not distutils.

Cheers
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Donald Stufft

 On May 15, 2015, at 2:57 PM, Robert Collins robe...@robertcollins.net wrote:
 
 So, I am working on pip issue 988: pip doesn't resolve packages at all.
 
 This is O(alternatives_per_package^packages): if you are resolving 10
 packages with 10 versions each, there are approximately 10^10 or 10G
 combinations. 10 packages with 100 versions each - 100^10, or 10^20.
 
 So - it's going to depend pretty heavily on some good heuristics in
 whatever final algorithm makes its way in, but the problem is
 exacerbated by PyPI's nature.
 
 Most Linux distributions (all that I'm aware of) have at most 5
 versions of a package to consider at any time: installed (might be
 None), current release, current release security updates, new release
 being upgraded to, and that new release's security updates.
 And their common worst case is actually 2 versions: installed == current
 release and one new release present. They map alternatives out into
 separate packages (e.g. when an older soname is deliberately kept
 across an ABI incompatibility, you end up with 2 packages, not 2
 versions of one package). So when comparing pip's challenge to apt's:
 apt has ~20-30K packages, with alternatives ~= 2, or
 pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)
 
 Scaling the number of packages is relatively easy; scaling the number
 of alternatives is harder. Even 300 packages (the dependency tree for
 openstack) is ~2.4T combinations to probe.
 
 I wonder if it makes sense to give some back-pressure to people, or at
 the very least encourage them to remove distributions that:
 - they don't support anymore
 - have security holes
 
 If folk consider PyPI a sort of historical archive then perhaps we
 could have a feature to select 'supported' versions by the author, and
 allow a query parameter to ask for all the versions.
 

There have been a handful of projects which would only keep the latest N
versions uploaded to PyPI. I know this primarily because it has caused
people a decent amount of pain over time. It’s common for people's
deployments to use a requirements.txt file like ``foo==1.0`` and to just
continue to pull from PyPI. Deleting the old files breaks anyone doing
that, so it would
require either having people bundle their deps in their repositories or
some way to get at those old versions. Personally I think that we shouldn’t
go deleting the old versions or encouraging people to do that.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI and Uploading Documentation

2015-05-15 Thread Donald Stufft

 On May 15, 2015, at 2:23 PM, Erik Bray erik.m.b...@gmail.com wrote:
 
 On Fri, May 15, 2015 at 9:48 AM, Donald Stufft don...@stufft.io wrote:
 Hey!
 
 First, for anyone who isn't aware we recently migrated PyPI and TestPyPI so
 that instead of storing files and documentation locally (really in a
 glusterfs cluster) it will store them inside of S3. This will reduce
 maintenance overhead of running PyPI by two servers since we'll no longer
 need to run our own glusterfs cluster, as well as improve the reliability
 and scalability of the PyPI service as a whole, since we've had nothing but
 problems from glusterfs in this regard.
 
 One of the things that this brought to light was that the documentation
 upload ability in PyPI is something that is not often used*, however it
 represents one of our slowest routes. It's not a well-supported feature
 and I feel that it's going outside of the core competency for PyPI
 itself; instead PyPI should be focused on the files themselves. In
 addition, since the time this was added to PyPI a number of free or
 cheap services have come about that allow people to sanely upload raw
 documents without a reliance on any particular documentation system,
 and we've also had the rise of ReadTheDocs for when someone is using
 Sphinx as their documentation system.
 
 I think that it's time to retire this aspect of PyPI, which has never
 been well supported, and instead focus on just the things that are core
 to PyPI. I don't have a fully concrete proposal for doing this, but I
 wanted to reach out here and figure out if anyone had any ideas. The
 rough idea I have currently is to simply disable new documentation
 uploads and add two new small features. One will allow users to delete
 their existing documentation from PyPI and the other would allow them
 to register a redirect which would take them from the current location
 to wherever they move their documentation to. In order to prevent
 breaking documentation for projects which are defunct or not actively
 maintained we would keep the archived documentation (sans what anyone
 has deleted) indefinitely.
 
 Ideally I hope people start to use ReadTheDocs instead of PyPI itself.
 I think that ReadTheDocs is a great service with heavy ties to the
 Python community. They will do a better job at hosting documentation
 than PyPI ever could, since that is their core goal. In addition there
 is a dialog between ReadTheDocs and PyPI where there is an opportunity
 to add integration between the two sites, as well as features to
 ReadTheDocs that it currently lacks that people feel are a requirement
 before we move PyPI's documentation to read-only.
 
 Thoughts?
 
 * Out of ~60k projects only ~2.8k have ever uploaded documentation. It's not
  easy to tell if all of them are still using it as their primary source of
  documentation though or if it's old documentation that they just can't
  delete.
 
 +1 for all the stated reasons.
 
 I have a few docs hosted on pythonhosted.org, but it's become a
 nuisance to maintain since it does not support multiple doc versions
 like ReadTheDocs, so now I've wound up with documentation for the same
 projects on both sites.  The nuisance comes not so much in the process
 (like Barry wrote, I've enjoyed the simplicity of `setup.py
 upload_docs`), but because more often than not I've had to redirect
 users to the ReadTheDocs docs to make sure they're using the correct
 version of the docs.  So I wish I were not locked into updating the
 pythonhosted.org docs and would be happy to retire them altogether
 (much as I appreciated the service).
 
 One question is how this would be handled at the tooling end.
 setup.py upload_docs would have to be retired somehow.  Though it
 might also be nice if some simple tools were added to make it just as
 easy to add docs to ReadTheDocs.  I know something like upload_docs
 doesn't really make sense, since RTD handles the checkout and build of
 the docs.  But there's still a manual step of enabling new versions of
 the docs that it would be nice to make as effortless as `setup.py
 upload_docs`.  I guess that's off-topic for the PyPI end of things
 though.
 
 Erik


So I can’t speak for ReadTheDocs, but I believe that they are considering
and/or are planning on offering arbitrary HTML uploads similarly to how
you can upload documentation to PyPI. I don’t know if this will actually
happen and what it would look like but I know they are thinking about it.

As far as retiring upload_docs, the sanest thing to do I think would be
to just have PyPI return an error code that upload_docs would display
to the end user. The command itself is in use by a few other systems I think
so we might not want to remove it wholesale from Python itself (or maybe
we do? It’s a hard question since it’s tied to an external service unlike
most of the stdlib).
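
To make that concrete, here is a minimal WSGI sketch of the PyPI side -
the 410 status and the wording are assumptions, not a settled design.
Since the upload commands report the status line they get back, the
message below is roughly what the end user would see:

    def reject_doc_upload(environ, start_response):
        # Documentation uploads are permanently discontinued: 410 Gone.
        body = (b"Documentation upload has been discontinued; see "
                b"https://readthedocs.org/ for hosting alternatives.")
        start_response("410 Gone", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]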

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA




Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Justin Cappos
One thing to consider is that if conflicts do not exist (or are very rare),
the number of possible combinations is a moot point.  A greedy algorithm
for installation (which just chooses the most favored version to resolve
each dependency) will run in linear time with the number of packages it
would install, if no conflicts exist.

So, what you are saying about state exploration may be true for a resolver
that uses something like a SAT solver, but it doesn't apply to backtracking
dependency resolution (unless a huge number of conflicts occur) or to simple
dependency resolution (at all).  SAT solvers do have heuristics to avoid
this blow-up, except in pathological cases.  However, simple / backtracking
dependency resolution systems have the further advantage of not needing to
request unneeded metadata in the first place...
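
As a toy sketch of the simple/backtracking flavour (illustrative only,
not pip's actual code - the data model is invented for brevity): index
maps each project to its versions, most favored first, and deps maps a
pinned (project, version) to the requirements it introduces. When no
conflicts exist the first candidate always succeeds, nothing is ever
undone, and the cost stays linear in the number of packages installed:

    def resolve(requirements, index, deps, chosen=None):
        """requirements is a list of (name, allowed_versions) pairs."""
        chosen = dict(chosen or {})
        if not requirements:
            return chosen
        (name, allowed), rest = requirements[0], requirements[1:]
        if name in chosen:
            # Already pinned; only proceed if the pin is acceptable.
            if chosen[name] in allowed:
                return resolve(rest, index, deps, chosen)
            return None
        for version in index[name]:
            if version not in allowed:
                continue
            chosen[name] = version
            extra = deps.get((name, version), [])
            result = resolve(rest + extra, index, deps, chosen)
            if result is not None:
                return result        # greedy path worked, no backtracking
            del chosen[name]         # conflict further down: backtrack
        return None

    index = {"A": ["2.0", "1.0"], "B": ["1.0"]}
    deps = {("A", "2.0"): [("B", {"1.0"})]}
    print(resolve([("A", {"1.0", "2.0"})], index, deps))
    # {'A': '2.0', 'B': '1.0'}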

Thanks,
Justin

On Fri, May 15, 2015 at 2:57 PM, Robert Collins robe...@robertcollins.net
wrote:

 So, I am working on pip issue 988: pip doesn't resolve packages at all.

 This is O(alternatives_per_package^packages): if you are resolving 10
 packages with 10 versions each, there are approximately 10^10 or 10G
 combinations. 10 packages with 100 versions each - 100^10, or 10^20.

 So - it's going to depend pretty heavily on some good heuristics in
 whatever final algorithm makes its way in, but the problem is
 exacerbated by PyPI's nature.

 Most Linux distributions (all that I'm aware of) have at most 5
 versions of a package to consider at any time: installed (might be
 None), current release, current release security updates, new release
 being upgraded to, and that new release's security updates.
 And their common worst case is actually 2 versions: installed == current
 release and one new release present. They map alternatives out into
 separate packages (e.g. when an older soname is deliberately kept
 across an ABI incompatibility, you end up with 2 packages, not 2
 versions of one package). So when comparing pip's challenge to apt's:
 apt has ~20-30K packages, with alternatives ~= 2, or
 pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)

 Scaling the number of packages is relatively easy; scaling the number
 of alternatives is harder. Even 300 packages (the dependency tree for
 openstack) is ~2.4T combinations to probe.

 I wonder if it makes sense to give some back-pressure to people, or at
 the very least encourage them to remove distributions that:
  - they don't support anymore
  - have security holes

 If folk consider PyPI a sort of historical archive then perhaps we
 could have a feature to select 'supported' versions by the author, and
 allow a query parameter to ask for all the versions.

 -Rob

 --
 Robert Collins rbtcoll...@hp.com
 Distinguished Technologist
 HP Converged Cloud
 ___
 Distutils-SIG maillist  -  Distutils-SIG@python.org
 https://mail.python.org/mailman/listinfo/distutils-sig

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 06:57, Robert Collins robe...@robertcollins.net wrote:
 So, I am working on pip issue 988: pip doesn't resolve packages at all.

 This is O(alternatives_per_package^packages): if you are resolving 10
...
 Scaling the number of packages is relatively easy; scaling the number
 of alternatives is harder. Even 300 packages (the dependency tree for
 openstack) is ~2.4T combinations to probe.

I added a check for the exact number (when the current step limit is hit):
Hit step limit during resolving,
22493640689038530013767184665222125808455708963348534886974974630893524036813561125576881299950281714638872640331745747555743820280235291929928862660035516365300612827387994788286647556890876840654454905860390366740480.00
from 4038 versions in 205 packages after 10 steps

Which indicates an alternatives factor of ~20 (4038 versions across 205
packages is about 19.7 versions per package). And AIUI PyPI has a long
tail itself, so it's more common that folk will see factors above 5.7
than below.

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Ben Finney
Donald Stufft don...@stufft.io writes:

  On May 15, 2015, at 2:57 PM, Robert Collins robe...@robertcollins.net wrote:
  
  If folk consider PyPI a sort of historical archive then perhaps we
  could have a feature to select 'supported' versions by the author,
  and allow a query parameter to ask for all the versions.
  

 It’s common for people's deployments to use a requirements.txt file
 like ``foo==1.0`` and to just continue to pull from PyPI. Deleting the
 old files breaks anyone doing that, so it would require either having
 people bundle their deps in their repositories or some way to get at
 those old versions. Personally I think that we shouldn’t go deleting
 the old versions or encouraging people to do that.

Yes, it's common to consider PyPI as a repository of all versions ever
released, and to treat it as an archive whose URLs will continue to make
available the historical versions.

-- 
 \ “If history and science have taught us anything, it is that |
  `\ passion and desire are not the same as truth.” —E. O. Wilson, |
_o__)  _Consilience_, 1998 |
Ben Finney

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 07:19, Donald Stufft don...@stufft.io wrote:

 There have been a handful of projects which would only keep the latest N
 versions uploaded to PyPI. I know this primarily because it has caused
 people a decent amount of pain over time. It’s common for people's
 deployments to use a requirements.txt file like ``foo==1.0`` and to just
 continue to pull from PyPI. Deleting the old files breaks anyone doing
 that, so it would
 require either having people bundle their deps in their repositories or
 some way to get at those old versions. Personally I think that we shouldn’t
 go deleting the old versions or encouraging people to do that.

I think 'most recent only' is too much. Most upstreams will support
more than one release. Like - I don't care what testtools release you
use.

OTOH, every version with distinct dependencies becomes a very
expensive liability to the ecosystem here. It's beyond human scale,
and well in the territory of argh wtf the universe is burning around
me and my tardis has run out of power.

I'm sure we can provide an escape hatch in pip (and I'm going to do
that in my branch soon - offering simple 'error on conflict' and 'use
first seen specifier only' strategies) while folk work on different
heuristics - the actual resolver is only ~100 LOC in my branch today -
the rest is refactoring (that can be made better and I plan to do so
before suggesting we merge it).
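
For concreteness, roughly what those two strategies mean when a second
requirement names an already-constrained project (illustrative only;
the strategy names here are made up, not the branch's):

    class ConflictError(Exception):
        pass

    def combine_specifiers(existing, new, strategy="error-on-conflict"):
        # Merge two version specifiers for one project.
        if existing is None or existing == new:
            return new
        if strategy == "first-seen-wins":
            return existing          # quietly keep the first one seen
        raise ConflictError("%r conflicts with %r" % (existing, new))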

But a significant contributing factor is the O of the problem, and we
can do something about that. I don't know what exactly, and I think
we're going to need to have our creative caps firmly on to come up
with something meeting the broad needs of the ecosystem: which
includes pip Just Working.

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 08:46, Justin Cappos jcap...@nyu.edu wrote:
 Example: say I have an ecosystem of 10 packages, A-J. And they do a
 release every 6 months that is guaranteed to work together, but every
 time some issue occurs which ends up clamping the group together - e.g.
 an external release breaks API and so A1's deps are disjoint with A2's,
 and then the same between A2 and A3. Even though A1's API is
 compatible with B2's: it's not internal bad code, it's just taking *one*
 external dep breaking its API.

 After 2 releases each package has 2 versions, so you have 2^10 (about a
 thousand) combinations, but only 2 are valid at all. That's 0.2%. 8
 releases gets you 8^10 (about 10^9) combinations with only 8 valid, or
 about 10^-6 %.


 Yes, so this would not be a situation where conflicts do not exist (or are
 very rare) as my post mentioned.  Is this rate of conflicts something you
 measured or is it a value you made up?

It's drawn from the concrete example of OpenStack, which has a single
group of co-installable releases that cluster together every 6 months.
I don't have the actual valid/invalid ratio there because I don't have
enough machines to calculate it :).

-Rob


-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig