Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)
On 14 May 2015 at 19:01, Chris Barker chris.bar...@noaa.gov wrote:
> Ah -- here is the issue -- but I think we HAVE pretty much got what we need here -- at least for Windows and OS-X. It depends what you mean by "curated", but it seems we have a (de facto?) policy for PyPI: binary wheels should be compatible with the python.org builds. So while each package wheel is supplied by the package maintainer one way or another, rather than by a central entity, it is more or less curated -- or at least standardized. And if you are going to put a binary wheel up, you need to make sure it matches -- and that is less than trivial for packages that require a third-party dependency -- but building the lib statically and then linking it in is not inherently easier than doing a dynamic link.

I think the issue is that, if we have 5 different packages that depend on (say) libpng, and we're using dynamic builds, then how do those packages declare that they need access to libpng.dll? And on Windows, where does the user put libpng.dll so that it gets picked up? And how does a non-expert user do this ("put it in $DIRECTORY, update your PATH, blah blah blah" doesn't work for the average user)?

In particular, on Windows, note that the shared DLL must either be in the directory where the executable is located (which is fun when you have virtualenvs, embedded interpreters, etc.), or on PATH (which has other implications - suppose I have an incompatible version of libpng.dll, from mingw, say, somewhere earlier on PATH). The problem isn't so much defining a standard ABI that shared DLLs need - as you say, that's a more or less solved problem on Windows - it's managing how those shared DLLs are made available to Python extensions. And *that* is what Unix package managers do for you, and Windows doesn't have a good solution for (other than bundle all the dependent DLLs with the app, or suffer DLL hell).
Paul

PS For a fun exercise, it might be interesting to try breaking conda: find a Python extension which uses a shared DLL, and check that it works. Then grab an incompatible copy of that DLL (say a 32-bit version on a 64-bit system) and try hacking around with PATH - putting the incompatible DLL in a directory earlier on PATH than the correct one, or in the Windows directory, using an embedded interpreter like mod_wsgi, tricks like that. If conda survives that, then the solution they use might be something worth documenting, and might offer an approach to solving the issue I described above. If it *doesn't* survive, then that probably implies that the general environment pip has to work in is less forgiving than the curated environment conda manages (which is, of course, the whole point of using conda - to get that curated environment :-))

___ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
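The PATH-shadowing failure mode described above can be seen without any real DLLs. A minimal sketch (plain Python; libpng.dll is just an example name, and this only mirrors the PATH-scanning tail of the Windows DLL search order, not the full rules):

```python
import os

def find_first_dll(name, path_env=None):
    """Return the first file called `name` found scanning PATH left to right.

    Mirrors the tail of the Windows DLL search order: whichever directory
    appears earlier on PATH wins, even if its copy of the DLL is an
    incompatible build (32-bit, different CRT, a mingw build, ...).
    """
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    for directory in path_env.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate  # first hit wins - the "DLL hell" case
    return None
```

An incompatible libpng.dll dropped into any directory earlier on PATH than the correct one becomes the copy every extension loads, which is exactly the hazard with shipping shared libraries outside the package directory.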
[Distutils] PyPI and Uploading Documentation
Hey!

First, for anyone who isn't aware: we recently migrated PyPI and TestPyPI so that instead of storing files and documentation locally (really, in a glusterfs cluster) they are stored inside of S3. This will reduce the maintenance overhead of running PyPI by two servers, since we'll no longer need to run our own glusterfs cluster, as well as improve the reliability and scalability of the PyPI service as a whole, since we've had nothing but problems from glusterfs in this regard.

One of the things that this brought to light was that the documentation upload ability in PyPI is something that is not often used*, however it represents one of our slowest routes. It's not a well supported feature and I feel that it's going outside of the core competency for PyPI itself; instead PyPI should be focused on the files themselves. In addition, since the time this was added to PyPI, a number of free or cheap services have come about that allow people to sanely upload raw documents without a reliance on any particular documentation system, and we've also had the rise of ReadTheDocs for when someone is using Sphinx as their documentation system.

I think that it's time to retire this aspect of PyPI, which has never been well supported, and instead focus on just the things that are core to PyPI. I don't have a fully concrete proposal for doing this, but I wanted to reach out here and figure out if anyone had any ideas. The rough idea I have currently is to simply disable new documentation uploads and add two new small features. One will allow users to delete their existing documentation from PyPI, and the other would allow them to register a redirect which would take visitors from the current location to wherever they move their documentation to. In order to prevent breaking documentation for projects which are defunct or not actively maintained, we would maintain the archived documentation (sans what anyone has deleted) indefinitely.
Ideally I hope people start to use ReadTheDocs instead of PyPI itself. I think that ReadTheDocs is a great service with heavy ties to the Python community. They will do a better job at hosting documentation than PyPI ever could, since that is their core goal. In addition, there is a dialog between ReadTheDocs and PyPI where there is an opportunity to add integration between the two sites, as well as features to ReadTheDocs that it currently lacks that people feel are a requirement before we move PyPI's documentation to read-only.

Thoughts?

* Out of ~60k projects only ~2.8k have ever uploaded documentation. It's not easy to tell if all of them are still using it as their primary source of documentation, though, or if it's old documentation that they just can't delete.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Re: [Distutils] PyPI and Uploading Documentation
On Fri, May 15, 2015 at 9:48 AM, Donald Stufft don...@stufft.io wrote:
> [...]
> Thoughts?

+1

> * Out of ~60k projects only ~2.8k have ever uploaded documentation. It's not easy to tell if all of them are still using it as their primary source of documentation though or if it's old documentation that they just can't delete.

I know I have documentation for at least one project hosted this way. I don't remember how I set that up. :)

I assume there will be some way to notify owners of affected documentation.

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
Re: [Distutils] PyPI and Uploading Documentation
On Fri, May 15, 2015 at 8:48 AM, Donald Stufft don...@stufft.io wrote:
> [...]
> Thoughts?

I'm +1 on reducing the responsibilities of PyPI so it can act as an index/repository in a much more efficient manner. I'm also +1 on recommending people use ReadTheDocs. It supports more than just Sphinx, so it's a rather flexible option. It's also open source, which means that anyone can contribute to it.

I'm curious to hear more about integrations between PyPI and ReadTheDocs, but I fully understand if they're not concrete enough to be worthy of discussion.

--
Ian
Re: [Distutils] PyPI and Uploading Documentation
On May 15, 2015, at 09:48 AM, Donald Stufft wrote:
> One of the things that this brought to light was that the documentation upload ability in PyPI is something that is not often used* however it represents something which is one of our slowest routes.

I use it for all my packages, mostly because it's easy for my upload workflow: `python setup.py upload_docs`. That said, with the rise of RTD, I have wondered about the usefulness of pythonhosted documentation. And because twine supports secure uploads of code, but not documentation, that unease has grown. So even while I use it, I agree it's time to consider retiring the service.

One thing I definitely want to retain, though, is the link to "Package Documentation" from the project's PyPI page. Please do give us a way to specify that link.

The PSF is a supporter of RTD, but let's all make sure they stay in business! https://readthedocs.org/sustainability/#about

Cheers,
-Barry
Re: [Distutils] PyPI is a sick sick hoarder
Why not start with pip at least being a simple fail-on-conflict resolver (vs the first-found-wins resolver it is now)? You'd backtrack for the sake of re-walking when new constraints are found, but not for the purpose of solving conflicts. I know you're motivated to solve OpenStack build issues, but many of the issues I've seen in the pip tracker, I think, would be solved without the backtracking resolver you're trying to build.

On Fri, May 15, 2015 at 11:57 AM, Robert Collins robe...@robertcollins.net wrote:
> So, I am working on pip issue 988: pip doesn't resolve packages at all. This is O(alternatives_per_package^packages): if you are resolving 10 packages with 10 versions each, there are approximately 10^10 or 10G combinations; 10 packages with 100 versions each gives 100^10 = 10^20. So it's going to depend pretty heavily on some good heuristics in whatever final algorithm makes its way in, but the problem is exacerbated by PyPI's nature.
>
> Most Linux distributions (all that I'm aware of) have at most 5 versions of a package to consider at any time: installed (might be None), current release, current release's security updates, the new release being upgraded to, and that new release's security updates. And their common worst case is actually 2 versions: installed == current release, plus one new release present. They map alternatives out into separate packages (e.g. when an older soname is deliberately kept across an ABI incompatibility, you end up with 2 packages, not 2 versions of one package). So, comparing pip's challenge to apt's: apt has ~20-30K packages with alternatives ~= 2, while pip has ~60K packages with alternatives ~= 5.7 (I asked dstufft). Scaling the number of packages is relatively easy; scaling the number of alternatives is harder. Even 300 packages (the dependency tree for OpenStack) is ~2.4T combinations to probe.
>
> I wonder if it makes sense to give some back-pressure to people, or at the very least encourage them to remove distributions that:
> - they don't support anymore
> - have security holes
>
> If folk consider PyPI a sort of historical archive then perhaps we could have a feature to select 'supported' versions by the author, and allow a query parameter to ask for all the versions.
>
> -Rob
>
> --
> Robert Collins rbtcoll...@hp.com
> Distinguished Technologist
> HP Converged Cloud
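To make the combinatorics being discussed concrete, here is a toy backtracking resolver - not pip's code, just a sketch of the search shape, with a made-up index format mapping each package to its versions and their requirements:

```python
def resolve(reqs, index, pinned=None):
    """Toy backtracking resolver.

    reqs:   list of (name, allowed_versions) requirements still to satisfy
    index:  {name: {version: [sub-requirements]}} - a made-up stand-in for PyPI
    pinned: {name: version} choices made so far

    Each unmet requirement forks the search across every candidate version,
    which is why the worst case grows like alternatives^packages.
    """
    pinned = dict(pinned or {})
    if not reqs:
        return pinned  # every requirement satisfied
    (name, allowed), rest = reqs[0], reqs[1:]
    if name in pinned:
        # Already chosen: succeed only if the existing pin is acceptable.
        return resolve(rest, index, pinned) if pinned[name] in allowed else None
    for version in sorted(index.get(name, {}), reverse=True):  # prefer newest
        if version not in allowed:
            continue
        trial = dict(pinned)
        trial[name] = version
        # Recurse with this version's own requirements prepended; if the
        # recursion fails, the loop tries the next version - that retry is
        # the backtracking step.
        solution = resolve(index[name][version] + rest, index, trial)
        if solution is not None:
            return solution
    return None  # no version of `name` works: report the conflict upward
```

A first-found-wins resolver (what pip did at the time) is this loop without the retry: it commits to the newest acceptable version and never revisits the choice when a later conflict appears, while a fail-on-conflict variant would stop with an error at the `return None` instead of searching further.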
Re: [Distutils] PyPI and Uploading Documentation
I'm using PyPI's documentation hosting for pysdl2-cffi because I thought it would be too difficult to run the documentation generator (which parses documentation comments out of the wrapped C code) on the ReadTheDocs server. Perhaps there is a different way to do it that I'm not familiar with.

On Fri, May 15, 2015 at 9:55 AM, Ian Cordasco graffatcolmin...@gmail.com wrote:
> On Fri, May 15, 2015 at 8:48 AM, Donald Stufft don...@stufft.io wrote:
>> [...]
>
> I'm +1 on reducing the responsibilities of PyPI so it can act as an index/repository in a much more efficient manner. I'm also +1 on recommending people use ReadTheDocs. It supports more than just Sphinx so it's a rather flexible option. It's also open source, which means that anyone can contribute to it.
>
> I'm curious to hear more about integrations between PyPI and ReadTheDocs but I fully understand if they're not concrete enough to be worthy of discussion.
>
> --
> Ian
Re: [Distutils] PyPI is a sick sick hoarder
On 16 May 2015 at 11:08, Marcus Smith qwc...@gmail.com wrote:
> Why not start with pip at least being a simple fail-on-conflict resolver (vs the 1st found wins resolver it is now)... You'd backtrack for the sake of re-walking when new constraints are found, but not for the purpose of solving conflicts. I know you're motivated to solve Openstack build issues, but many of the issues I've seen in the pip tracker, I think would be solved without the backtracking resolver you're trying to build.

Well, I'm scratching the itch I have. If it's too hard to get something decent, sure, I might back off in my goals, but I see no point aiming for something less than what all the other language-specific packaging systems out there have.

-Rob

--
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
Re: [Distutils] PyPI is a sick sick hoarder
On May 15, 2015, at 9:22 PM, Robert Collins robe...@robertcollins.net wrote:
> Well, I'm scratching the itch I have. If it's too hard to get something decent, sure I might back off in my goals, but I see no point aiming for something less than all the other language specific packaging systems out there have.

So what makes the other language-specific packaging systems different? As far as I know all of them have complete archives (e.g. they are like PyPI where they have a lot of versions, not like Linux distros). What can we learn from how they solved this?

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Re: [Distutils] PyPI is a sick sick hoarder
On 16 May 2015 at 13:45, Donald Stufft don...@stufft.io wrote:
> So what makes the other language specific packaging systems different? As far as I know all of them have complete archives (e.g. they are like PyPI where they have a lot of versions, not like Linux Distros). What can we learn from how they solved this?

NB: I have by no means finished the low-hanging heuristics and space-trimming stuff :). I have some simple things in mind and am sure I'll end up with something 'good enough' for day-to-day use. The thing I'm worried about is the long-term health of the approach.

Good questions. Some of it is structural, I suspect. A quick rundown:
- cabal (Haskell) has a backtracking solver that accepts various parameters to tell it to try harder.
- javascript effectively vendors every dep ever, so you end up with many copies of the same library at different versions in the same process.
- rust's cargo system currently solves everything in a single project only - it has no binary packaging, only vendor-into-a-binary-build packaging.
- The gem behaviour I'm not yet familiar with. perl I used to know, but time has eroded it :/.

-Rob

--
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
Re: [Distutils] PyPI is a sick sick hoarder
On Sat, May 16, 2015 at 10:52 AM, Robert Collins robe...@robertcollins.net wrote:
> Good questions. Some of it is structural I suspect. A quick rundown. cabal (haskell) has a backtracking solver that accepts various parameters to tell it to try harder. javascript effectively vendors every dep ever, so you end up with many copies of the same library at different versions in the same process. rust's cargo system currently solves everything in a single project only - it has no binary packaging, only vendor-into-a-binary-build packaging. The gem behaviour I'm not yet familiar with. perl I used to know but time has eroded it :/.

FWIW, PHP uses a SAT-based solver in composer, which started as a port of libsolv (the SAT solver used by openSUSE and soon Fedora). I am no expert, but I don't understand why backtracking algorithms would be faster than SAT, since they both potentially need to walk over the full set of possible solutions. It is hard to reason about the cost, because the worst case in theory grows exponentially in both cases.

With a SAT-based algorithm for dependency resolution, it is relatively simple to apply heuristics which massively prune the search space. For example, when considering package A with, say, 10 potential versions A_1, etc., in theory you need to generate the rules:

# "-" means "do not install", "+" means "install"
- A_1 | - A_2
- A_1 | - A_3
...

and those constitute most of the rules in common cases. But it is possible to tweak the SAT implementation to replace those rules by a single "AtMost one of" rule per *package*, which means the number of rules does not grow much with the number of versions. The real difficulty of SAT-based solvers is the optimization part: many actually valid solutions are not acceptable, and that's where the heuristics get more complicated.

David
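The clause blow-up described above is easy to quantify. A small sketch (generic CNF with integer literals; this is not composer's or libsolv's actual API) of the naive pairwise encoding of "install at most one version of package A":

```python
from itertools import combinations

def at_most_one_pairwise(version_literals):
    """Naive 'at most one' encoding: a clause (-A_i | -A_j) for every pair.

    n candidate versions produce n*(n-1)/2 two-literal clauses, so a package
    with many releases inflates the rule set quadratically - which is why
    tweaked solvers support a single native AtMost-one rule per package
    instead.
    """
    return [(-a, -b) for a, b in combinations(version_literals, 2)]
```

For a package with 10 versions this already emits 45 clauses; a native cardinality rule replaces all of them with one.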
Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)
On Fri, May 15, 2015 at 1:44 PM, Paul Moore p.f.mo...@gmail.com wrote:
>> Is there any point? or is the current approach of statically linking all third-party libs the way to go?
>
> If someone can make it work, that would be good. But (a) nobody is actually offering to develop and maintain such a solution,

well, it's on my list -- but it has been for a while, so I'm trying to gauge whether it's worth putting at the top of my things-to-do-for-python list. It's not at the top now ;-)

> and (b) it's not particularly clear how *much* of a benefit there would be (space savings aren't that important, ease of upgrade is fine as long as everything can be upgraded at once, etc...)

hmm -- that may be a trick, though not an uncommon one in python package dependencies -- it may be hard to have more than one version of a given lib installed.

>> If so, then is there any chance of getting folks to conform to this standard for PyPI-hosted binary packages anyway? i.e. the curation problem.
>
> If it exists, and if there's a benefit, people will use it.

OK -- that's encouraging...

>> Personally, I'm on the fence here -- I really want newbies to be able to simply pip install as many packages as possible and get a good result when they do it.
>
> Static linking gives that on Windows FWIW. (And maybe also on OSX?) This is a key point, though - the goal shouldn't be "use dynamic linking" but rather "make the user experience as easy as possible". It may even be that the best approach (dynamic or static) differs depending on platform.

true -- though we also have another problem -- that static linking solution is actually a big pain for package maintainers -- building and linking the dependencies the right way is a pain -- and now everyone that uses a given lib has to figure out how to do it. Giving folks a dynamic lib they can use would make it easier for them to build their packages -- a nice benefit there. Though it's a lot harder to provide a build environment than just the lib to link to... I'm going to have to think more about that...

>> On the other hand, I've found that conda better supports this right now, so it's easier for me to simply use that for my tools.
>
> And that's an entirely reasonable position. The only problem (if indeed it is a problem) is that having two different solutions (pip/wheel and conda) splits the developer resource, which means that neither approach moves forward as fast as a combined approach would.

That's not the only problem -- the current split between the (more than one) scientific python distributions and the community of folks using python.org and PyPI creates a bit of a mess for newbies. I'm reviving this conversation because I just spent a class lecture in a python class on numpy/scipy -- these students have been using a python install for months, using virtualenv, pip installing whatever they need, etc., and now, to use another lib, they have to go through machinations, maybe even installing an entire additional python. This is not good. And I've had to help more than one student untangle a mess of Apple Python, python.org python, homebrew, and/or Anaconda -- for someone that doesn't really get python packaging, never mind PATHs, and .bashrc vs .bash_profile, etc., it's an unholy mess.

"There should be one-- and preferably only one --obvious way to do it." -- HA!

> But that's OK if the two solutions are addressing different needs

The needs aren't really that different, however. Oh well.

Anyway, it seems like if I can find some time to prototype what I have in mind, there may be some room to make it official if it works out. If anyone else wants to help -- let me know!

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959 voice
7600 Sand Point Way NE  (206) 526-6329 fax
Seattle, WA 98115       (206) 526-6317 main reception

chris.bar...@noaa.gov
Re: [Distutils] PyPI is a sick sick hoarder
On May 15, 2015, at 9:19 PM, Donald Stufft don...@stufft.io wrote: On May 15, 2015, at 2:57 PM, Robert Collins robe...@robertcollins.net wrote: So, I am working on pip issue 988: pip doesn't resolve packages at all. This is O(alternatives_per_package^packages): if you are resolving 10 packages with 10 versions each, there are approximately 10^10 or 10G combinations. 10 packages with 100 versions each - 100^10, or 10^20. So - it's going to depend pretty heavily on some good heuristics in whatever final algorithm makes its way in, but the problem is exacerbated by PyPI's nature. Most Linux distributions (all that I'm aware of) have at most 5 versions of a package to consider at any time - installed (might be None), current release, current release security updates, new release being upgraded to, new release being upgraded to's security updates. And their common worst case is actually 2 versions: installed==current release and one new release present. They map alternatives out into separate packages (e.g. when an older soname is deliberately kept across an ABI incompatibility, you end up with 2 packages, not 2 versions of one package). So when comparing pip's challenge to apt's: apt has ~20-30K packages with alternatives ~= 2, while pip has ~60K packages with alternatives ~= 5.7 (I asked dstufft). Scaling the number of packages is relatively easy; scaling the number of alternatives is harder. Even 300 packages (the dependency tree for openstack) is ~2.4T combinations to probe. I wonder if it makes sense to give some back-pressure to people, or at the very least encourage them to remove distributions that: - they don't support anymore - have security holes If folk consider PyPI a sort of historical archive then perhaps we could have a feature to select 'supported' versions by the author, and allow a query parameter to ask for all the versions. There have been a handful of projects which would only keep the latest N versions uploaded to PyPI.
I know this primarily because it has caused people a decent amount of pain over time. It's common for people's deployments to use a requirements.txt file like ``foo==1.0`` and to just continue to pull from PyPI. Deleting the old files breaks anyone doing that, so it would require either having people bundle their deps in their repositories or some way to get at those old versions. Personally I think that we shouldn't go deleting the old versions or encouraging people to do that. +1 for this. While I appreciate why Linux distros purge old versions, it is absolutely hellish for reproducibility. If you are looking for prior art, check out the Molinillo project (https://github.com/CocoaPods/Molinillo) used by Bundler and CocoaPods. It is not as complex as the Solve gem used in Chef but offers a good balance between performance in satisfying constraints and false negatives on solution failures. --Noah
Re: [Distutils] PyPI and Uploading Documentation
On Fri, May 15, 2015 at 9:48 AM, Donald Stufft don...@stufft.io wrote: Hey! First, for anyone who isn't aware we recently migrated PyPI and TestPyPI so that instead of storing files and documentation locally (really in a glusterfs cluster) it will store them inside of S3. This will reduce the maintenance overhead of running PyPI by two servers since we'll no longer need to run our own glusterfs cluster, as well as improve the reliability and scalability of the PyPI service as a whole, since we've had nothing but problems from glusterfs in this regard. One of the things that this brought to light was that the documentation upload ability in PyPI is something that is not often used*; however, it represents one of our slowest routes. It's not a well supported feature and I feel that it's going outside of the core competency for PyPI itself; instead PyPI should be focused on the files themselves. In addition, since the time this was added to PyPI a number of free or cheap services have come about that allow people to sanely upload raw documents without a reliance on any particular documentation system, and we've also had the rise of ReadTheDocs for when someone is using Sphinx as their documentation system. I think that it's time to retire this aspect of PyPI which has never been well supported and instead focus on just the things that are core to PyPI. I don't have a fully concrete proposal for doing this, but I wanted to reach out here and figure out if anyone had any ideas. The rough idea I have currently is to simply disable new documentation uploads and add two new small features. One will allow users to delete their existing documentation from PyPI and the other would allow them to register a redirect which would take them from the current location to wherever they move their documentation to.
In order to prevent breaking documentation for projects which are defunct or not actively maintained we would maintain the archived documentation (sans what anyone has deleted) indefinitely. Ideally I hope people start to use ReadTheDocs instead of PyPI itself. I think that ReadTheDocs is a great service with heavy ties to the Python community. They will do a better job at hosting documentation than PyPI ever could since that is their core goal. In addition there is a dialog between ReadTheDocs and PyPI where there is an opportunity to add integration between the two sites as well as features to ReadTheDocs that it currently lacks that people feel are a requirement before we move PyPI's documentation to read-only. Thoughts? * Out of ~60k projects only ~2.8k have ever uploaded documentation. It's not easy to tell if all of them are still using it as their primary source of documentation, though, or if it's old documentation that they just can't delete. +1 for all the stated reasons. I have a few docs hosted on pythonhosted.org, but it's become a nuisance to maintain since it does not support multiple doc versions like ReadTheDocs, so now I've wound up with documentation for the same projects on both sites. The nuisance comes not so much in the process (like Barry wrote, I've enjoyed the simplicity of `setup.py upload_docs`), but because more often than not I've had to redirect users to the ReadTheDocs docs to make sure they're using the correct version of the docs. So I wish I were not locked into updating the pythonhosted.org docs and would be happy to retire them altogether (much as I appreciated the service). One question is how this would be handled at the tooling end. setup.py upload_docs would have to be retired somehow. Though it might also be nice if some simple tools were added to make it just as easy to add docs to ReadTheDocs. I know something like upload_docs doesn't really make sense, since RTD handles the checkout and build of the docs.
But there's still a manual step of enabling new versions of the docs that it would be nice to make as effortless as `setup.py upload_docs`. I guess that's off-topic for the PyPI end of things though. Erik
Re: [Distutils] PyPI is a sick sick hoarder
On 16 May 2015 at 07:18, Justin Cappos jcap...@nyu.edu wrote: One thing to consider is that if conflicts do not exist (or are very rare), the number of possible combinations is a moot point. A greedy algorithm for installation (which just chooses the most favored package to resolve each dependency) will run in linear time with the number of packages it would install, if no conflicts exist. So, what you are saying about state exploration may be true for a resolver that uses something like a SAT solver, but doesn't apply to backtracking dependency resolution (unless a huge number of conflicts occur) or simple dependency resolution (at all). SAT solvers do have heuristics to avoid this blow up, except in pathological cases. However, simple / backtracking dependency resolution systems have the further advantage of not needing to request unneeded metadata in the first place... Your intuition here is misleading, sorry :(. You're right that 'if everything fits, it's linear', but conflicts do occur - that's why we have this bug open, with people adding a new example of it every week or so (as someone who didn't realise that pip doesn't resolve finds out and adds it to pip's issue tracker, only to have it made into a dupe). Backtracking recursive resolvers have exactly the same O as SAT. Example: say I have an ecosystem of 10 packages, A-J. And they do a release every 6 months that is guaranteed to work together, but every time some issue occurs which ends up clamping the group together - e.g. an external release breaks API and so A1's deps are disjoint with A2's, and then the same between A2 and A3. Even though A1's API is compatible with B2's: it's not internal bad code, it just takes *one* external dep breaking its API. After 2 releases you have 10^2 combinations, but only 4 are valid at all. That's 4%. 8 releases gets you 10^8, 8 valid combinations, or 0.008%. Now there are two things to examine here.
How likely is this to happen to PyPI users, and can a backtracker (which btw is what my code is) handle this better than a SAT solver? In terms of likelihood - OpenStack hits this every release. It's not that our libraries are incompatible with each other; it's that given 250 packages (the 200 in error I quoted just shows that the resolver hadn't obtained version data for everything), *something* breaks API compat in each 6 month release cycle, and so you end up with the whole set effectively locking together. In fact, it has happened so consistently to OpenStack that we now release our libraries with closed specifiers: >=min_version, <next_version. Secondly, backtrackers. Assume nothing is installed, and you want the latest release. Then sure, for a_version in A: for b_version in B: for c_version in C: etc will hit the most recent release first time, and you're golden. Assume you have the prior release installed, and you ran pip without -U, installing something that pulls in the latest release of one lib (which then nabs everything), e.g. pip install 'J>3.0' (and we have releases 1,2,3,4 of A through J). Now, for a_version in A: is going to have the installed version of A in its first step. B likewise. So we'll end up with a trace like: A==3 B==3 C==3 D==3 E==3 F==3 G==3 H==3 I==3 J==3 error, backtrack (user specifier) J==4 error, backtracks once the external deps are considered and the conflict is found J==2 error, backtrack (user specifier) J==1 error, backtrack (user specifier) I==4 J==3 error, backtrack (user specifier) J==4 error, backtracks somewhere in the external deps J==2, error, backtrack (user specifier) and so on, until we finally tick over to A==4. More generally, any already installed version (without -U) can cause a backtracking resolver to try *all* possible other combinations before realising that that installed version is the problem and bumping it.
A heuristic to look for those and bump them first then hits molasses as soon as one of the installed versions needs to be kept as-is. Anyhow, my goal here is to start the conversation; pip will need some knobs, because no matter how good the heuristics, users will need escape hatches. (One of which is to fully specify their needs.) -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
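The backtracking behaviour Robert traces above can be sketched as a toy resolver. This is a hypothetical illustration with made-up package data, not pip's actual code: candidates are tried most-preferred first (listing an installed version first models running pip without -U), and a conflict forces a backtrack to the next candidate, exactly the A==3 ... J==3 error, J==4 error, ... walk described in the trace.

```python
# Toy backtracking resolver: pick one version per package such that no
# chosen pair is in the conflict set. Hypothetical data, not pip's code.

def resolve(packages, candidates, conflicts, chosen=None):
    """packages: ordered list of package names.
    candidates: name -> list of versions, most preferred first.
    conflicts: set of frozensets like {("A", 2), ("B", 1)} that
               cannot co-exist in one solution."""
    chosen = chosen or {}
    if len(chosen) == len(packages):
        return dict(chosen)  # every package has a version: done
    name = packages[len(chosen)]
    for version in candidates[name]:
        if any(frozenset([(name, version), picked]) in conflicts
               for picked in chosen.items()):
            continue  # conflict with an earlier pick: try next version
    # no conflict so far: commit tentatively and recurse deeper
        chosen[name] = version
        result = resolve(packages, candidates, conflicts, chosen)
        if result is not None:
            return result
        del chosen[name]  # dead end below: undo and try next version
    return None  # every candidate failed at this depth: backtrack

# Two packages whose releases "lock together": only (A,1)+(B,1) works,
# so the preferred A==2 must be fully explored and abandoned first.
candidates = {"A": [2, 1], "B": [1, 2]}
conflicts = {frozenset([("A", 2), ("B", 1)]),
             frozenset([("A", 2), ("B", 2)]),
             frozenset([("A", 1), ("B", 2)])}
print(resolve(["A", "B"], candidates, conflicts))  # {'A': 1, 'B': 1}
```

With many packages and a stale installed version listed first, this exhaustive retry of every combination under the bad pick is precisely the worst case the message describes.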
Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)
On Fri, May 15, 2015 at 1:49 AM, Paul Moore p.f.mo...@gmail.com wrote: On 14 May 2015 at 19:01, Chris Barker chris.bar...@noaa.gov wrote: Ah -- here is the issue -- but I think we HAVE pretty much got what we need here -- at least for Windows and OS-X. It depends what you mean by curated, but it seems we have a (defacto?) policy for PyPi: binary wheels should be compatible with the python.org builds. So while each package wheel is supplied by the package maintainer one way or another, rather than by a central entity, it is more or less curated -- or at least standardized. And if you are going to put a binary wheel up, you need to make sure it matches -- and that is less than trivial for packages that require a third party dependency -- but building the lib statically and then linking it in is not inherently easier than doing a dynamic link. I think the issue is that, if we have 5 different packages that depend on (say) libpng, and we're using dynamic builds, then how do those packages declare that they need access to libpng.dll? this is the missing link -- it is a binary build dependency, not a package dependency -- so not so much that matplotlib-1.4.3 depends on libpng.x.y, but that: matplotlib-1.4.3-cp27-none-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl depends on: libpng-x.y (all those binary parts will come from the platform) That's what's missing now. And on Windows, where does the user put libpng.dll so that it gets picked up? Well, here is the rub -- Windows dll hell really is hell -- but I think if it goes into the python dll search path (sorry, not on a Windows box where I can really check this out right now), it can work -- I now have an in-house product that has multiple python modules sharing a single dll somehow And how does a non-expert user do this (put it in $DIRECTORY, update your PATH, blah blah blah doesn't work for the average user)?
That's why we may need to update the tooling to handle this -- I'm not totally sure if the current wheel format can support this on Windows -- though it can on OS-X. In particular, on Windows, note that the shared DLL must either be in the directory where the executable is located (which is fun when you have virtualenvs, embedded interpreters, etc), or on PATH (which has other implications - suppose I have an incompatible version of libpng.dll, from mingw, say, somewhere earlier on PATH). that would be dll hell, yes. The problem isn't so much defining a standard ABI that shared DLLs need - as you say, that's a more or less solved problem on Windows - it's managing how those shared DLLs are made available to Python extensions. And *that* is what Unix package managers do for you, and Windows doesn't have a good solution for (other than bundle all the dependent DLLs with the app, or suffer DLL hell). exactly -- but if we consider the python install to be the app, rather than an individual python bundle, then we _may_ be OK. PS For a fun exercise, it might be interesting to try breaking conda - Windows really is simply broken [1] in this regard -- so I'm quite sure you could break conda -- but it does seem to do a pretty good job of not being broken easily by common uses -- I can't say I know enough about Windows dll finding or conda to know how... Oh, and conda is actually broken in this regard on OS-X at this point -- if you compile your own extension in an anaconda environment, it will find a shared lib at compile time that it won't find at run time. The conda install process fixes these, but that's a pain when under development -- i.e. you don't want to have to actually install the package with conda to run a test each time you re-build the dll... (or even change a bit of python code...)
But in short -- I'm pretty sure there is a way, on all systems, to have a standard way to build extension modules, combined with a standard way to install shared libs, so that a lib can be shared among multiple packages. So the question remains: Is there any point? Or is the current approach of statically linking all third party libs the way to go? If so, then is there any chance of getting folks to conform to this standard for PyPi hosted binary packages anyway? i.e. the curation problem. Personally, I'm on the fence here -- I really want newbies to be able to simply pip install as many packages as possible and get a good result when they do it. On the other hand, I've found that conda better supports this right now, so it's easier for me to simply use that for my tools. -Chris [1] My take on dll hell: a) it's inherently difficult -- which is why Linux provides a system package manager. b) however, Windows really does make it MORE difficult than it has to be: i) it looks first next to the executable ii) it also looks on the PATH (rather than a separate DLL_PATH) Combine these two, and you have some folks dropping dlls next
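The search-order problem discussed in this thread can be poked at from Python itself. A sketch, with illustrative library names and platform-dependent results - `ctypes.util.find_library` delegates to the same OS loader search (PATH on Windows, DYLD paths on OS-X, ld.so on Linux) that the thread is worried about:

```python
# Where does the dynamic loader find a shared library? ctypes exposes
# the platform's search, which is exactly the machinery under debate.
import ctypes.util

for name in ("png", "z", "c"):  # illustrative library names
    found = ctypes.util.find_library(name)
    print(name, "->", found)  # None if the loader can't see it

# Loading by bare name delegates the search to the OS loader, so two
# packages linked against different libpng builds can silently pick up
# whichever copy appears first on the search path - the "DLL hell"
# described above.
```

On a system where, say, a mingw-built libpng sits earlier on PATH than the one a wheel was built against, this is the point where the wrong copy wins.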
Re: [Distutils] PyPI is a sick sick hoarder
On Fri, May 15, 2015 at 2:57 PM, Robert Collins robe...@robertcollins.net wrote: So, I am working on pip issue 988: pip doesn't resolve packages at all. This is O(alternatives_per_package^packages): if you are resolving 10 packages with 10 versions each, there are approximately 10^10 or 10G combinations. 10 packages with 100 versions each - 100^10, or 10^20. So - it's going to depend pretty heavily on some good heuristics in whatever final algorithm makes its way in, but the problem is exacerbated by PyPI's nature. Most Linux distributions (all that I'm aware of) have at most 5 versions of a package to consider at any time - installed (might be None), current release, current release security updates, new release being upgraded to, new release being upgraded to's security updates. And their common worst case is actually 2 versions: installed==current release and one new release present. They map alternatives out into separate packages (e.g. when an older soname is deliberately kept across an ABI incompatibility, you end up with 2 packages, not 2 versions of one package). So when comparing pip's challenge to apt's: apt has ~20-30K packages with alternatives ~= 2, while pip has ~60K packages with alternatives ~= 5.7 (I asked dstufft). Scaling the number of packages is relatively easy; scaling the number of alternatives is harder. Even 300 packages (the dependency tree for openstack) is ~2.4T combinations to probe. I wonder if it makes sense to give some back-pressure to people, or at the very least encourage them to remove distributions that: - they don't support anymore - have security holes If folk consider PyPI a sort of historical archive then perhaps we could have a feature to select 'supported' versions by the author, and allow a query parameter to ask for all the versions. You could simply limit the number of versions from PyPI you consider.
Jim -- Jim Fulton http://www.linkedin.com/in/jimfulton
Re: [Distutils] PyPI is a sick sick hoarder
On 16 May 2015 at 08:27, Jim Fulton j...@zope.com wrote: If folk consider PyPI a sort of historical archive then perhaps we could have a feature to select 'supported' versions by the author, and allow a query parameter to ask for all the versions. You could simply limit the number of versions from PyPI you consider. Yes - it would be nice IMO to give package authors some influence over that though. -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)
On 15 May 2015 at 20:56, Chris Barker chris.bar...@noaa.gov wrote: But in short -- I'm pretty sure there is a way, on all systems, to have a standard way to build extension modules, combined with a standard way to install shared libs, so that a lib can be shared among multiple packages. So the question remains: Is there any point? or is the current approach of statically linking all third party libs the way to go? If someone can make it work, that would be good. But (a) nobody is actually offering to develop and maintain such a solution, and (b) it's not particularly clear how *much* of a benefit there would be (space savings aren't that important, ease of upgrade is fine as long as everything can be upgraded at once, etc...) If so, then is there any chance of getting folks to conform to this standard for PyPi hosted binary packages anyway? i.e. the curation problem. If it exists, and if there's a benefit, people will use it. Personally, I'm on the fence here -- I really want newbies to be able to simply pip install as many packages as possible and get a good result when they do it. Static linking gives that on Windows FWIW. (And maybe also on OSX?) This is a key point, though - the goal shouldn't be use dynamic linking but rather make the user experience as easy as possible. It may even be that the best approach (dynamic or static) differs depending on platform. On the other hand, I've found that conda better supports this right now, so it's easier for me to simply use that for my tools. And that's an entirely reasonable position. The only problem (if indeed it is a problem) is that having two different solutions (pip/wheel and conda) splits the developer resource, which means that neither approach moves forward as fast as a combined approach does. But that's OK if the two solutions are addressing different needs (which seems to be the case for the moment).
Paul
Re: [Distutils] PyPI is a sick sick hoarder
Example: say I have an ecosystem of 10 packages, A-J. And they do a release every 6 months that is guaranteed to work together, but every time some issue occurs which ends up clamping the group together - e.g. an external release breaks API and so A1's deps are disjoint with A2's, and then the same between A2 and A3. Even though A1's API is compatible with B2's: it's not internal bad code, it just takes *one* external dep breaking its API. After 2 releases you have 10^2 combinations, but only 4 are valid at all. That's 4%. 8 releases gets you 10^8, 8 valid combinations, or 0.008%. Yes, so this would not be a situation where conflicts do not exist (or are very rare) as my post mentioned. Is this rate of conflicts something you measured or is it a value you made up? I don't hear anyone arguing that the status quo makes sense. I think we're mostly just chatting about the right thing to optimize the solution for and what sorts of short cuts may be useful (or even necessary). Since we can measure the actual conflict rate and other values in practice, data seems like it may be a good path toward grounding the discussion... Thanks, Justin
[Distutils] PyPI is a sick sick hoarder
So, I am working on pip issue 988: pip doesn't resolve packages at all. This is O(alternatives_per_package^packages): if you are resolving 10 packages with 10 versions each, there are approximately 10^10 or 10G combinations. 10 packages with 100 versions each - 100^10, or 10^20. So - it's going to depend pretty heavily on some good heuristics in whatever final algorithm makes its way in, but the problem is exacerbated by PyPI's nature. Most Linux distributions (all that I'm aware of) have at most 5 versions of a package to consider at any time - installed (might be None), current release, current release security updates, new release being upgraded to, new release being upgraded to's security updates. And their common worst case is actually 2 versions: installed==current release and one new release present. They map alternatives out into separate packages (e.g. when an older soname is deliberately kept across an ABI incompatibility, you end up with 2 packages, not 2 versions of one package). So when comparing pip's challenge to apt's: apt has ~20-30K packages with alternatives ~= 2, while pip has ~60K packages with alternatives ~= 5.7 (I asked dstufft). Scaling the number of packages is relatively easy; scaling the number of alternatives is harder. Even 300 packages (the dependency tree for openstack) is ~2.4T combinations to probe. I wonder if it makes sense to give some back-pressure to people, or at the very least encourage them to remove distributions that: - they don't support anymore - have security holes If folk consider PyPI a sort of historical archive then perhaps we could have a feature to select 'supported' versions by the author, and allow a query parameter to ask for all the versions. -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
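The combinatorial blow-up described in the post can be checked directly. A quick sketch, with the package and version counts taken from the message above:

```python
# Size of the search space a naive resolver faces: one version must be
# chosen per package, so the combinations multiply.

def combinations(num_packages, versions_per_package):
    """Number of candidate version assignments to consider."""
    return versions_per_package ** num_packages

# 10 packages, 10 versions each: 10^10, roughly 10 billion ("10G").
print(combinations(10, 10))

# 10 packages, 100 versions each: 100^10 == 10^20.
print(combinations(10, 100))

# Why the number of alternatives matters more than the number of
# packages: apt-like ~2 alternatives vs pip-like ~5.7 average.
print(combinations(300, 2))   # huge already at 2 alternatives each
```

This is why pruning the alternatives per package (as Linux distros effectively do) helps far more than shrinking the package count.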
Re: [Distutils] PyPI and Uploading Documentation
On 2015-05-15 14:34, Donald Stufft wrote: As far as retiring upload_docs, the sanest thing to do I think would be to just have PyPI return an error code that upload_docs would display to the end user. The command itself is in use by a few other systems I think, so we might not want to remove it wholesale from Python itself (or maybe we do? It's a hard question since it's tied to an external service, unlike most of the stdlib). upload_docs is implemented by setuptools, not distutils. Cheers
Re: [Distutils] PyPI is a sick sick hoarder
On May 15, 2015, at 2:57 PM, Robert Collins robe...@robertcollins.net wrote: So, I am working on pip issue 988: pip doesn't resolve packages at all. This is O(alternatives_per_package^packages): if you are resolving 10 packages with 10 versions each, there are approximately 10^10 or 10G combinations. 10 packages with 100 versions each - 100^10, or 10^20. So - it's going to depend pretty heavily on some good heuristics in whatever final algorithm makes its way in, but the problem is exacerbated by PyPI's nature. Most Linux distributions (all that I'm aware of) have at most 5 versions of a package to consider at any time - installed (might be None), current release, current release security updates, new release being upgraded to, new release being upgraded to's security updates. And their common worst case is actually 2 versions: installed==current release and one new release present. They map alternatives out into separate packages (e.g. when an older soname is deliberately kept across an ABI incompatibility, you end up with 2 packages, not 2 versions of one package). So when comparing pip's challenge to apt's: apt has ~20-30K packages with alternatives ~= 2, while pip has ~60K packages with alternatives ~= 5.7 (I asked dstufft). Scaling the number of packages is relatively easy; scaling the number of alternatives is harder. Even 300 packages (the dependency tree for openstack) is ~2.4T combinations to probe. I wonder if it makes sense to give some back-pressure to people, or at the very least encourage them to remove distributions that: - they don't support anymore - have security holes If folk consider PyPI a sort of historical archive then perhaps we could have a feature to select 'supported' versions by the author, and allow a query parameter to ask for all the versions. There have been a handful of projects which would only keep the latest N versions uploaded to PyPI. I know this primarily because it has caused people a decent amount of pain over time.
It's common for people's deployments to use a requirements.txt file like ``foo==1.0`` and to just continue to pull from PyPI. Deleting the old files breaks anyone doing that, so it would require either having people bundle their deps in their repositories or some way to get at those old versions. Personally I think that we shouldn't go deleting the old versions or encouraging people to do that. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Re: [Distutils] PyPI and Uploading Documentation
On May 15, 2015, at 2:23 PM, Erik Bray erik.m.b...@gmail.com wrote: On Fri, May 15, 2015 at 9:48 AM, Donald Stufft don...@stufft.io wrote: Hey! First, for anyone who isn't aware we recently migrated PyPI and TestPyPI so that instead of storing files and documentation locally (really in a glusterfs cluster) it will store them inside of S3. This will reduce the maintenance overhead of running PyPI by two servers since we'll no longer need to run our own glusterfs cluster, as well as improve the reliability and scalability of the PyPI service as a whole, since we've had nothing but problems from glusterfs in this regard. One of the things that this brought to light was that the documentation upload ability in PyPI is something that is not often used*; however, it represents one of our slowest routes. It's not a well supported feature and I feel that it's going outside of the core competency for PyPI itself; instead PyPI should be focused on the files themselves. In addition, since the time this was added to PyPI a number of free or cheap services have come about that allow people to sanely upload raw documents without a reliance on any particular documentation system, and we've also had the rise of ReadTheDocs for when someone is using Sphinx as their documentation system. I think that it's time to retire this aspect of PyPI which has never been well supported and instead focus on just the things that are core to PyPI. I don't have a fully concrete proposal for doing this, but I wanted to reach out here and figure out if anyone had any ideas. The rough idea I have currently is to simply disable new documentation uploads and add two new small features. One will allow users to delete their existing documentation from PyPI and the other would allow them to register a redirect which would take them from the current location to wherever they move their documentation to.
In order to prevent breaking documentation for projects which are defunct or not actively maintained we would maintain the archived documentation (sans what anyone has deleted) indefinitely. Ideally I hope people start to use ReadTheDocs instead of PyPI itself. I think that ReadTheDocs is a great service with heavy ties to the Python community. They will do a better job at hosting documentation than PyPI ever could since that is their core goal. In addition there is a dialog between ReadTheDocs and PyPI where there is an opportunity to add integration between the two sites as well as features to ReadTheDocs that it currently lacks that people feel are a requirement before we move PyPI's documentation to read-only. Thoughts? * Out of ~60k projects only ~2.8k have ever uploaded documentation. It's not easy to tell if all of them are still using it as their primary source of documentation, though, or if it's old documentation that they just can't delete. +1 for all the stated reasons. I have a few docs hosted on pythonhosted.org, but it's become a nuisance to maintain since it does not support multiple doc versions like ReadTheDocs, so now I've wound up with documentation for the same projects on both sites. The nuisance comes not so much in the process (like Barry wrote, I've enjoyed the simplicity of `setup.py upload_docs`), but because more often than not I've had to redirect users to the ReadTheDocs docs to make sure they're using the correct version of the docs. So I wish I were not locked into updating the pythonhosted.org docs and would be happy to retire them altogether (much as I appreciated the service). One question is how this would be handled at the tooling end. setup.py upload_docs would have to be retired somehow. Though it might also be nice if some simple tools were added to make it just as easy to add docs to ReadTheDocs. I know something like upload_docs doesn't really make sense, since RTD handles the checkout and build of the docs.
But there's still a manual step of enabling new versions of the docs that it would be nice to make as effortless as `setup.py upload_docs`. I guess that's off-topic for the PyPI end of things, though. Erik

So I can't speak for ReadTheDocs, but I believe they are considering and/or planning to offer arbitrary HTML uploads, similarly to how you can upload documentation to PyPI. I don't know whether this will actually happen or what it would look like, but I know they are thinking about it. As far as retiring upload_docs, the sanest thing to do, I think, would be to just have PyPI return an error code that upload_docs would display to the end user. The command itself is in use by a few other systems, I think, so we might not want to remove it wholesale from Python itself (or maybe we do? It's a hard question, since it's tied to an external service, unlike most of the stdlib). --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Re: [Distutils] PyPI is a sick sick hoarder
One thing to consider is that if conflicts do not exist (or are very rare), the number of possible combinations is a moot point. A greedy algorithm for installation (which just chooses the most favored package to resolve each dependency) will run in linear time with the number of packages it would install, if no conflicts exist. So, what you are saying about state exploration may be true for a resolver that uses something like a SAT solver, but it doesn't apply to backtracking dependency resolution (unless a huge number of conflicts occur) or to simple dependency resolution (at all). SAT solvers do have heuristics to avoid this blow-up, except in pathological cases. However, simple / backtracking dependency resolution systems have the further advantage of not needing to request unneeded metadata in the first place... Thanks, Justin

On Fri, May 15, 2015 at 2:57 PM, Robert Collins robe...@robertcollins.net wrote: So, I am working on pip issue 988: pip doesn't resolve packages at all. This is O(alternatives_per_package^packages): if you are resolving 10 packages with 10 versions each, there are approximately 10^10 or 10G combinations; 10 packages with 100 versions each gives 100^10, around 10^20. So it's going to depend pretty heavily on some good heuristics in whatever final algorithm makes its way in, but the problem is exacerbated by PyPI's nature. Most Linux distributions (all that I'm aware of) have at most 5 versions of a package to consider at any time: installed (might be None), current release, current release security updates, new release being upgraded to, and that new release's security updates. And their common worst case is actually 2 versions: installed == current release, plus one new release present. They map alternatives out into separate packages (e.g. when an older soname is deliberately kept across an ABI incompatibility, you end up with 2 packages, not 2 versions of one package).
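To make the greedy-resolution point above concrete, here is a minimal sketch (not pip's actual code; all package names, version numbers, and dependency data are made up for illustration). It visits each installed package exactly once, so its running time is linear in the number of packages it installs, and it simply errors on conflict instead of exploring alternatives:

```python
# index: package -> available versions, most preferred first
INDEX = {
    "a": [2, 1],
    "b": [3, 2, 1],
    "c": [1],
}

# deps: (package, version) -> requirements, each (package, allowed versions)
DEPS = {
    ("a", 2): [("b", {2, 3})],
    ("b", 3): [("c", {1})],
}

def greedy_resolve(requirements):
    """Greedy resolution: pin the most-preferred version satisfying each
    requirement, never backtrack, and raise on any conflict."""
    pinned = {}
    queue = list(requirements)
    while queue:
        pkg, allowed = queue.pop()
        if pkg in pinned:
            if pinned[pkg] not in allowed:
                raise RuntimeError(f"conflict on {pkg}")
            continue
        # choose the most-preferred version that satisfies the requirement
        pinned[pkg] = next(v for v in INDEX[pkg] if v in allowed)
        queue.extend(DEPS.get((pkg, pinned[pkg]), []))
    return pinned

print(greedy_resolve([("a", {1, 2})]))  # {'a': 2, 'b': 3, 'c': 1}
```

If no conflicts ever occur, this never revisits a package, which is why the combinatorial explosion simply doesn't arise for a greedy strategy.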
So, comparing pip's challenge to apt's: apt has ~20-30K packages with alternatives ~= 2, while pip has ~60K packages with alternatives ~= 5.7 (I asked dstufft). Scaling the number of packages is relatively easy; scaling the number of alternatives is harder. Even 300 packages (the dependency tree for OpenStack) is ~2.4T combinations to probe. I wonder if it makes sense to give some back-pressure to people, or at the very least encourage them to remove distributions that: - they don't support anymore - have security holes. If folk consider PyPI a sort of historical archive, then perhaps we could have a feature to select 'supported' versions by the author, and allow a query parameter to ask for all the versions. -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud ___ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
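The back-of-envelope arithmetic behind these search-space figures can be written down directly: N packages, each with A candidate versions, gives a naive resolver A ** N combinations to consider (the specific numbers below are only illustrative, not measurements):

```python
def search_space(n_packages: int, alternatives: int) -> int:
    """Naive resolver search space: each of n_packages independently
    picks one of `alternatives` candidate versions."""
    return alternatives ** n_packages

print(search_space(10, 10))   # 10**10, the "10G" example
print(search_space(10, 100))  # 100**10 == 10**20
```

The asymmetry in the thread's point is visible here: doubling the number of packages multiplies the exponent, but raising the alternatives count raises the base, so a deep version history hurts far more than a large package index.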
Re: [Distutils] PyPI is a sick sick hoarder
On 16 May 2015 at 06:57, Robert Collins robe...@robertcollins.net wrote: So, I am working on pip issue 988: pip doesn't resolve packages at all. This is O(packages^alternatives_per_package): if you are resolving 10 ... Scaling the number of packages is relatively easy; scaling the number of alternatives is harder. Even 300 packages (the dependency tree for openstack) is ~2.4T combinations to probe.

I added a check for the exact number (when the current step limit is hit): Hit step limit during resolving, 22493640689038530013767184665222125808455708963348534886974974630893524036813561125576881299950281714638872640331745747555743820280235291929928862660035516365300612827387994788286647556890876840654454905860390366740480.00 from 4038 versions in 205 packages after 10 steps. Which indicates an alternatives factor of ~20. And AIUI PyPI has a long tail itself, so it's more common than not that folk will see factors like 5.7. -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
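The "alternatives factor of ~20" can be back-calculated from the numbers in that error message: if a tree of N packages yields C total version combinations, the geometric-mean number of versions per package is C ** (1/N). A small sketch of that arithmetic (a reconstruction of the reasoning, not code from the pip branch):

```python
import math

def alternatives_factor(total_combinations: int, n_packages: int) -> float:
    """Geometric-mean versions per package implied by a combination count.
    Uses logs so very large integer counts stay in floating-point range."""
    return math.exp(math.log(total_combinations) / n_packages)

# A 205-package tree with ~20 versions per package on (geometric) average
# produces roughly the ~267-digit combination count quoted above:
print(round(alternatives_factor(20 ** 205, 205)))  # 20
```

This is why the factor matters more than the raw package count: 205 packages at factor 5.7 would already be intractable to enumerate, and factor 20 is vastly worse still.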
Re: [Distutils] PyPI is a sick sick hoarder
Donald Stufft don...@stufft.io writes: On May 15, 2015, at 2:57 PM, Robert Collins robe...@robertcollins.net wrote: If folk consider PyPI a sort of historical archive then perhaps we could have a feature to select 'supported' versions by the author, and allow a query parameter to ask for all the versions.

It's common for people's deployments to use a requirements.txt file like ``foo==1.0`` and to just continue to pull from PyPI. Deleting the old files breaks anyone doing that, so it would require either having people bundle their deps in their repositories or some way to get at those old versions. Personally I think that we shouldn't go deleting the old versions or encouraging people to do that.

Yes, it's common to consider PyPI a repository of all versions ever released, and to treat it as an archive whose URLs will continue to make available the historical versions. -- \ "If history and science have taught us anything, it is that | `\ passion and desire are not the same as truth." —E. O. Wilson, | _o__) _Consilience_, 1998 | Ben Finney
Re: [Distutils] PyPI is a sick sick hoarder
On 16 May 2015 at 07:19, Donald Stufft don...@stufft.io wrote: There have been a handful of projects which would only keep the latest N versions uploaded to PyPI. I know this primarily because it has caused people a decent amount of pain over time. It's common for people's deployments to use a requirements.txt file like ``foo==1.0`` and to just continue to pull from PyPI. Deleting the old files breaks anyone doing that, so it would require either having people bundle their deps in their repositories or some way to get at those old versions. Personally I think that we shouldn't go deleting the old versions or encouraging people to do that.

I think 'most recent only' is too much. Most upstreams will support more than one release. Like - I don't care what testtools release you use. OTOH, every version with distinct dependencies becomes a very expensive liability to the ecosystem here. It's beyond human scale, and well into the territory of "argh, wtf, the universe is burning around me and my tardis has run out of power". I'm sure we can provide an escape hatch in pip (and I'm going to do that in my branch soon - offering simple 'error on conflict' and 'use first seen specifier only' strategies) while folk work on different heuristics - the actual resolver is only ~100 LOC in my branch today; the rest is refactoring (that can be made better, and I plan to do so before suggesting we merge it). But a significant contributing factor is the O of the problem, and we can do something about that. I don't know what exactly, and I think we're going to need to have our creative caps firmly on to come up with something meeting the broad needs of the ecosystem: which includes pip Just Working. -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
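The two escape-hatch strategies named above can be sketched as follows. This is an assumed reading of their semantics, not code from the pip branch; specifiers are modelled crudely as sets of acceptable version strings:

```python
def error_on_conflict(existing: set, new: set) -> set:
    """'error on conflict': intersect the specifiers for a package and
    fail loudly if no version satisfies both."""
    merged = existing & new
    if not merged:
        raise RuntimeError("conflicting requirements")
    return merged

def first_seen_only(existing: set, new: set) -> set:
    """'use first seen specifier only': keep the first specifier
    encountered for a package and ignore later ones entirely."""
    return existing

print(error_on_conflict({"1.0", "1.1"}, {"1.1", "2.0"}))  # {'1.1'}
print(first_seen_only({"1.0"}, {"2.0"}))                  # {'1.0'}
```

Neither strategy explores alternatives, so both keep resolution linear in the number of requirements; the trade-off is that 'first seen only' can silently install a version that violates a later requirement, while 'error on conflict' pushes the problem back to the user.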
Re: [Distutils] PyPI is a sick sick hoarder
On 16 May 2015 at 08:46, Justin Cappos jcap...@nyu.edu wrote: Example: say I have an ecosystem of 10 packages, A-J. And they do a release every 6 months that is guaranteed to work together, but every time some issue occurs which ends up clamping the group together - e.g. an external release breaks API, and so A1's deps are disjoint with A2's, and then the same between A2 and A3. Even though A1's API is compatible with B2's: it's not internal bad code, it's just taking *one* external dep breaking its API. After 2 releases you have 10^2 combinations, but only 4 are valid at all. That's 4%. 8 releases gets you 10^8, with 8 valid combinations.

Yes, so this would not be a situation where conflicts do not exist (or are very rare), as my post mentioned. Is this rate of conflicts something you measured, or is it a value you made up?

It's drawn from the concrete example of OpenStack, which has a single group of co-installable releases that cluster together every 6 months. I don't have the actual valid/invalid ratio there because I don't have enough machines to calculate it :). -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud