Re: [Distutils] reproducible builds
Hey Robin,

> What happens if other distros decide not to use this environment variable?
> Do I really want distro specific code in the package?

AFAIK this is seeing a great deal of use outside of Debian, and even outside Linux; for instance, GCC also supports this variable.

> In short where does the distro responsibility and package maintainers
> boundary need to be?

I guess it mostly comes down to whether you'd like them to carry the debt of a vendor patch implementing the behaviour for you in a way you don't like, or you'd prefer to retain full control. :) So it's more a preference than a responsibility.

David

___
Distutils-SIG maillist - Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig
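[For context: the variable under discussion is presumably SOURCE_DATE_EPOCH, the reproducible-builds convention GCC also honours. A minimal sketch of how a build tool might support it without distro-specific patches; the function name is hypothetical:]

    import os
    import time

    def build_timestamp():
        """Timestamp to embed in build artifacts.

        Honour SOURCE_DATE_EPOCH (seconds since the UNIX epoch) when set,
        so repeated builds of identical source produce identical output;
        fall back to the current time otherwise.
        """
        epoch = os.environ.get('SOURCE_DATE_EPOCH')
        if epoch is not None:
            return int(epoch)
        return int(time.time())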
Re: [Distutils] The mypy package
On Mon, Apr 18, 2016 at 09:34:09AM -0700, Chris Barker wrote:
> Namespaces seem like a great idea, then these problems disappear
> entirely,
>
> huh? as far as I can tell, namespaces greatly expand the pool of available
> names, but other than that, we've got the same problem.

They seem to have worked well enough from the 1980s through to the 3.5bn or so Internet users we have today.

David
Re: [Distutils] The mypy package
On Mon, Apr 18, 2016 at 08:18:37AM -0700, Chris Barker - NOAA Federal wrote:
> We really should have SOME way to determine if a PyPi name has been
> abandoned. Or even be proactive--PyPi names must be maintained in SOME
> way, perhaps:

+1

> Respond to some sort of "do you still want this" email. At least once a year.

+0. This is along the right track, but still seems too invasive for what is mostly an edge case.

I'm interested in this conversation as I have two package names registered on PyPI for unreleased projects: one with months of work spanning years put into it (but not yet fit for release), and another with actual years put into it. I'd be disappointed to lose the ability to prevent either name being annexed for someone's weekend project, although life would continue just fine if this were to occur. :)

> Details aside, as PyPi continues to grow, we really need a way to
> clear out the abandoned stuff -- the barrier to entry for creating a
> new name on PyPi is just too low.

Namespaces seem like a great idea; then these problems disappear entirely. E.g. have the server consult a one-time-generated list of aliases should a package name be requested that is not prefixed with an alias, and insist any new registrations include one.

David
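[A rough sketch of that alias scheme; the namespace syntax, the alias table contents, and both function names are hypothetical:]

    # Hypothetical one-time-generated table mapping legacy flat names to
    # their namespaced equivalents; contents and syntax are up for debate.
    LEGACY_ALIASES = {
        'mypy': 'jukka/mypy',
    }

    def resolve(name):
        """Resolve a requested name, consulting legacy flat-name aliases."""
        if '/' in name:          # already namespaced
            return name
        return LEGACY_ALIASES.get(name)

    def register(name):
        """Insist that any new registration carries a namespace prefix."""
        if '/' not in name:
            raise ValueError('new names must be namespaced, e.g. someuser/' + name)
        return name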
Re: [Distutils] Where should I put tests when packaging python modules?
On Tue, Oct 06, 2015 at 09:51:01AM +0200, Antoine Pitrou wrote:
> They should be inside the module. That way, you can check an installed
> module is ok by running e.g. "python -m mypackage.tests". Any other
> choice makes testing installed modules more cumbersome.

As Donald mentioned, this doesn't work in the general case: many packages ship quite substantial test data that often doesn't end up installed, and in other cases the package requires significant fixture setup or external resources (e.g. running SQLAlchemy tests without a working database server would be meaningless).

The option of always shipping test data as a standard part of a package, in a vain attempt to ensure it can always be tested (which is not always possible, given the SQLAlchemy example above), strikes me as incredibly wasteful -- not from some oh-precious-bytes standpoint, but from the perspective of distributing a Python application of any size, where always shipping half-configured test suites can inflate the resulting distribution size by 3 or 4x.

https://github.com/bennoleslie/pexif is the first hit on Google for a module I thought would need some test data. It's actually quite minimally tested, yet already the tests + data are 3.6x the size of the module itself.

I appreciate the arguments for inlining tests alongside a package in order to allow reuse of the suite's functionality by consuming applications' test suites, but as above, in the general case this simply isn't something that will always work and can be relied on by default.

Is there perhaps a third option that was absent from the original post? E.g. organizing tests in a separate, optional, potentially pip-installable package.
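[That third option might look something like the following companion distribution, whose only job is to carry the suite and its data; all names here are hypothetical:]

    # setup.py for a hypothetical companion distribution 'mypackage-tests',
    # keeping the suite and its bulky fixtures out of 'mypackage' itself.
    from setuptools import setup, find_packages

    setup(
        name='mypackage-tests',
        version='1.0',
        packages=find_packages(),
        install_requires=['mypackage'],  # match versions as appropriate
        package_data={'mypackage_tests': ['data/*']},  # heavyweight fixtures
    )

Users who want to verify an installation install mypackage-tests first; everyone else never downloads the data.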
> > > outside the module like this:
> > > https://github.com/pypa/sampleproject/tree/master/tests
>
> There is no actual reason to do that except win a couple kilobytes if
> you are distributing your package on floppy disks for consumption on
> Z80-based machines with 64KB RAM.
>
> Even Python *itself* puts its test suite inside the standard library,
> not outside it (though some Linux distros may strip it away).
> Try "python -m test.regrtest" (again, this may fail if your distro
> decided to ship the test suite in a separate package).
>
> The PyP"A" should definitely fix its sample project to reflect good
> practices.
>
> Regards
>
> Antoine.
Re: [Distutils] [Python-ideas] PyPI search still broken
On Thu, Sep 10, 2015 at 09:31:13AM -0400, Donald Stufft wrote:
> The old PostgreSQL based system has been gone for awhile, and we
> already have ElasticSearch with a small cron job that runs every 3
> hours to index the data.

That's awesome news. :)

David
Re: [Distutils] does pypi or red-dove have a better firehose API than download all the packages?
Would something like http://pypi.h1.botanicus.net/static/dump.txt.gz be useful to you? (Warning: 57MB, expanding to 540MB.) Each line is a JSON-encoded dict describing a single package release:

    import gzip
    import json

    for line in gzip.open('dump.txt.gz'):
        dct = json.loads(line)
        # etc.

The code for it is very simple; I'd be willing to clean it up and turn it into a cron job if people found it useful. Note the dump above is outdated -- I only made it as a test.

On 15 May 2013 21:12, Daniel Holth dho...@gmail.com wrote:
> Yeah, I've been using the run bandersnatch API, but the local storage
> requirement is a bit hefty.
>
> On Wed, May 15, 2013 at 4:11 PM, Donald Stufft don...@stufft.io wrote:
> > Nvm missed the one web request requirement. No I don't think so.
> >
> > On May 15, 2013, at 4:07 PM, Daniel Holth dho...@gmail.com wrote:
> > > Is there an API for all the metadata for everything that doesn't
> > > require one web request per package version? Maybe something like
> > > an rdiff-backup of a database?
Re: [Distutils] does pypi or red-dove have a better firehose API than download all the packages?
Interesting! I produced that dump as part of a demo of using Xapian for Cheese Shop search (still a work in progress, when I get a free moment). Adding e.g. a depends: operator is something I'd like, and your database sounds very useful for achieving that goal.

Thanks for the link. I may be e-mailing you shortly ;)

On 17 May 2013 02:50, Daniel Holth dho...@gmail.com wrote:
> On Thu, May 16, 2013 at 3:46 PM, David Wilson d...@botanicus.net wrote:
> > Would something like http://pypi.h1.botanicus.net/static/dump.txt.gz
> > be useful to you? (warning: 57mb expanding to 540mb). Each line is a
> > JSON-encoded dict containing a single package release.
> >
> > for line in gzip.open('dump.txt.gz'):
> >     dct = json.loads(line)
> >
> > The code for it is very simple, would be willing to clean it up and
> > turn it into a cron job if people found it useful. Note the dump
> > above is outdated, I only made it as a test.
>
> Seems like a useful format. https://bitbucket.org/dholth/pypi_stats is
> a prototype that parses requires.txt and other metadata out of all the
> sdists in a folder, putting them into a sqlite3 database. It may be
> interesting for experimentation. For example, I can easily tell you how
> many different version numbers there are and which are the most
> popular, or I can tell you which metadata keys and version numbers have
> been used. The database winds up being 1.6 GB, or about 200MB if you
> delete the unparsed files.
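[The popularity query Daniel mentions is one-liner territory with sqlite3. A toy sketch against a made-up schema -- the real pypi_stats schema may well differ:]

    import sqlite3

    # Toy stand-in for the pypi_stats database; the real schema may differ.
    con = sqlite3.connect(':memory:')
    con.execute('CREATE TABLE releases (name TEXT, version TEXT)')
    con.executemany('INSERT INTO releases VALUES (?, ?)',
                    [('alpha', '1.0'), ('beta', '1.0'), ('gamma', '0.1')])

    # Which version numbers are most popular?
    popular = con.execute(
        'SELECT version, COUNT(*) AS n FROM releases '
        'GROUP BY version ORDER BY n DESC, version').fetchall()
    print(popular)  # [('1.0', 2), ('0.1', 1)]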
Re: [Distutils] Fixing Cheese Shop search
Quasi-monthly ping :) I'm still happy to volunteer help.

David

On 23 April 2013 13:41, Nick Coghlan ncogh...@gmail.com wrote:
> On Mon, Apr 22, 2013 at 11:30 PM, David Wilson d...@botanicus.net wrote:
> > Prototype code is here: https://bitbucket.org/dmw/pypi-search (note:
> > relies on a pre-alpha quality DB library I'm hacking on)
> >
> > Thoughts?
>
> Hi David,
>
> This certainly sounds intriguing (and even promising), but I believe
> Richard (Jones, the main PyPI maintainer) has just returned from a long
> business trip, so it may be a while before he catches back up with
> distutils-sig. I'll see if I can nudge some other folks and get you a
> better answer...
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Distutils] Fixing Cheese Shop search
http://pypi.h1.botanicus.net/ is the same demo running behind Apache with mod_gzip, on a Core i7 920.

On 22 April 2013 02:11, David Wilson d...@botanicus.net wrote:
> [original proposal quoted in full; see the following message]
[Distutils] Fixing Cheese Shop search
Hi there,

In a fit of madness caused by another 30-seconds-long PyPI search, I decided to investigate the code in the hope of finding something simple that would alleviate the extremely long search time.

I discovered what appears to be a function that makes 6 SQL queries for each term provided by the search user; in turn those queries expand to what appear to be %SUBSTRING% table scans across the releases table, which appears to contain upwards of half a gigabyte of strings.

Now the root cause has been located, what to do about it? I looked at hacking on the code, but it seems webui.py is already too massive for its own good, and in any case PostgreSQL's options for efficient search are quite limited. It might be all-round good if the size of that module started to drop.

I wrote a crawler to pull a reasonable facsimile of the releases table on to my machine via the XML-RPC API, then arranged for Xapian to index only the newest releases for each package. The resulting full-text index weighs in at a very reasonable 334MB, and searches complete almost immediately, even on my lowly Intel Atom colocated server. I wrote a quick hack Flask app around it, which you can see here: http://5.39.91.176:5000/

The indexer takes as input a database produced by the crawler, which is smart enough to use PyPI's exposed 'changelog' serial numbers, so it is quite trivial and efficient to run this setup in an incremental indexing mode.

As you can see from the results, even my lowly colo is trouncing what is currently on PyPI, so my thoughts tend toward making an arrangement like this more permanent. The crawler code weighs in at 150 lines, the indexer a meagre 113 lines, and the Flask example app is 74 lines.

Implementing an exact replica of PyPI's existing scoring function is already partially done at indexing time, and the rest is quite easy to complete (mostly cut-pasting code). Updating the Flask example to provide an XML-RPC API (or similar), then *initially augmenting* the old search facility, seems like a good start, with a view to removing the old feature entirely. Integrating indexing directly would be pointless; the PyPI code really doesn't need anything more added to it until it gets at least reorganized a little.

So for the cost of 334MB of disk, a cron job, and a lowly VPS with even just 700MB RAM, PyPI's search pains might be solved permanently. Naturally I'm writing this mail because it bothers me enough to volunteer help. :)

Prototype code is here: https://bitbucket.org/dmw/pypi-search (note: relies on a pre-alpha quality DB library I'm hacking on)

Thoughts?
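[The reason a full-text engine such as Xapian beats %SUBSTRING% table scans so badly is the inverted index: one dictionary lookup per query term instead of a walk over every row. A toy standard-library illustration of the principle -- not Xapian's actual implementation, and the documents are made up:]

    from collections import defaultdict

    def build_index(docs):
        """Map each term to the set of document ids containing it."""
        index = defaultdict(set)
        for doc_id, text in docs.items():
            for term in text.lower().split():
                index[term].add(doc_id)
        return index

    def search(index, query):
        """Intersect posting sets: one lookup per term, no table scan."""
        results = None
        for term in query.lower().split():
            hits = index.get(term, set())
            results = hits if results is None else results & hits
        return results or set()

    docs = {
        'flask': 'a lightweight wsgi web framework',
        'django': 'a batteries included web framework',
        'requests': 'http for humans',
    }
    index = build_index(docs)
    print(sorted(search(index, 'web framework')))  # ['django', 'flask']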