Re: [Distutils] reproducible builds

2017-03-17 Thread David Wilson
Hey Robin,

> What happens if other distros decide not to use this environment variable?
> Do I really want distro specific code in the package?

AFAIK this is seeing a great deal of use outside of Debian, and even
outside Linux; for instance, GCC also supports this variable.
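
For anyone wanting to implement it, the convention is tiny. A minimal
sketch, assuming the variable under discussion is SOURCE_DATE_EPOCH
(the reproducible-builds convention that GCC implements):

    import os
    import time

    # Use the externally supplied timestamp when present, so rebuilds are
    # bit-for-bit identical; fall back to the current time otherwise.
    build_time = int(os.environ.get('SOURCE_DATE_EPOCH', time.time()))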


> In short where does the distro responsibility and package maintainers
> boundary need to be?

I guess it mostly comes down to whether you'd like them to carry the
debt of a vendor patch that implements the behaviour for you in a way
you don't like, or whether you'd prefer to retain full control. :)  So
it's more a preference than a responsibility.


David


Re: [Distutils] The mypy package

2016-04-18 Thread David Wilson
On Mon, Apr 18, 2016 at 09:34:09AM -0700, Chris Barker wrote:

> > Namespaces seem like a great idea, then these problems disappear
> > entirely,
>
> huh? as far as I can tell, namespaces greatly expand the pool of available
> names, but other than that, we've got the same problem.

They seem to have worked well enough from the 1980s through to the 3.5
billion or so Internet users we have today.


David


Re: [Distutils] The mypy package

2016-04-18 Thread David Wilson
On Mon, Apr 18, 2016 at 08:18:37AM -0700, Chris Barker - NOAA Federal wrote:

> We really should have SOME way to determine if a PyPi name has been
> abandoned. Or even be proactive--PyPi names must be maintained in SOME
> way, perhaps:

+1


> Respond to some sort of "do you still want this" email. At least once a year.

+0. This is along the right track, but it still seems too invasive for
what is mostly an edge case.

I'm interested in this conversation, as I have two package names
registered on PyPI for unreleased projects: one with months of work
spanning years put into it (but not yet fit for release), and another
with actual years put into it.

I'd be disappointed to lack the ability to prevent either name from
being annexed for someone's weekend project, although life would
continue just fine if this were to occur. :)


> Details aside, as PyPi continues to grow, we really need a way to
> clear out the abandoned stuff -- the barrier to entry for creating a
> new name on PyPi is just too low.

Namespaces seem like a great idea that would make these problems
disappear entirely: e.g. have the server consult a one-time-generated
list of aliases whenever a requested package name is not prefixed with
an alias, and insist that any new registrations include one.
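
A minimal sketch of that server-side lookup; the alias table and all
names in it are purely illustrative:

    # One-time-generated mapping from bare legacy names to namespaced
    # ones (contents purely illustrative).
    LEGACY_ALIASES = {
        'requests': 'kennethreitz/requests',
        'mypy': 'jukka/mypy',
    }

    def resolve(name):
        """Map a requested package name to its canonical namespaced form."""
        if '/' in name:
            return name                      # already namespaced
        return LEGACY_ALIASES.get(name)      # None -> unknown bare name

    def register(name):
        """Insist that any new registration carries a namespace prefix."""
        if '/' not in name:
            raise ValueError('new registrations must include a namespace')
        # ... proceed with normal registration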


David


Re: [Distutils] Where should I put tests when packaging python modules?

2015-10-09 Thread David Wilson
On Tue, Oct 06, 2015 at 09:51:01AM +0200, Antoine Pitrou wrote:

> They should be inside the module. That way, you can check an installed
> module is ok by running e.g. "python -m mypackage.tests". Any other
> choice makes testing installed modules more cumbersome.

As Donald mentioned, this doesn't work in the general case: many
packages ship quite substantial test data that often doesn't end up
installed, and in other cases the package requires significant fixture
setup or external resources (e.g. running the SQLAlchemy tests without
a working database server would be meaningless).

The option of always shipping test data as a standard part of a
package, in a vain attempt to ensure the package can always be tested
(which is not always possible anyway, given the SQLAlchemy example
above), strikes me as incredibly wasteful: not from some
oh-precious-bytes standpoint, but from the perspective of distributing
a Python application of any size, where always shipping half-configured
test suites can increase the resulting distribution size by 3 or 4x.

https://github.com/bennoleslie/pexif is the first hit on Google for a
module I thought would need some test data. It's actually quite
minimally tested, yet the tests + data are already 3.6x the size of the
module itself.


I appreciate the argument for inlining tests alongside a package so
that consuming applications' test suites can reuse the suite's
functionality, but as above, in the general case this simply isn't
something that will always work or can be relied on by default.


Is there perhaps a third option that was absent from the original post?
e.g. organizing tests in a separate, optional, potentially
pip-installable package.
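
A minimal sketch of what that could look like, with every project name
hypothetical:

    # setup.py for a hypothetical companion distribution
    # 'mypackage-tests'; installing it pulls in the package under test
    # plus the test runner.
    from setuptools import setup, find_packages

    setup(
        name='mypackage-tests',
        version='1.0',
        packages=find_packages(),
        # Bundle the bulky fixture data here, not in 'mypackage' itself.
        package_data={'mypackage_tests': ['data/*']},
        install_requires=[
            'mypackage==1.0',   # pin to the release these tests cover
            'pytest',
        ],
    )

A user who cares could then "pip install mypackage-tests" and run the
suite (e.g. "py.test --pyargs mypackage_tests"), while ordinary
installs avoid the weight entirely.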


> 
> > outside the module like this:
> > 
> >  https://github.com/pypa/sampleproject/tree/master/tests
> 
> There is no actual reason to do that except win a couple kilobytes if
> you are distributing your package on floppy disks for consumption on
> Z80-based machines with 64KB RAM.
> 
> Even Python *itself* puts its test suite inside the standard library,
> not outside it (though some Linux distros may strip it away).
> Try "python -m test.regrtest" (again, this may fail if your distro
> decided to ship the test suite in a separate package).
> 
> The PyP"A" should definitely fix its sample project to reflect good
> practices.
> 
> Regards
> 
> Antoine.
> 
> 


Re: [Distutils] [Python-ideas] PyPI search still broken

2015-09-10 Thread David Wilson
On Thu, Sep 10, 2015 at 09:31:13AM -0400, Donald Stufft wrote:

> The old PostgreSQL based system has been gone for awhile, and we
> already have ElasticSearch with a small cron job that runs every 3
> hours to index the data.

That's awesome news. :)


David


Re: [Distutils] does pypi or red-dove have a better firehose API than download all the packages?

2013-05-16 Thread David Wilson
Would something like http://pypi.h1.botanicus.net/static/dump.txt.gz be
useful to you? (warning: 57 MB, expanding to 540 MB). Each line is a
JSON-encoded dict describing a single package release:

    import gzip
    import json

    for line in gzip.open('dump.txt.gz'):
        dct = json.loads(line)
        # ... each dct describes one package release

The code for it is very simple; I'd be willing to clean it up and turn
it into a cron job if people found it useful.

Note the dump above is outdated; I only made it as a test.




On 15 May 2013 21:12, Daniel Holth dho...@gmail.com wrote:

> Yeah, I've been using the run bandersnatch API, but the local
> storage requirement is a bit hefty.
>
> On Wed, May 15, 2013 at 4:11 PM, Donald Stufft don...@stufft.io wrote:
> > Nvm missed the one web request requirement. No I don't think so.
> >
> > On May 15, 2013, at 4:07 PM, Daniel Holth dho...@gmail.com wrote:
> >
> > > Is there an API for all the metadata for everything that doesn't
> > > require one web request per package version? Maybe something like an
> > > rdiff-backup of a database?



Re: [Distutils] does pypi or red-dove have a better firehose API than download all the packages?

2013-05-16 Thread David Wilson
Interesting! I produced that dump as part of a demo of using Xapian for
Cheese Shop search (still a work in progress, when I get a free moment).
Adding e.g. a depends: operator is something I'd like, and your database
sounds very useful for achieving that goal.

Thanks for the link. I may be e-mailing you shortly ;)


On 17 May 2013 02:50, Daniel Holth dho...@gmail.com wrote:

> On Thu, May 16, 2013 at 3:46 PM, David Wilson d...@botanicus.net wrote:
> > Would something like http://pypi.h1.botanicus.net/static/dump.txt.gz be
> > useful to you? (warning: 57 MB, expanding to 540 MB). Each line is a
> > JSON-encoded dict describing a single package release:
> >
> >     for line in gzip.open('dump.txt.gz'):
> >         dct = json.loads(line)
> >
> > The code for it is very simple; I'd be willing to clean it up and turn
> > it into a cron job if people found it useful.
> >
> > Note the dump above is outdated; I only made it as a test.
>
> Seems like a useful format.
>
> https://bitbucket.org/dholth/pypi_stats is a prototype that parses
> requires.txt and other metadata out of all the sdists in a folder,
> putting them into a sqlite3 database. It may be interesting for
> experimentation. For example, I can easily tell you how many different
> version numbers there are and which are the most popular, or I can
> tell you which metadata keys and version numbers have been used. The
> database winds up being 1.6 GB, or about 200 MB if you delete the
> unparsed files.



Re: [Distutils] Fixing Cheese Shop search

2013-05-14 Thread David Wilson
Quasi-monthly ping :)

I'm still happy to volunteer help.


David


On 23 April 2013 13:41, Nick Coghlan ncogh...@gmail.com wrote:
> On Mon, Apr 22, 2013 at 11:30 PM, David Wilson d...@botanicus.net wrote:
> > Prototype code is here: https://bitbucket.org/dmw/pypi-search (note:
> > relies on a pre-alpha quality DB library I'm hacking on)
> >
> > Thoughts?
>
> Hi David,
>
> This certainly sounds intriguing (and even promising), but I believe
> Richard (Jones, the main PyPI maintainer) has just returned from a
> long business trip, so it may be a while before he catches back up
> with distutils-sig.
>
> I'll see if I can nudge some other folks and get you a better answer...
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Fixing Cheese Shop search

2013-04-22 Thread David Wilson
http://pypi.h1.botanicus.net/ is the same demo running behind Apache
with mod_gzip on a Core i7 920.

On 22 April 2013 02:11, David Wilson d...@botanicus.net wrote:
> Hi there,
>
> In a fit of madness caused by yet another 30-second PyPI search, I
> decided to investigate the code in the hope of finding something
> simple that would alleviate the extremely long search times.
> [...]


[Distutils] Fixing Cheese Shop search

2013-04-21 Thread David Wilson
Hi there,

In a fit of madness caused by yet another 30-second PyPI search, I
decided to investigate the code in the hope of finding something
simple that would alleviate the extremely long search times.

I discovered what appears to be a function that makes 6 SQL queries
for each term provided by the search user; in turn, those queries
expand to %SUBSTRING% table scans across the releases table, which
appears to contain upwards of half a gigabyte of strings.
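
To make the problem concrete, here is a hedged reconstruction (not the
actual webui.py code) of the query shape being described:

    import psycopg2

    term = 'flask'
    conn = psycopg2.connect('dbname=pypi')
    cur = conn.cursor()
    # The leading '%' defeats any B-tree index on the column, forcing a
    # sequential scan over the whole table; multiply by 6 queries per
    # search term and the 30-second searches follow.
    cur.execute("SELECT name, version FROM releases WHERE summary ILIKE %s",
                ('%' + term + '%',))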

Now that the root cause has been located, what to do about it? I looked
at hacking on the code, but webui.py is already too massive for its own
good, and in any case PostgreSQL's options for efficient search are
quite limited. It would be all-round good if the size of that module
started to drop.

I wrote a crawler to pull a reasonable facsimile of the releases table
onto my machine via the XML-RPC API, then arranged for Xapian to index
only the newest release of each package. The resulting full-text index
weighs in at a very reasonable 334 MB, and searches complete almost
immediately, even on my lowly Intel Atom colocated server.
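
For the curious, the indexing side needs very little code. A minimal
sketch using the Xapian Python bindings; the field weights and helper
names are illustrative, not the exact prototype code:

    import json
    import xapian

    db = xapian.WritableDatabase('pypi-index', xapian.DB_CREATE_OR_OPEN)
    indexer = xapian.TermGenerator()
    indexer.set_stemmer(xapian.Stem('en'))

    def index_release(name, summary, description):
        doc = xapian.Document()
        indexer.set_document(doc)
        indexer.index_text(name, 10)        # weight the package name highest
        indexer.index_text(summary, 3)
        indexer.index_text(description)
        doc.set_data(json.dumps({'name': name, 'summary': summary}))
        # A unique ID term per package means indexing a newer release
        # replaces the old document rather than duplicating it.
        db.replace_document('Q' + name.lower(), doc)

    # ... call index_release() for each newest release, then:
    db.commit()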

I wrote a quick hack Flask app around it, which you can see here:

  http://5.39.91.176:5000/

The indexer takes as input a database produced by the crawler, which is
smart enough to use PyPI's exposed 'changelog' serial numbers, so it is
quite trivial and efficient to run this setup in an incremental
indexing mode.
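
A minimal sketch of that incremental loop, assuming PyPI's legacy
XML-RPC changelog_since_serial method; the load/save/reindex helpers
are hypothetical:

    import xmlrpclib  # xmlrpc.client on Python 3

    pypi = xmlrpclib.ServerProxy('https://pypi.python.org/pypi')
    last_serial = load_serial()  # hypothetical: stored high-water mark

    # Each changelog entry is (name, version, timestamp, action, serial);
    # only releases touched since the last run need re-crawling.
    for name, version, ts, action, serial in \
            pypi.changelog_since_serial(last_serial):
        refetch_and_reindex(name, version)  # hypothetical helper
        last_serial = max(last_serial, serial)

    save_serial(last_serial)  # hypothetical: persist for the next cron run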

As you can see from the results, even my lowly colo is trouncing what
is currently on PyPI, and so my thoughts tend toward making an
arrangement like this more permanent.

The crawler weighs in at 150 lines, the indexer at a meagre 113 lines,
and the Flask example app at 74 lines. An exact replica of PyPI's
existing scoring function is already partially implemented at indexing
time, and the rest is quite easy to complete (mostly cut-and-pasting
code).

Updating the Flask example to provide an XML-RPC API (or similar) and
*initially augmenting* the old search facility seems like a good
start, with a view to removing the old feature entirely. Integrating
indexing directly would be pointless; the PyPI code really doesn't
need anything more added to it until it gets at least a little
reorganization.

So for the cost of 334 MB of disk, a cron job, and a lowly VPS with
even just 700 MB of RAM, PyPI's search pains might be solved
permanently. Naturally I'm writing this mail because it bothers me
enough to volunteer help. :)

Prototype code is here: https://bitbucket.org/dmw/pypi-search (note:
relies on a pre-alpha quality DB library I'm hacking on)

Thoughts?