At 08:12 PM 4/22/2006 -0400, Terry Reedy wrote:
>If my premises above are mistaken, then the suggestions should be modified
>or discarded. However, I don't see how they conflict at all with a
>consumer rating system.
My point was simply that providing rapid, visible feedback to authors results in a shorter feedback loop with less infrastructure.

Also, after thinking it over, it's clear that the spidering is never going to come out entirely, because there are lots of valid use cases for people effectively setting up their own mini-indexes. All that will happen is that at some point I'll be able to stop adding heuristics. (Hopefully that point is already past, in fact.)

For anybody who wants to know how the current heuristics work, EasyInstall actually has only a few main categories of heuristics used to find packages:

* Ones that apply to PyPI
* Ones that apply to SourceForge
* Ones that interpret distutils-generated filenames
* The one that detects when a page is really a Subversion directory, and thus should be checked out instead of downloaded

Most of the SourceForge heuristics have already been eliminated, except for the translation of prdownloads.sf.net URLs to dl.sourceforge.net URLs, and automatic retries in the event of a broken mirror.

I'm about to begin modifying the PyPI heuristics to use the new XML-RPC interface instead, for the most part. (Finding links in a package's long description will still be easier via the web interface, though.)

And the distutils haven't started generating any new kinds of filenames lately, although I occasionally run into situations where non-distutils links or obscure corner cases of distutils filenames cause problems, or where somebody has filenames that look like they came from the distutils, but the contents aren't a valid distutils source distribution.

Anyway, these are the only things that are truly heuristic, in the sense that they are attempts to guess well, and there is always the potential for failure or obsolescence if PyPI, SourceForge, or Subversion changes, or people do strange things with their links.
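To make the surviving SourceForge heuristic concrete: the prdownloads translation is just a host rewrite. Here's a minimal sketch in modern Python; the host names come from the post above, but the function name and structure are my own illustration, not EasyInstall's actual code:

```python
from urllib.parse import urlparse, urlunparse

def translate_sf_url(url):
    """Rewrite a prdownloads URL to point at the direct-download host.

    Hypothetical sketch: prdownloads pages go through a mirror-picker,
    while the dl host serves the file directly (and a failed fetch can
    simply be retried, which is where the broken-mirror retries fit in).
    """
    parts = urlparse(url)
    if parts.netloc in ('prdownloads.sourceforge.net', 'prdownloads.sf.net'):
        return urlunparse(parts._replace(netloc='dl.sourceforge.net'))
    return url  # leave non-SourceForge URLs untouched
```
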
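To give a flavor of the filename-interpretation category, here is a hypothetical sketch of splitting a distutils-style archive name into a project name and version. The regex and helper are mine, not EasyInstall's real parser, which handles many more corner cases (eggs, Python-version tags, the ambiguous-hyphen cases mentioned above):

```python
import re

# Sketch only: match names like "FooPkg-1.2b1.tar.gz". The name may
# itself contain hyphens, so the version is taken to be the first
# hyphen-delimited segment that starts with a digit.
ARCHIVE_RE = re.compile(
    r'^(?P<name>[A-Za-z0-9_.]+(?:-[A-Za-z0-9_.]+)*?)'  # project name (lazy)
    r'-(?P<version>\d[^-]*)'                           # version starts with a digit
    r'(?P<ext>\.tar\.gz|\.tgz|\.zip|\.tar\.bz2)$'      # known archive suffixes
)

def guess_name_version(filename):
    """Return (name, version) if the filename looks distutils-generated."""
    m = ARCHIVE_RE.match(filename)
    if not m:
        return None
    return m.group('name'), m.group('version')
```

Note that even this toy version can be fooled by exactly the kinds of inputs described above: a file that matches the pattern but isn't a valid source distribution will still be cataloged as a candidate.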
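The Subversion-directory check can be sketched as a simple content heuristic. This is purely an assumption about what such a detector might look for (mod_dav_svn directory listings typically carry a "Revision N:" title), not the actual implementation:

```python
def looks_like_svn_dir(html_text):
    """Guess whether a fetched page is really a Subversion directory listing.

    Hypothetical heuristic: mod_dav_svn listings usually start with a
    "Revision N: /path" title or carry a "Powered by Subversion" footer.
    A positive hit would mean "svn checkout" instead of "download".
    """
    lowered = html_text.lower()
    return ('powered by subversion' in lowered
            or lowered.startswith('<html><head><title>revision'))
```
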
I should probably also point out that calling this "spidering" may give the impression that it's more sophisticated than it is. EasyInstall only retrieves pages that it is explicitly given, or that appear in one of two specific parts of a PyPI listing. But it *scans* the links on any page it retrieves, and if a link looks like a downloadable package, it parses as much information as practical from the filename in order to catalog it as a possible download source.

So, it will read HTML from PyPI pages, from pages directly linked from PyPI as either "Home" or "Download" URLs, and from page URLs you give to --find-links. But it doesn't "spider" anywhere beyond those pages, unless you count downloading an actual package link. The whole process more closely resembles a few quick redirects in a browser than it does any sort of traditional web spider.

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
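The scan-but-don't-spider behavior can be sketched roughly like this; the class name, extension list, and matching rule are all illustrative assumptions rather than EasyInstall's actual code:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Illustrative list of archive suffixes a link scanner might accept.
ARCHIVE_EXTS = ('.tar.gz', '.tgz', '.zip', '.tar.bz2', '.egg')

class LinkScanner(HTMLParser):
    """Collect hrefs on one page that look like downloadable packages.

    Only links already present on the retrieved page are examined;
    nothing is followed recursively -- which is why the process looks
    more like a few redirects than a traditional spider.
    """
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if href and href.lower().endswith(ARCHIVE_EXTS):
            # Resolve relative links against the page they appeared on.
            self.candidates.append(urljoin(self.base_url, href))

scanner = LinkScanner('http://example.org/downloads/')
scanner.feed('<a href="pkg-1.0.tar.gz">source</a> <a href="docs.html">docs</a>')
```

In this sketch, only the archive link is cataloged; the `docs.html` link is seen but never fetched.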