Re: [Distutils] Mystery solved

2006-07-14 Thread Phillip J. Eby
At 03:56 PM 7/11/2006 -0400, Jim Fulton wrote:
On Jul 11, 2006, at 2:07 PM, Phillip J. Eby wrote:
At 11:50 AM 7/11/2006 -0400, Jim Fulton wrote:
I would stop when a result is found.

Even so, this means O(N x M) web hits, where N is the number of
packages and M is the number of --find-links (including dependency
links supplied by eggs installed so far).  I don't think it's
reasonable to hit so many non-existent URLs on non-index servers,
and it is impolite to the servers' operators.  (For example, if they
receive a daily report of all 404 errors from their web servers, as
I do.  This is pretty common on Red Hat boxes using logwatch, for
example.)

It's particularly unfair since using e.g.
http://peak.telecommunity.com/snapshots/ as a --find-links while
installing, say TurboGears, would cause a whole host of index
hits to subdirectories of that URL, even though none of them can or
will be found.

The fallout from this approach is far worse than any screen
scraping issues we've had.

Isn't this the approach that's followed now?

No; only the --find-links pages themselves are read, and one assumes that 
they actually exist.  :)


Aren't all of the find-links searched as well as the index?  I suppose
you're referring to the search for /projectname, which potentially
doubles the number of requests.

Doubling is only the beginning.  If there are 5 dependencies, or 5 
requirements on the command line, then it quintuples the number of 
requests, and they're all going to be retrieving non-existent URLs, except 
for whichever link was actually the package index.
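To make the arithmetic concrete, here is a minimal sketch, assuming a hypothetical probing scheme that requests `<find-link>/<projectname>/` under every --find-links URL and stops at the first hit. The `exists` callback stands in for an HTTP request, and the second find-link and the requirement names are illustrative only; this is not setuptools' code.

```python
from typing import Callable, List


def candidate_urls(find_links: List[str], project: str) -> List[str]:
    """Hypothetical probe URLs: '<link>/<project>/' under every find-link."""
    return [link.rstrip("/") + "/" + project + "/" for link in find_links]


def count_probe_requests(
    find_links: List[str],
    projects: List[str],
    exists: Callable[[str], bool],
) -> int:
    """Probe the find-links for each project, stopping at the first hit.

    `exists` stands in for an HTTP request. Returns the number of
    requests made; if nothing is found, that is
    len(projects) * len(find_links), i.e. N x M.
    """
    requests = 0
    for project in projects:
        for url in candidate_urls(find_links, project):
            requests += 1
            if exists(url):
                break  # stop when a result is found
    return requests


if __name__ == "__main__":
    links = [
        "http://peak.telecommunity.com/snapshots/",
        "http://example.com/dists/",  # hypothetical second find-link
    ]
    # Illustrative requirement names for a TurboGears-style install.
    reqs = ["TurboGears", "CherryPy", "SQLObject", "Kid", "FormEncode"]
    # Plain directories have no /projectname/ pages, so every probe misses.
    print(count_probe_requests(links, reqs, lambda url: False))  # 10
```

With two find-links and five requirements the sketch makes ten requests, every one of them a 404 on servers that were never meant to be indexes; stopping at the first hit only saves requests when some hit exists.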

Of course, this is also ignoring the UI reason why the index URL and 
find-links URLs are specified separately, and that is that the common case 
is to use PyPI and maybe also a find-link or two.  If they were specified 
by the same option, then any use of find-links would require you to retype 
the index URL.  So, it's not a very convenient UI to merge the concepts, as 
well as being neither efficient for retrieval speed nor polite to site 
operators.

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Mystery solved

2006-07-11 Thread Jim Fulton

On Jul 11, 2006, at 11:25 AM, Phillip J. Eby wrote:

 At 03:54 AM 7/11/2006 -0400, Jim Fulton wrote:
 On Jul 10, 2006, at 6:22 PM, Phillip J. Eby wrote:
 Here's the problem.  Reducing everything to info messages means
 there's effectively no control over output detail.  I generally use
 'warn()' for things that *may* reflect an error in input
 parameters.  So, my take on the above is that although the
 Scanning message could become an info(), the previous one
 shouldn't.

 This means that one always has to use an index.  In which case, what
 is the point of find-links?

 The point of --find-links is to provide links to unindexed packages.

 But if you use an unindexed package, you'll get a warning.

 Which means it *may* be an error.  See above.  If it was an error,  
 it would be an error.  The point of a warning is to inform you that  
 something *may* be an error.

 IMO, you should not get a warning for correct use of software.  Users
 should try to make warnings go away.

 Repeating it doesn't make it so.  I'm not convinced that this  
 particular warning (that you may have misspelled the package name  
 because it's not in the index you're using) is in any way harmful.

It is definitely so.  That is definitely my opinion. :)

OK that's an interesting point wrt possible misspellings. If you can  
find the package via the find links, but not via the index, that  
seems to me to be a pretty good indication that this is not a  
misspelling.  This is the case I'm worried about.  If the package  
can't be found anywhere, then I agree that a warning is warranted.



 If you give people warnings that they shouldn't make go away,

That wasn't clear.  If people are using the software correctly, but  
choosing to find distributions via find-links rather than an index,  
and they get warnings, then they will always get warnings and tend to  
ignore them.


 Huh?  They can put the package in the index, use a different index,  
 not use -U, specify an exact download URL (either directly or via
 --find-links), etc.  There are a huge number of ways *not* to
 encounter that particular warning.

I have to use -U to get newer versions of distributions, even if I  
happen to store distributions in a directory that is not a valid  
index.  In this case, I use find-links and -U together, and I'll get  
a warning unless I put distributions in an index.

 Either you can't have valid unindexed software, or setuptools
 shouldn't generate a warning if software isn't in the index.

 This is back to argument by assertion.  Please explain to me why  
 this warning is actually bad, rather than simply asserting that  
 it's so.

I assert and take as a premise that when users are using software  
correctly, including not misspelling anything, they should not get a  
warning.  If you can't buy that, then we have an irreconcilable
difference.

The specific case, which I'll repeat from above, as clearly as I can,  
is this:

- A user chooses not to store their software in an index.
- The user places distributions on a web server somewhere.  This is  
just a directory, it is not a valid index.
- The user points at their server using find-links
- The user has an installation and they want to check for newer  
versions.
- The distributions that they are looking for newer versions of can  
be found on the server that they name via find-links.

In this case, they will get a warning that the distribution they are  
looking for couldn't be found on the index.  They didn't misspell  
anything, as setuptools should be able to deduce from the fact that  
their distribution was found on the link server.
I don't think that they should get a warning.  As far as I'm  
concerned, this means that distributions must always be stored on  
index servers and the find-links is just an attractive nuisance.


 I really find the distinction between indexes and find-links rather
 puzzling.

 --find-links is used to allow you to point easy_install to a  
 project's non-indexed home page or download page to find links, or  
 to provide other easy_install-processable links, without needing an  
 index.
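As a rough illustration of how a plain directory listing can serve as a --find-links page, this sketch collects the `href`s on a page that look like distribution files for one project. The extension list and the name-prefix test are simplified assumptions, not easy_install's actual matching rules, and the example URL is hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin

# Simplified, assumed set of distribution file extensions.
DIST_EXTENSIONS = (".egg", ".tar.gz", ".tgz", ".zip")


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def distribution_links(page_url: str, html: str, project: str) -> List[str]:
    """Return absolute URLs of files on the page that look like
    distributions of `project` (case-insensitive 'project-' prefix)."""
    collector = LinkCollector()
    collector.feed(html)
    prefix = project.lower() + "-"
    found = []
    for href in collector.hrefs:
        filename = href.rsplit("/", 1)[-1].lower()
        if filename.startswith(prefix) and filename.endswith(DIST_EXTENSIONS):
            found.append(urljoin(page_url, href))
    return found


if __name__ == "__main__":
    listing = """<html><body>
    <a href="jimtest-1.0.tar.gz">jimtest-1.0.tar.gz</a>
    <a href="jimtest-1.1-py2.4.egg">jimtest-1.1-py2.4.egg</a>
    <a href="README.txt">README.txt</a>
    </body></html>"""
    print(distribution_links("http://example.com/dists/", listing, "jimtest"))
```

No index structure is needed: the page is read once and every matching link becomes a candidate.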

But they are unusable without getting warnings whenever you want to  
check for updates.


 Personally, I'd like to find a way to merge these two concepts into one
 by choosing a definition of an index that admits a directory full of
 distributions.

 Feel free to try to come up with one.  However, --find-links allows  
 *multiple* links to be specified, and it is also the basis for the  
 dependency_links argument to setup().  --find-links is also a  
 primitive upon which the index facility is built, since index pages  
 are treated more-or-less like --find-links URLs that are  
 automatically generated.

I don't need to; you already did.

 At a minimum, merging the concepts would mean allowing multiple  
 index URLs, or else eliminating the idea of an index,

Yup. Sounds good to me.


 and treating all --find-links URLs as though they 

Re: [Distutils] Mystery solved

2006-07-11 Thread Jim Fulton

On Jul 11, 2006, at 2:07 PM, Phillip J. Eby wrote:

 At 11:50 AM 7/11/2006 -0400, Jim Fulton wrote:
 OK that's an interesting point wrt possible misspellings. If you can
 find the package via the find links, but not via the index, that
 seems to me to be a pretty good indication that this is not a
 misspelling.  This is the case I'm worried about.  If the package
 can't be found anywhere, then I agree that a warning is warranted.

 The interesting question there is, should the fallback scan still  
 take place in the absence of the warning?  If it *does* take place,  
 then the reason for the scan (and delay) is unexplained.  If it  
 does *not* take place, then there is an undesirable change in  
 semantics.

 Currently, if you have a package called Bob's Incredible Package,  
 this will be treated by easy_install as being spelled
 Bob-s-Incredible-Package, and it will require a top-level index scan to
 find the right URL.  It is also possible to have --find-links pages  
 containing obsolete versions, while PyPI contains the latest  
 version, so removing the scan doesn't seem to be a reasonable option.
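The name mangling and the fallback scan can be sketched as follows. `safe_name` uses the same rule as `pkg_resources.safe_name`; `resolve_project`, `has_page`, and `index_names` are hypothetical stand-ins for fetching a project page and reading the top-level index, not easy_install's internals.

```python
import re
from typing import Callable, Iterable, Optional


def safe_name(name: str) -> str:
    """Replace each run of characters other than letters, digits and '.'
    with a single '-' (the same rule as pkg_resources.safe_name)."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def resolve_project(
    project: str,
    index_names: Iterable[str],
    has_page: Callable[[str], bool],
) -> Optional[str]:
    """Return the index's spelling of `project`, or None.

    `has_page(name)` stands in for fetching '<index>/<name>/';
    `index_names` stands in for the names listed on the top-level index.
    """
    key = safe_name(project)
    if has_page(key):
        return key  # direct hit: no scan needed
    # Fallback: scan every name on the top-level index (slow on PyPI).
    for name in index_names:
        if safe_name(name).lower() == key.lower():
            return name
    return None


if __name__ == "__main__":
    print(safe_name("Bob's Incredible Package"))  # Bob-s-Incredible-Package
    names = ["Bob's Incredible Package", "Kid"]
    # Suppose the direct page lookup misses, so only the scan finds it.
    print(resolve_project("bob's incredible package", names, lambda n: False))
```

The direct lookup is cheap but depends on the mangled spelling; the scan is what rescues differently spelled names, which is why dropping it would change behavior.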

 So, I will simply change the message to an info message stating  
 that the index page couldn't be found (rather than a warning  
 suggesting misspelling), *if* easy_install has previously seen at  
 least one valid distribution file or link for the applicable  
 project name.
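A minimal sketch of the proposed rule, assuming a hypothetical `report_missing_index_page` helper and a set of lower-cased project names for which a distribution file or link has already been seen. It uses Python's standard `logging` module rather than setuptools' own logging, and the info-level wording is an assumption.

```python
import logging
from typing import Set

log = logging.getLogger("index-sketch")


def report_missing_index_page(project: str, seen_projects: Set[str]) -> int:
    """Log that an index page is missing and return the level used.

    `seen_projects` holds lower-cased names for which a distribution
    file or link was already seen (e.g. on a --find-links page). If the
    project is in it, a misspelling is unlikely, so only log at info.
    """
    if project.lower() in seen_projects:
        log.info("Couldn't find index page for %r", project)
        return logging.INFO
    log.warning("Couldn't find index page for %r (maybe misspelled?)", project)
    return logging.WARNING


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
    seen = {"jimtest"}  # found on the user's --find-links server
    report_missing_index_page("jimtest", seen)  # info: found via find-links
    report_missing_index_page("jimtset", seen)  # warning: possibly misspelled
```

This covers the scenario of a plain directory used via --find-links with -U: the project is seen on that server first, so the missing index page is reported quietly.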

Great!


 The specific case, which I'll repeat from above, as clearly as I can,
 is this:

 - A user chooses not to store their software in an index.
 - The user places distributions on a web server somewhere.  This is
 just a directory, it is not a valid index.
 - The user points at their server using find-links
 - The user has an installation and they want to check for newer
 versions.
 - The distributions that they are looking for newer versions of can
 be found on the server that they name via find-links.

 In this case, they will get a warning that the distribution they are
 looking for couldn't be found on the index.

 Okay, this scenario is fixed by changing to an info message as  
 described above.

Yup. Cool.

 If you did that, however, it brings in the question of which of
 the --find-links URLs should be checked for a /projectname/
 subdirectory.  All of them?  Just the first one that finds a
 result?  None of them, if some other criterion is met?

 I would stop when a result is found.

 Even so, this means O(N x M) web hits, where N is the number of  
 packages and M is the number of --find-links (including dependency  
 links supplied by eggs installed so far).  I don't think it's  
 reasonable to hit so many non-existent URLs on non-index servers,  
 and it is impolite to the servers' operators.  (For example, if they
 receive a daily report of all 404 errors from their web servers, as  
 I do.  This is pretty common on Red Hat boxes using logwatch, for  
 example.)

 It's particularly unfair since using e.g.
 http://peak.telecommunity.com/snapshots/ as a --find-links while
 installing, say TurboGears, would cause a whole host of index  
 hits to subdirectories of that URL, even though none of them can or  
 will be found.

 The fallout from this approach is far worse than any screen  
 scraping issues we've had.

Isn't this the approach that's followed now?  Aren't all of the
find-links searched as well as the index?  I suppose you're referring to
the search for /projectname, which potentially doubles the number of  
requests.

 What is the use case for spreading distributions over multiple
 servers?  Do people really want to do that? I can see providing
 multiple places to look, because different distributions might be on
 different servers, but I don't see why distributions for a single
 project should be spread over multiple servers.

 Platform-specific distributions may be provided by contributors to  
 a project, rather than by the project's author; see, for example,  
 Bob Ippolito's pages for distributing Mac OS X builds of popular  
 Python packages.  For this reason, you may have certain pages that  
 you always want included in your --find-links, to be checked in  
 addition to the normal indexes.

OK

Jim

--
Jim Fulton           mailto:[EMAIL PROTECTED]    Python Powered!
CTO                  (540) 361-1714              http://www.python.org
Zope Corporation     http://www.zope.com         http://www.zope.org





Re: [Distutils] Mystery solved

2006-07-10 Thread Jim Fulton

On Jul 10, 2006, at 5:50 PM, Phillip J. Eby wrote:
...

 2. The messages:

Couldn't find index page for 'jimtest' (maybe misspelled?)
Scanning index of all packages (this may take a while)

 should be info messages, not warning messages.  If they
 remain warnings, and you fix the first problem, it will be
 impossible to avoid the warnings without creating a PyPI
 project or creating an index server, and I don't think it was
 your intent to require either of these.  I don't think a warning
 should be issued for correct use of software.

 Here's the problem.  Reducing everything to info messages means  
 there's effectively no control over output detail.  I generally use  
 'warn()' for things that *may* reflect an error in input  
 parameters.  So, my take on the above is that although the  
 Scanning message could become an info(), the previous one shouldn't.
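The point about output control can be illustrated with Python's standard `logging` module, used here only as an analogy (distutils has its own `log.warn()`/`log.info()` functions). With a default threshold only warnings appear; a verbose flag lowers the threshold so info messages show too. The logger name and `ListHandler` are hypothetical.

```python
import logging
from typing import List, Tuple


class ListHandler(logging.Handler):
    """Handler that records (levelname, message) pairs in a list."""

    def __init__(self, records: List[Tuple[str, str]]) -> None:
        super().__init__()
        self.records = records

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append((record.levelname, record.getMessage()))


def make_logger(verbose: bool, records: List[Tuple[str, str]]) -> logging.Logger:
    """Return a logger whose threshold is set by a verbosity flag."""
    logger = logging.getLogger("verbosity-sketch")
    logger.handlers.clear()
    logger.propagate = False  # keep output out of the root logger
    logger.setLevel(logging.INFO if verbose else logging.WARNING)
    logger.addHandler(ListHandler(records))
    return logger


if __name__ == "__main__":
    for verbose in (False, True):
        shown: List[Tuple[str, str]] = []
        logger = make_logger(verbose, shown)
        logger.warning("Couldn't find index page for %r (maybe misspelled?)", "jimtest")
        logger.info("Scanning index of all packages (this may take a while)")
        print("verbose" if verbose else "default", [level for level, _ in shown])
```

Demoting every message to info would leave the default run silent and the verbose run undifferentiated, which is the loss of control being described.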

This means that one always has to use an index.  In which case, what  
is the point of find-links?

Jim
