Re: [Distutils] Mystery solved
At 03:56 PM 7/11/2006 -0400, Jim Fulton wrote:
On Jul 11, 2006, at 2:07 PM, Phillip J. Eby wrote:
At 11:50 AM 7/11/2006 -0400, Jim Fulton wrote:

I would stop when a result is found.

Even so, this means O(N x M) web hits, where N is the number of packages and M is the number of --find-links (including dependency links supplied by eggs installed so far). I don't think it's reasonable to hit so many non-existent URLs on non-index servers, and is impolite to the servers' operators. (For example, if they receive a daily report of all 404 errors from their web servers, as I do. This is pretty common on Red Hat boxes using logwatch, for example.) It's particularly unfair since using e.g. http://peak.telecommunity.com/snapshots/ as a --find-links while installing, say TurboGears, would cause a whole host of index hits to subdirectories of that URL, even though none of them can or will be found. The fallout from this approach is far worse than any screen scraping issues we've had.

Isn't this the approach that's followed now?

No; only the --find-links pages themselves are read, and one assumes that they actually exist. :)

Aren't all of the find-links searched as well as the index? I suppose you're referring to the search for /projectname, which potentially doubles the number of requests.

Doubling is only the beginning. If there are 5 dependencies, or 5 requirements on the command line, then it quintuples the number of requests, and they're all going to be retrieving non-existent URLs, except for whichever link was actually the package index.

Of course, this is also ignoring the UI reason why the index URL and find-links URLs are specified separately, and that is that the common case is to use PyPI and maybe also a find-link or two. If they were specified by the same option, then any use of find-links would require you to retype the index URL.

So, it's not a very convenient UI to merge the concepts, as well as being neither efficient for retrieval speed nor polite to site operators.

___
Distutils-SIG maillist - Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig
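The request-count argument in the message above can be made concrete with a small model. This is only a sketch, not setuptools code: the function name probe_counts and its parameters are hypothetical, and it assumes that exactly one of the link URLs is a real index holding every project, and that probing stops at the first hit, as suggested in the thread.

```python
# Hypothetical model of the "/projectname/" probing discussed above.
# Assumption: exactly one of the link URLs is a real index that holds every
# project, and probing for a project stops at the first URL that has it.

def probe_counts(num_projects, num_links, index_position=None):
    """Return (total_probes, not_found_probes) for the merged scheme.

    index_position is the 0-based position of the real index among the
    link URLs; None means the worst case, where it is tried last.
    """
    if index_position is None:
        index_position = num_links - 1
    if not 0 <= index_position < num_links:
        raise ValueError("index_position must be a valid link position")
    probes_per_project = index_position + 1   # every URL up to the hit
    misses_per_project = index_position       # all but the final probe 404
    return (num_projects * probes_per_project,
            num_projects * misses_per_project)


if __name__ == "__main__":
    # 5 requirements, 3 link URLs, real index tried last (worst case).
    print(probe_counts(5, 3))
    # Same, but the real index happens to be tried first.
    print(probe_counts(5, 3, index_position=0))
```

Under these assumptions, stopping at the first hit helps only when the real index happens to be probed early; in the worst case the count is still N x M, and all but N of those requests hit URLs that do not exist.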
Re: [Distutils] Mystery solved
On Jul 11, 2006, at 11:25 AM, Phillip J. Eby wrote:
At 03:54 AM 7/11/2006 -0400, Jim Fulton wrote:
On Jul 10, 2006, at 6:22 PM, Phillip J. Eby wrote:

Here's the problem. Reducing everything to info messages means there's effectively no control over output detail. I generally use 'warn()' for things that *may* reflect an error in input parameters. So, my take on the above is that although the Scanning message could become an info(), the previous one shouldn't.

This means that one always has to use an index. In which case, what is the point of find-links?

The point of --find-links is to provide links to unindexed packages.

But if you use an unindexed package, you'll get a warning.

Which means it *may* be an error. See above.

If it was an error, it would be an error. The point of a warning is to inform you that something *may* be an error.

IMO, you should not get a warning for correct use of software. Users should try to make warnings go away.

Repeating it doesn't make it so. I'm not convinced that this particular warning (that you may have misspelled the package name because it's not in the index you're using) is in any way harmful.

It is definitely so.

That is definitely my opinion. :)

OK that's an interesting point wrt possible misspellings. If you can find the package via the find links, but not via the index, that seems to me to be a pretty good indication that this is not a misspelling. This is the case I'm worried about. If the package can't be found anywhere, then I agree that a warning is warranted.

If you give people warnings that they shouldn't make go away,

That wasn't clear. If people are using the software correctly, but choosing to find distributions via find-links rather than an index, and they get warnings, then they will always get warnings and tend to ignore them.

Huh? They can put the package in the index, use a different index, not use -U, specify an exact download URL (either directly or via --find-links), etc. There are a huge number of ways *not* to encounter that particular warning.

I have to use -U to get newer versions of distributions, even if I happen to store distributions in a directory that is not a valid index. In this case, I use find-links and -U together, and I'll get a warning unless I put distributions in an index.

Either you can't have valid unindexed software, or setuptools shouldn't generate a warning if software isn't in the index.

This is back to argument by assertion. Please explain to me why this warning is actually bad, rather than simply asserting that it's so.

I assert and take as a premise that when users are using software correctly, including not misspelling anything, they should not get a warning. If you can't buy that, then we have an unreconcilable difference.

The specific case, which I'll repeat from above, as clearly as I can, is this:

- A user chooses not to store their software in an index.
- The user places distributions on a web server somewhere. This is just a directory, it is not a valid index.
- The user points at their server using find-links
- The user has an installation and they want to check for newer versions.
- The distributions that they are looking for newer versions of can be found on the server that they name via find-links.

In this case, they will get a warning that the distribution they are looking for couldn't be found on the index. They didn't misspell anything, as setuptools should be able to deduce from the fact that their distribution was found on the link server. I don't think that they should get a warning.

As far as I'm concerned, this means that distributions must always be stored on index servers and the find-links is just an attractive nuisance. I really find the distinction between indexes and find-links rather puzzling.

--find-links is used to allow you to point easy_install to a project's non-indexed home page or download page to find links, or to provide other easy_install-processable links, without needing an index.

But they are unusable without getting warnings whenever you want to check for updates.

Personally, I'd like to find a way to merge these two concepts into one by choosing a definition of an index that admits a directory full of distributions.

Feel free to try to come up with one. However, --find-links allows *multiple* links to be specified, and it is also the basis for the dependency_links argument to setup(). --find-links is also a primitive upon which the index facility is built, since index pages are treated more-or-less like --find-links URLs that are automatically generated.

I don't need to, you already did

At a minimum, merging the concepts would mean allowing multiple index URLs, or else eliminating the idea of an index,

Yup. Sound good to me.

and treating all --find-links URLs as though they
Re: [Distutils] Mystery solved
On Jul 11, 2006, at 2:07 PM, Phillip J. Eby wrote:
At 11:50 AM 7/11/2006 -0400, Jim Fulton wrote:

OK that's an interesting point wrt possible misspellings. If you can find the package via the find links, but not via the index, that seems to me to be a pretty good indication that this is not a misspelling. This is the case I'm worried about. If the package can't be found anywhere, then I agree that a warning is warranted.

The interesting question there is, should the fallback scan still take place in the absence of the warning? If it *does* take place, then the reason for the scan (and delay) is unexplained. If it does *not* take place, then there is an undesirable change in semantics.

Currently, if you have a package called Bob's Incredible Package, this will be treated by easy_install as being spelled Bob-s-Incredible-Package, and it will require a top-level index scan to find the right URL. It is also possible to have --find-links pages containing obsolete versions, while PyPI contains the latest version, so removing the scan doesn't seem to be a reasonable option.

So, I will simply change the message to an info message stating that the index page couldn't be found (rather than a warning suggesting misspelling), *if* easy_install has previously seen at least one valid distribution file or link for the applicable project name.

Great!

The specific case, which I'll repeat from above, as clearly as I can, is this:

- A user chooses not to store their software in an index.
- The user places distributions on a web server somewhere. This is just a directory, it is not a valid index.
- The user points at their server using find-links
- The user has an installation and they want to check for newer versions.
- The distributions that they are looking for newer versions of can be found on the server that they name via find-links.

In this case, they will get a warning that the distribution they are looking for couldn't be found on the index.
Okay, this scenario is fixed by changing to an info message as described above.

Yup. Cool.

If you did that, however, it brings in the question of which of the --find-links URLs should be checked for a /projectname/ subdirectory. All of them? Just the first one that finds a result? None of them, if some other criterion is met?

I would stop when a result is found.

Even so, this means O(N x M) web hits, where N is the number of packages and M is the number of --find-links (including dependency links supplied by eggs installed so far). I don't think it's reasonable to hit so many non-existent URLs on non-index servers, and is impolite to the servers' operators. (For example, if they receive a daily report of all 404 errors from their web servers, as I do. This is pretty common on Red Hat boxes using logwatch, for example.) It's particularly unfair since using e.g. http://peak.telecommunity.com/snapshots/ as a --find-links while installing, say TurboGears, would cause a whole host of index hits to subdirectories of that URL, even though none of them can or will be found. The fallout from this approach is far worse than any screen scraping issues we've had.

Isn't this the approach that's followed now? Aren't all of the find-links searched as well as the index? I suppose you're referring to the search for /projectname, which potentially doubles the number of requests.

What is the use case for spreading distributions over multiple servers? Do people really want to do that? I can see providing multiple places to look, because different distributions might be on different servers, but I don't see why distributions for a single project should be spread over multiple servers.

Platform-specific distributions may be provided by contributors to a project, rather than by the project's author; see, for example, Bob Ippolito's pages for distributing Mac OS X builds of popular Python packages. For this reason, you may have certain pages that you always want included in your --find-links, to be checked in addition to the normal indexes.

OK

Jim

--
Jim Fulton mailto:[EMAIL PROTECTED] Python Powered!
CTO (540) 361-1714 http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org
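The rule agreed in this exchange (report a missing index page as an info message if a distribution or link for the project has already been seen, otherwise warn about a possible misspelling) can be sketched as below. This is an illustration only, not the setuptools implementation: normalize_name, index_miss_level and the seen_projects set are hypothetical names, and the regular expression is an assumption chosen to reproduce the "Bob's Incredible Package" example from the thread.

```python
import re

# Hypothetical helper: collapse each run of characters other than ASCII
# letters, digits and '.' into a single '-'. The pattern is an assumption
# chosen to match the spelling example given in the thread.
def normalize_name(name):
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


# Hypothetical decision rule: seen_projects is a set of normalized,
# lower-cased project names for which at least one valid distribution
# file or link has already been found (for example via --find-links).
def index_miss_level(project, seen_projects):
    key = normalize_name(project).lower()
    return "info" if key in seen_projects else "warn"


if __name__ == "__main__":
    print(normalize_name("Bob's Incredible Package"))
    seen = {"jimtest"}
    print(index_miss_level("jimtest", seen))    # found via a link page
    print(index_miss_level("jimtset", seen))    # never seen anywhere
```

The fallback scan of the full index is unaffected by this rule; only the severity of the message changes.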
Re: [Distutils] Mystery solved
On Jul 10, 2006, at 5:50 PM, Phillip J. Eby wrote:

...

2. The messages:

Couldn't find index page for 'jimtest' (maybe misspelled?)
Scanning index of all packages (this may take a while)

should be info messages, not warning messages. If they remain warnings, and you fix the first problem, it will be impossible to avoid the warnings without creating a PyPI project or creating an index server, and I don't think it was your intent to require either of these. I don't think a warning should be issued for correct use of software.

Here's the problem. Reducing everything to info messages means there's effectively no control over output detail. I generally use 'warn()' for things that *may* reflect an error in input parameters. So, my take on the above is that although the Scanning message could become an info(), the previous one shouldn't.

This means that one always has to use an index. In which case, what is the point of find-links?

Jim

--
Jim Fulton mailto:[EMAIL PROTECTED] Python Powered!
CTO (540) 361-1714 http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org
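To illustrate why the warn()/info() distinction matters for output control, here is a small sketch using the standard library logging module as a stand-in for whatever logging easy_install uses internally (that substitution is an assumption, as is the helper name emit). It logs the first message as a warning and the Scanning message as info, following the split proposed in the message above, and shows which lines survive at a given verbosity threshold.

```python
import io
import logging

# Stand-in demonstration with the stdlib logging module; emit() is a
# hypothetical helper, not part of setuptools or distutils.
def emit(threshold):
    """Log both messages and return the lines that pass the threshold."""
    stream = io.StringIO()
    logger = logging.getLogger("easy_install_demo_%d" % threshold)
    logger.setLevel(threshold)
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.handlers = [handler]
    logger.warning("Couldn't find index page for 'jimtest' (maybe misspelled?)")
    logger.info("Scanning index of all packages (this may take a while)")
    return stream.getvalue().splitlines()


if __name__ == "__main__":
    for line in emit(logging.WARNING):  # quieter setting: warnings only
        print(line)
    for line in emit(logging.INFO):     # more verbose: both messages
        print(line)
```

At the quieter threshold only the warning is shown, which is the behaviour being objected to when the package was in fact found through --find-links.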