Re: Updates on Wget Future Directions

Julien B. Mon, 31 Mar 2008 22:42:02 -0700

I always like April Fools' jokes.
People tend to be very inventive :)

Select "Unccl svefg bs Ncevy, sbyxf" in Vim's visual mode (V) and press
g?; it becomes more readable.
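
For anyone without Vim handy, roughly the same decoding can be sketched
in Python. This is only an illustration, assuming the signature line is
plain ROT13 (which is what Vim's g? applies); the helper name rot13 is
made up for this example and is not part of Wget or Vim.

```python
import codecs


def rot13(text: str) -> str:
    """Rotate each ASCII letter 13 places; other characters pass through."""
    # "rot_13" is a text-to-text codec shipped with the standard library.
    return codecs.decode(text, "rot_13")


if __name__ == "__main__":
    # Prints the decoded signature line from the quoted message.
    print(rot13("Unccl svefg bs Ncevy, sbyxf"))
```

ROT13 is its own inverse, so running the result through rot13 again
gives back the scrambled line.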


The Gday one is nice too.
http://www.google.com.au/intl/en/gday/index.html
Have fun.

On Tue, Apr 1, 2008 at 4:10 AM, Micah Cowan <[EMAIL PROTECTED]> wrote:
> Well, I have some announcements about decisions that have been made
>  regarding future directions in Wget.
>
>  First off, I've reversed my previous decision not to include "download
>  accelerator" features in the multi-streaming version of Wget. It's
>  becoming clear to me that the benefits far outweigh any disadvantages.
>  As tool developers, it's our job to supply powerful tools; it's the
>  users' job to use them with the appropriate discretion. And while it may
>  be troublesome to the administrators of smaller servers that may become
>  overburdened when less-polite users abuse Wget, the careful
>  application by users who know that the servers can handle the requests
>  has the potential to produce such striking effects on download speeds
>  that it seems to me irresponsible to deny such strong
>  improvements to those who can use Wget responsibly, just for the sake of
>  those who might abuse it. Considering that the time required to download
>  a 2 GB file from the web can be reduced ten-fold, simply by splitting
>  the work into ten separate, simultaneous download streams for 200 MB
>  each, it's really elitist of us to tell users, "no, you can't do that,
>  because you might not know what you're doing."
>
>  Besides which, it's quite clear from the number of requests we've
>  received for this functionality, that the addition of this feature will
>  boost Wget's popularity significantly. We've really no excuse to leave
>  it out!
>
>  .
>
>  Following the same policy of "providing the tool, without dictating the
>  use", it has come to my attention that a not-insignificant portion of
>  our user base uses Wget to perform "screen-scraping" on other sites.
>  There are a variety of motivations for such practices, which include
>  analysis of periodically-changing data, site-style imitation, and of
>  course full look-alike site imitation. The latter is particularly
>  popular with websites corresponding to financial institutions.
>
>  That last group often consists of users with significant funding at
>  their disposal, which they could easily put towards financing further
>  Wget development. To this end, there are a few additional features I've
>  been considering, aimed at appealing to this portion of Wget's user
>  base.
>
>  The one I'll mention today is the --ichthus option. Invoking Wget with:
>
>   wget --ichthus URL-A URL-B
>
>  will download URL-A and any prerequisites (images, CSS, etc), perform
>  some conversions, and then automatically upload the results to URL-B
>  (via FTP or WebDAV, configuration options for which will be discussed at
>  a later date).
>
>  The specific conversions to be applied after download include converting
>  relative URLs to absolute URLs, and the conversion of all
>  form-submission URLs to point to locations at the host site for URL-B,
>  obfuscating it in such a way as to appear to still be pointing to a
>  location on URL-A's host.
>
>  For example, if the page at
>  https://www.infidelitybanking.com/loginPage.php contains a form whose
>  action attribute has the value "loginProcess.php?submit=foo", then
>  running:
>
>   wget --ichthus https://www.infidelitybanking.com/loginPage.php \
>     https://256.133.312.10/
>
>  would download loginPage.php from site A, and upload it to site B,
>  except that any relative links would be converted to absolute links
>  (with site A as a baseref); and the HTML form's action would be
>  converted to something like:
>
>  https://www.infidelitybanking.com:[EMAIL PROTECTED]/cgi-bin/loginPage.cgi
>
>  .
>
>  There's been a lot of discussion lately about how the architecture of
>  Wget's accept/reject lists could be improved. One thing that hasn't had
>  much treatment, though (well, any, really) is how potentially
>  _demeaning_ the existing terminology can be.
>
>  Representing the decision whether or not to download a given URL as
>  either "accepted" or "rejected" is a rather harsh, perhaps even cruel,
>  way of dividing the world. It can tend to convey the mistaken impression
>  that some URLs are intrinsically "bad" while others are intrinsically
>  "good". This can have obvious consequences for self-esteem, and yet it's
>  clear that a URL that may be "rejected" for a particular session's needs
>  today, may well be "accepted" in some future session.
>
>  Therefore, I'd like to propose that we replace the current terminology
>  with something more politically sensitive. Rather than --accept and
>  --reject, perhaps --you-fit-my-needs-today and
>  --not-a-good-fit-for-me-at-this-time? Those names don't feel quite right
>  (in particular, they're a bit lengthy); but I think you get the general
>  idea; perhaps someone can suggest something better?
>
>  .
>
>  Finally, thanks to Julien Buty's helpful recommendation that Wget take
>  part in this year's Google Summer of Code, we've received a number of
>  excellent proposals from students eager to take part. A few of these
>  include some great and novel ideas.
>
>  The most promising of these, and something I don't believe previous Wget
>  maintainers had given much thought to, is the proposal that Wget support
>  HTCPCP (which is based on good ol' HTTP) as one of its primary supported
>  transport mechanisms. It's amazing to me that we still currently lack
>  support for this protocol, which is such an important part of the World
>  Wide Web. In addition, I'm fairly certain that this is one of the few
>  transport layers that the Curl guys still have yet to include, so if we
>  beat them to the punch, we may have one over on them. :)
>
>  More information on this most-venerated of protocols may be had at
>  http://www.ietf.org/rfc/rfc2324.txt.
>
>  --
>  Unccl svefg bs Ncevy, sbyxf.
>
