Re: trouble with -p
On 2008-07-19 10:26:25, Micah Cowan wrote:

> That strikes me as not quite right. If Wget sees http://www.ifixit.com/Guide/First-Look/iPhone3G, and it's not redirected to http://www.ifixit.com/Guide/First-Look/iPhone3G/, then Wget will use a file name. What's more, if it later sees it with the slash, it will fail to create a directory at all, since the file already exists with that pathname.
>
> I'm not sure what you mean by "I want both". You can't possibly have a regular file named iPhone3G, and another file named iPhone3G/images/... it can't be both a file and a directory at once.
>
> If you specify the link with a trailing slash, then Wget will realize iPhone3G is a directory, and will store the file it finds there as iPhone3G/index.html. You're out of luck, though, if some links refer to it with, and some without, the trailing slash, with a server that doesn't redirect to the slash version (like Apache does).

I think he means what web browsers do. If you save an HTML page together with its contents, you get:

    some_name.html
    some_name/        # the page requisites

So if he tries to download http://www.some-domain.tld/sub1/iPhone3G, he wants:

    iPhone3G.html
    iPhone3G/         # the page requisites

I would find this feature useful too.

Greetings,
Michelle Konzack
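The browser-style layout described above can be sketched as a small naming function. This only illustrates the wished-for behavior and is not something Wget is known to do; the helper name `browser_style_names` and the choice to derive both names from the last path segment are assumptions made for the example.

```python
from urllib.parse import urlsplit
import posixpath


def browser_style_names(url):
    """Return (html_file, requisites_dir) for a page URL.

    Hypothetical helper: models the wished-for layout, not Wget behavior.
    """
    # Drop a trailing slash so both URL spellings give the same names.
    path = urlsplit(url).path.rstrip("/")
    # Fall back to "index" when the URL has no path segment at all.
    base = posixpath.basename(path) or "index"
    return base + ".html", base + "/"


html_file, requisites_dir = browser_style_names(
    "http://www.some-domain.tld/sub1/iPhone3G"
)
print(html_file)       # iPhone3G.html
print(requisites_dir)  # iPhone3G/
```

Because the page and its requisites get different names, the file and the directory no longer compete for the same path.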
Re: trouble with -p
On Sun, 20 Jul 2008 23:08:56 +0200, Matthias Vill wrote:

> Brian Keck wrote:
>> If you do
>>     wget http://www.ifixit.com/Guide/First-Look/iPhone3G
>> then you get an HTML file called iPhone3G. But if you do
>>     wget -p http://www.ifixit.com/Guide/First-Look/iPhone3G
>> then you get a directory called iPhone3G.
>> ...
>> But of course I want both. Is there a way of getting wget -p to do something clever, like renaming the HTML file?
>> ...
>
> maybe this helps:
> --html-extension

That's what I was hoping for. At least it works for the above. (It also renames diggthis.js to diggthis.js.html, but I don't care about that.)

Thanks,
Brian Keck
Re: trouble with -p
Brian Keck wrote:

> (It also renames diggthis.js to diggthis.js.html, but I don't care about that).

That's an indication that the server is misconfigured, and is serving diggthis.js as text/html, rather than text/javascript or text/x-javascript.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer.
http://micah.cowan.name/
Re: trouble with -p
On Sat, 19 Jul 2008 10:26:25 MST, Micah Cowan wrote:

> Brian Keck wrote:
>> If you do
>>     wget http://www.ifixit.com/Guide/First-Look/iPhone3G
>> then you get an HTML file called iPhone3G. But if you do
>>     wget -p http://www.ifixit.com/Guide/First-Look/iPhone3G
>> then you get a directory called iPhone3G.
>
> ...
> If you specify the link with a trailing slash, then Wget will realize iPhone3G is a directory, and will store the file it finds there as iPhone3G/index.html.
> ...

I should have thought of adding a trailing slash ... it works in this case.

Thanks,
Brian Keck
Re: trouble with -p
Hi Brian,

maybe this helps:

--html-extension
    If a file of type application/xhtml+xml or text/html is downloaded and the URL does not end with the regexp \.[Hh][Tt][Mm][Ll]?, this option will cause the suffix .html to be appended to the local filename. This is useful, for instance, when you're mirroring a remote site that uses .asp pages, but you want the mirrored pages to be viewable on your stock Apache server. Another good use for this is when you're downloading CGI-generated materials. A URL like http://site.com/article.cgi?25 will be saved as article.cgi?25.html.

At least to me it seems that wget should then download everything. Note, though, that it will re-download all kinds of mangled URLs (like this one) when wget is told to re-download the file or when it re-encounters it as a link.

Otherwise you could append a ? to the URL, which should be stripped on the server side anyway.

Hope that helps,
Matthias

Brian Keck wrote:

> Hello,
>
> If you do
>     wget http://www.ifixit.com/Guide/First-Look/iPhone3G
> then you get an HTML file called iPhone3G. But if you do
>     wget -p http://www.ifixit.com/Guide/First-Look/iPhone3G
> then you get a directory called iPhone3G.
>
> This makes sense if you look at the links in the HTML file, like
>     /Guide/First-Look/iPhone3G/images/3jYKHyIVrAHnG4Br-standard.jpg
>
> But of course I want both. Is there a way of getting wget -p to do something clever, like renaming the HTML file?
>
> I've looked through
>     wget(1)
>     /usr/share/doc/wget
>     the comments in the 1.10.2 source
> without seeing anything relevant.
>
> Thanks,
> Brian Keck
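The naming rule in the quoted manual text can be modeled in a few lines. This is a minimal sketch of the documented rule only, not Wget's actual code; the helper name `local_name` is hypothetical, and the `$` anchor is added to express "does not end with". It also shows why a .js file served as text/html, as reported elsewhere in this thread, picks up the extra suffix.

```python
import re

# Suffix pattern from the manual text, anchored at the end of the name.
HTML_SUFFIX = re.compile(r"\.[Hh][Tt][Mm][Ll]?$")
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def local_name(url_name, content_type):
    """Model of the documented --html-extension rule (hypothetical helper)."""
    if content_type in HTML_TYPES and not HTML_SUFFIX.search(url_name):
        return url_name + ".html"
    return url_name


print(local_name("article.cgi?25", "text/html"))     # article.cgi?25.html
print(local_name("iPhone3G", "text/html"))           # iPhone3G.html
print(local_name("index.html", "text/html"))         # index.html
print(local_name("diggthis.js", "text/javascript"))  # diggthis.js
print(local_name("diggthis.js", "text/html"))        # diggthis.js.html
```

The decision depends on the Content-Type the server sends, not on what the file really contains.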
Re: trouble with -p
Brian Keck wrote:

> Hello,
>
> If you do
>     wget http://www.ifixit.com/Guide/First-Look/iPhone3G
> then you get an HTML file called iPhone3G. But if you do
>     wget -p http://www.ifixit.com/Guide/First-Look/iPhone3G
> then you get a directory called iPhone3G.
>
> This makes sense if you look at the links in the HTML file, like
>     /Guide/First-Look/iPhone3G/images/3jYKHyIVrAHnG4Br-standard.jpg
>
> But of course I want both. Is there a way of getting wget -p to do something clever, like renaming the HTML file?
>
> I've looked through
>     wget(1)
>     /usr/share/doc/wget
>     the comments in the 1.10.2 source
> without seeing anything relevant.

That strikes me as not quite right. If Wget sees http://www.ifixit.com/Guide/First-Look/iPhone3G, and it's not redirected to http://www.ifixit.com/Guide/First-Look/iPhone3G/, then Wget will use a file name. What's more, if it later sees it with the slash, it will fail to create a directory at all, since the file already exists with that pathname.

I'm not sure what you mean by "I want both". You can't possibly have a regular file named iPhone3G, and another file named iPhone3G/images/... it can't be both a file and a directory at once.

If you specify the link with a trailing slash, then Wget will realize iPhone3G is a directory, and will store the file it finds there as iPhone3G/index.html. You're out of luck, though, if some links refer to it with, and some without, the trailing slash, with a server that doesn't redirect to the slash version (like Apache does).

--
Micah J. Cowan
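The two points in this reply, the trailing-slash rule and the file/directory clash, can be reproduced with a short sketch. It is a simplified model rather than Wget's code: the helper `local_path` is hypothetical and ignores any host or directory prefix that Wget's options may add, and the clash is demonstrated in a temporary directory.

```python
import os
import tempfile


def local_path(url_path):
    """Simplified model: a trailing slash means a directory, stored as index.html."""
    rel = url_path.lstrip("/")
    return rel + "index.html" if url_path.endswith("/") else rel


print(local_path("/Guide/First-Look/iPhone3G"))   # Guide/First-Look/iPhone3G
print(local_path("/Guide/First-Look/iPhone3G/"))  # Guide/First-Look/iPhone3G/index.html

with tempfile.TemporaryDirectory() as root:
    page = os.path.join(root, "iPhone3G")
    with open(page, "w") as f:  # the page saved as a regular file
        f.write("<html></html>")
    try:
        # A page requisite now needs iPhone3G to be a directory.
        os.makedirs(os.path.join(page, "images"))
        conflict = False
    except OSError:
        # The name is already taken by a regular file.
        conflict = True
    print(conflict)  # True
```

Whichever spelling of the URL is stored first decides whether iPhone3G ends up as a file or a directory, which is why mixed links cause trouble.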
Re: trouble with -p
Micah Cowan writes:

Micah> I'm not sure what you mean by "I want both".

He means that, when the -p option is given, he wants to mangle either the created filename or the created directory name so that both do in fact get created on the filesystem and all related files get saved.

Perhaps delaying the initial open(2) until after parsing the first document and then pretending that the initial URL had a trailing solidus might work?

-JimC
--
James Cloos
OpenPGP: 1024D/ED7DAEA6
Re: trouble with -p
James Cloos wrote:

> Micah> I'm not sure what you mean by "I want both".
>
> He means that, when the -p option is given, he wants to mangle either the created filename or the created directory name so that both do in fact get created on the filesystem and all related files get saved.
>
> Perhaps delaying the initial open(2) until after parsing the first document and then pretending that the initial URL had a trailing solidus might work?

Not possible with the current architecture. And that wouldn't solve the problem if it happens not to appear that way in the links immediately contained within.

https://savannah.gnu.org/bugs/index.php?23756 covers my solution for handling this.

The easy workaround for now, though, would be to supply the URL with the solidus in the first place, though as mentioned, I'm not sure that will work if it then later encounters a version without the solidus.

--
Micah J. Cowan
Re: trouble with -p
Micah Cowan wrote:

> Brian Keck wrote:
>> Sometimes -p doesn't work. For instance:
>>     wget -p http://en.wikipedia.org/wiki/Herbig-Haro_object
>
> In this case, it appears that you've bumped into the fact that wget, by default, will refuse to cross hostname boundaries to download things, unless you tell it otherwise. You want the -H option.

Hmm, an interesting observation from that... am I missing something, or is there not currently an easy way to tell wget to span hosts in the same domain, but not span domains? For example, spanning to upload.wikipedia.org makes sense when grabbing from en.wikipedia.org, but spanning to casa.colorado.edu, www.daviddarling.info or sparky.rice.edu (to steal the external references from the mentioned article) probably isn't desired.

Might be a useful wish for some point in the unspecified future.

--
Matthew
So long, and thanks for all the fish -- the dolphins
Re: trouble with -p
Matthew Woehlke wrote:

> Micah Cowan wrote:
>> Brian Keck wrote:
>>> Sometimes -p doesn't work. For instance:
>>>     wget -p http://en.wikipedia.org/wiki/Herbig-Haro_object
>>
>> In this case, it appears that you've bumped into the fact that wget, by default, will refuse to cross hostname boundaries to download things, unless you tell it otherwise. You want the -H option.
>
> Hmm, an interesting observation from that... am I missing something, or is there not currently an easy way to tell wget to span hosts in the same domain, but not span domains? For example, spanning to upload.wikipedia.org makes sense when grabbing from en.wikipedia.org, but spanning to casa.colorado.edu, www.daviddarling.info or sparky.rice.edu (to steal the external references from the mentioned article) probably isn't desired.
>
> Might be a useful wish for some point in the unspecified future.

-D wikipedia.org will do.

Note that we can't do this automatically (as: what's the domain?); even the assumption that a domain is whatever name is at the second level (such as right before com) is not always correct: for instance, many domains in the .name TLD were sold at the third level only. micah.cowan.name was sold separately from sara.cowan.name, and while those two both happen to belong to me, there are other foo.cowan.name's that belong to others, so traversing to those hosts wouldn't be appropriate.

--
Micah J. Cowan
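The effect of restricting host spanning to a domain list can be illustrated with a small check. This is a sketch of the idea only: the helper `host_in_domains` is hypothetical, and matching on whole labels (equal to the domain, or ending in "." plus the domain) is an assumption; Wget's own matching may differ in detail.

```python
def host_in_domains(host, domains):
    """Hypothetical check: host equals, or is a subdomain of, a listed domain."""
    host = host.lower().rstrip(".")
    for domain in domains:
        domain = domain.lower().strip(".")
        if host == domain or host.endswith("." + domain):
            return True
    return False


allowed = ["wikipedia.org"]
hosts = [
    "en.wikipedia.org",
    "upload.wikipedia.org",
    "casa.colorado.edu",
    "www.daviddarling.info",
    "sparky.rice.edu",
]
for host in hosts:
    # Prints True for the two wikipedia.org hosts and False for the rest.
    print(host, host_in_domains(host, allowed))

# The .name caveat: a list entry of cowan.name would also accept
# foo.cowan.name, which may belong to someone else entirely.
print(host_in_domains("foo.cowan.name", ["cowan.name"]))  # True
```

The last line shows why the domain has to be given explicitly: no rule based on label count can tell which hosts share an owner.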
Re: trouble with -p
On Sun, 12 Aug 2007 19:44:36 MST, Micah Cowan wrote:

> Brian Keck wrote:
>> Sometimes -p doesn't work. For instance:
>
> ...
> You want the -H option.

Thanks, so I do,
Brian Keck
Re: trouble with -p
Brian Keck wrote:

> Hello,
>
> Sometimes -p doesn't work. For instance:
>     wget -p http://en.wikipedia.org/wiki/Herbig-Haro_object

Hi,

The --debug flag will often provide useful information about why wget doesn't download something you expect it to.

In this case, it appears that you've bumped into the fact that wget, by default, will refuse to cross hostname boundaries to download things, unless you tell it otherwise. You want the -H option.

--
Micah J. Cowan