Re: trouble with -p

2008-07-28 Thread Michelle Konzack
On 2008-07-19 10:26:25, Micah Cowan wrote:
> That strikes me as not quite right. If Wget sees
> http://www.ifixit.com/Guide/First-Look/iPhone3G, and it's not redirected
> to http://www.ifixit.com/Guide/First-Look/iPhone3G/, then Wget will use
> a file name. What's more, if it later sees it with the slash, it will
> fail to create a directory at all, since the file already exists with
> that pathname.
>
> I'm not sure what you mean by "I want both". You can't possibly have a
> regular file named iPhone3G, and another file named iPhone3G/images/...
> it can't be both a file and a directory at once.
>
> If you specify the link with a trailing slash, then Wget will realize
> iPhone3G is a directory, and will store the file it finds there as
> iPhone3G/index.html. You're out of luck, though, if some links refer to
> it with, and some without, the trailing slash, with a server that
> doesn't redirect to the slash version (like Apache does).

I think he means what web browsers do.

If you save an HTML page together with its contents, you get:

some_name.html
some_name/  # the page requisites

So if he tries to download

http://www.some-domain.tld/sub1/iPhone3G

he wants

iPhone3G.html
iPhone3G/   # the page requisites

I would find this feature useful too.
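
The layout described above could be derived from the URL roughly as in the sketch below. It only illustrates the requested naming scheme; it is not something Wget does, and the function name browser_style_names is made up for this example.

```python
from posixpath import basename
from urllib.parse import urlsplit


def browser_style_names(url):
    """Return (html_file, requisites_dir) for a browser-like save layout.

    Hypothetical helper: it only illustrates the naming scheme described
    above and is not part of Wget.
    """
    # Drop a trailing slash so ".../iPhone3G/" and ".../iPhone3G" agree.
    path = urlsplit(url).path.rstrip("/")
    # Fall back to "index" when the URL has no last path component.
    name = basename(path) or "index"
    return name + ".html", name + "/"


if __name__ == "__main__":
    print(browser_style_names("http://www.some-domain.tld/sub1/iPhone3G"))
```

Because the HTML file gets a suffix, it no longer collides with the directory that holds the page requisites.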

Greetings
Michelle





Re: trouble with -p

2008-07-24 Thread Brian Keck

On Sun, 20 Jul 2008 23:08:56 +0200, Matthias Vill wrote:
> Brian Keck wrote:
>> If you do
>> wget http://www.ifixit.com/Guide/First-Look/iPhone3G
>> then you get an HTML file called iPhone3G.
>> But if you do
>> wget -p http://www.ifixit.com/Guide/First-Look/iPhone3G
>> then you get a directory called iPhone3G.
>> ...
>> But of course I want both.  Is there a way of getting wget -p to do
>> something clever, like renaming the HTML file?
>> ...
> maybe this helps:
> --html-extension

That's what I was hoping for.

At least it works for the above.

(It also renames diggthis.js to diggthis.js.html, but I don't care about
that).

Thanks,
Brian Keck


Re: trouble with -p

2008-07-24 Thread Micah Cowan

Brian Keck wrote:
> (It also renames diggthis.js to diggthis.js.html, but I don't care about
> that).

That's an indication that the server is misconfigured, and is serving
diggthis.js as text/html, rather than text/javascript or text/x-javascript.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/


Re: trouble with -p

2008-07-23 Thread Brian Keck

On Sat, 19 Jul 2008 10:26:25 MST, Micah Cowan wrote:
> Brian Keck wrote:
>> If you do
>> wget http://www.ifixit.com/Guide/First-Look/iPhone3G
>> then you get an HTML file called iPhone3G.
>> But if you do
>> wget -p http://www.ifixit.com/Guide/First-Look/iPhone3G
>> then you get a directory called iPhone3G.
>> ...
> If you specify the link with a trailing slash, then Wget will realize
> iPhone3G is a directory, and will store the file it finds there as
> iPhone3G/index.html.
> ...

I should have thought of adding a trailing slash ... it works in this
case.

Thanks,
Brian Keck



Re: trouble with -p

2008-07-20 Thread Matthias Vill

Hi Brian,

maybe this helps:
--html-extension
	If a file of type application/xhtml+xml or text/html is downloaded and 
the URL does not end with the regexp \.[Hh][Tt][Mm][Ll]?, this option 
will cause the suffix .html to be appended to the local filename.  This 
is useful, for instance, when you're mirroring a remote site that uses 
.asp pages, but you want the mirrored pages to be viewable on your stock 
Apache server.  Another good use for this is when you're downloading 
CGI-generated materials.  A URL like http://site.com/article.cgi?25 will 
be saved as article.cgi?25.html.
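
The renaming rule in the description above can be sketched as follows. This is an illustration of the documented behaviour only, not Wget's implementation; the helper name local_name and the sample file names other than article.cgi?25 are assumptions.

```python
import re

# Content types that the quoted description says trigger the renaming.
HTML_TYPES = {"text/html", "application/xhtml+xml"}

# The regexp from the quoted description, anchored at the end of the name.
HTML_SUFFIX = re.compile(r"\.[Hh][Tt][Mm][Ll]?$")


def local_name(url_name, content_type):
    """Append ".html" in the way the quoted description says.

    Hypothetical helper, not Wget's code. `url_name` is the last part of
    the URL, including any query string.
    """
    if content_type in HTML_TYPES and not HTML_SUFFIX.search(url_name):
        return url_name + ".html"
    return url_name


if __name__ == "__main__":
    for name, ctype in [
        ("article.cgi?25", "text/html"),
        ("iPhone3G", "text/html"),
        ("index.html", "text/html"),
        ("photo.jpg", "image/jpeg"),
    ]:
        print(name, "->", local_name(name, ctype))
```

Under this rule a script such as diggthis.js would also gain the suffix if the server labels it text/html, since only the declared content type and the URL ending are examined.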


At least to me it seems that wget should then download everything. Note,
though, that it will re-download all kinds of mangled URLs (like this
one) when wget is told to re-download the file or when it re-encounters
it as a link.


Otherwise you could append a ? to the URL, which should be stripped on
the server side anyway.


Hope that helps

Matthias

Brian Keck wrote:
> Hello,
>
> If you do
>
> wget http://www.ifixit.com/Guide/First-Look/iPhone3G
>
> then you get an HTML file called iPhone3G.
>
> But if you do
>
> wget -p http://www.ifixit.com/Guide/First-Look/iPhone3G
>
> then you get a directory called iPhone3G.
>
> This makes sense if you look at the links in the HTML file, like
>
> /Guide/First-Look/iPhone3G/images/3jYKHyIVrAHnG4Br-standard.jpg
>
> But of course I want both.  Is there a way of getting wget -p to do
> something clever, like renaming the HTML file?  I've looked through
> wget(1), /usr/share/doc/wget, and the comments in the 1.10.2 source
> without seeing anything relevant.
>
> Thanks,
> Brian Keck



Re: trouble with -p

2008-07-19 Thread Micah Cowan

Brian Keck wrote:
> Hello,
>
> If you do
>
> wget http://www.ifixit.com/Guide/First-Look/iPhone3G
>
> then you get an HTML file called iPhone3G.
>
> But if you do
>
> wget -p http://www.ifixit.com/Guide/First-Look/iPhone3G
>
> then you get a directory called iPhone3G.
>
> This makes sense if you look at the links in the HTML file, like
>
> /Guide/First-Look/iPhone3G/images/3jYKHyIVrAHnG4Br-standard.jpg
>
> But of course I want both.  Is there a way of getting wget -p to do
> something clever, like renaming the HTML file?  I've looked through
> wget(1), /usr/share/doc/wget, and the comments in the 1.10.2 source
> without seeing anything relevant.

That strikes me as not quite right. If Wget sees
http://www.ifixit.com/Guide/First-Look/iPhone3G, and it's not redirected
to http://www.ifixit.com/Guide/First-Look/iPhone3G/, then Wget will use
a file name. What's more, if it later sees it with the slash, it will
fail to create a directory at all, since the file already exists with
that pathname.

I'm not sure what you mean by "I want both". You can't possibly have a
regular file named iPhone3G, and another file named iPhone3G/images/...
it can't be both a file and a directory at once.

If you specify the link with a trailing slash, then Wget will realize
iPhone3G is a directory, and will store the file it finds there as
iPhone3G/index.html. You're out of luck, though, if some links refer to
it with, and some without, the trailing slash, with a server that
doesn't redirect to the slash version (like Apache does).
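
The file-versus-directory conflict can be reproduced outside Wget with a few lines of Python. This is a minimal sketch, not Wget's code; the function name try_both is made up, and the exact exception type depends on the operating system.

```python
import os
import tempfile


def try_both(root):
    """Save a page as the file "iPhone3G", then try to use it as a directory.

    Returns the name of the exception raised by the second step, or None
    if it unexpectedly succeeds. Illustration only, not Wget's code.
    """
    page = os.path.join(root, "iPhone3G")
    with open(page, "w") as f:
        f.write("<html></html>")  # stand-in for the downloaded page
    try:
        # A page requisite would need iPhone3G/images/ to exist.
        os.makedirs(os.path.join(page, "images"))
    except OSError as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(try_both(root))
```

Whichever of the two names is created first wins, which matches the behaviour described above: once the regular file exists, the directory cannot be created under the same pathname.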

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/


Re: trouble with -p

2008-07-19 Thread James Cloos
>>>>> "Micah" == Micah Cowan [EMAIL PROTECTED] writes:

Micah> I'm not sure what you mean by "I want both".

He means that, when the -p option is given, he wants to mangle either
the created filename or the created directory name so that both do in
fact get created on the filesystem and all related files get saved.

Perhaps delaying the initial open(2) until after parsing the first
document and then pretending that the initial URL had a trailing
solidus might work?

-JimC
-- 
James Cloos [EMAIL PROTECTED] OpenPGP: 1024D/ED7DAEA6


Re: trouble with -p

2008-07-19 Thread Micah Cowan

James Cloos wrote:
> >>>>> "Micah" == Micah Cowan [EMAIL PROTECTED] writes:
>
> Micah> I'm not sure what you mean by "I want both".
>
> He means that, when the -p option is given, he wants to mangle either
> the created filename or the created directory name so that both do in
> fact get created on the filesystem and all related files get saved.
>
> Perhaps delaying the initial open(2) until after parsing the first
> document and then pretending that the initial URL had a trailing
> solidus might work?

Not possible with the current architecture. And that wouldn't solve the
problem if it happens not to appear that way in the links immediately
contained within.

https://savannah.gnu.org/bugs/index.php?23756 covers my solution for
handling this.

The easy workaround for now, though, would be to supply the URL with the
solidus in the first place, though as mentioned, I'm not sure that will
work if it then later encounters a version without the solidus.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/


Re: trouble with -p

2007-08-16 Thread Matthew Woehlke

Micah Cowan wrote:
> Brian Keck wrote:
>> Sometimes -p doesn't work.  For instance:
>>
>> wget -p http://en.wikipedia.org/wiki/Herbig-Haro_object
>
> In this case, it appears that you've bumped into the fact that wget, by
> default, will refuse to cross hostname boundaries to download things,
> unless you tell it otherwise. You want the -H option.


Hmm, an interesting observation from that... am I missing something, or 
is there not currently an easy way to tell wget to span hosts in the 
same domain, but not span domains? For example, spanning to 
upload.wikipedia.org makes sense when grabbing from en.wikipedia.org, 
but spanning to casa.colorado.edu, www.daviddarling.info or 
sparky.rice.edu (to steal the external references from the mentioned 
article) probably isn't desired.


Might be a useful wish for some point in the unspecified future.

--
Matthew
So long, and thanks for all the fish -- the dolphins



Re: trouble with -p

2007-08-16 Thread Micah Cowan
Matthew Woehlke wrote:
> Micah Cowan wrote:
>> Brian Keck wrote:
>>> Sometimes -p doesn't work.  For instance:
>>>
>>> wget -p http://en.wikipedia.org/wiki/Herbig-Haro_object
>>
>> In this case, it appears that you've bumped into the fact that wget, by
>> default, will refuse to cross hostname boundaries to download things,
>> unless you tell it otherwise. You want the -H option.
>
> Hmm, an interesting observation from that... am I missing something, or
> is there not currently an easy way to tell wget to span hosts in the
> same domain, but not span domains? For example, spanning to
> upload.wikipedia.org makes sense when grabbing from en.wikipedia.org,
> but spanning to casa.colorado.edu, www.daviddarling.info or
> sparky.rice.edu (to steal the external references from the mentioned
> article) probably isn't desired.
>
> Might be a useful wish for some point in the unspecified future.

-D wikipedia.org will do.

Note that we can't do this automatically (as: what's the domain?); even
the assumption that a domain is whatever name is at the second level
(such as right before com) is not always correct: for instance, many
domains in the .name TLD were sold at the third level only.
micah.cowan.name was sold separately from sara.cowan.name, and while
those two both happen to belong to me, there are other foo.cowan.name's
that belong to others, so traversing to those hosts wouldn't be appropriate.
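
The kind of check a user-supplied domain list makes possible can be sketched like this. It is an assumption-laden illustration, not Wget's matching code (whose rules may differ in detail); host_in_domains is a made-up name.

```python
def host_in_domains(host, domains):
    """True if `host` equals, or is a subdomain of, one of `domains`.

    Hypothetical helper for illustration; Wget's own matching rules may
    differ in detail.
    """
    host = host.lower().rstrip(".")
    for domain in domains:
        domain = domain.lower().strip(".")
        # Match the domain itself or anything below it, on a label boundary.
        if host == domain or host.endswith("." + domain):
            return True
    return False


if __name__ == "__main__":
    allowed = ["wikipedia.org"]
    for h in ["en.wikipedia.org", "upload.wikipedia.org",
              "casa.colorado.edu", "www.daviddarling.info"]:
        print(h, host_in_domains(h, allowed))
```

The sketch also shows why the list has to come from the user: guessing "cowan.name" as the domain would accept every foo.cowan.name host, while listing micah.cowan.name accepts only that one.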

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/





Re: trouble with -p

2007-08-13 Thread Brian Keck
On Sun, 12 Aug 2007 19:44:36 MST, Micah Cowan wrote:
> Brian Keck wrote:
>> Sometimes -p doesn't work.  For instance:
>> ...
> You want the -H option.

Thanks, so I do,
Brian Keck


Re: trouble with -p

2007-08-12 Thread Micah Cowan

Brian Keck wrote:
> Hello,
>
> Sometimes -p doesn't work.  For instance:
>
>   wget -p http://en.wikipedia.org/wiki/Herbig-Haro_object

Hi,

The --debug flag will often provide useful information about why wget
doesn't download something you expect it to.

In this case, it appears that you've bumped into the fact that wget, by
default, will refuse to cross hostname boundaries to download things,
unless you tell it otherwise. You want the -H option.
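
The default host restriction can be pictured with the sketch below. It is an illustration only, not Wget's code; the function name requisites_to_fetch, the span_hosts parameter (standing in for -H), and the two requisite URLs are made up for the example.

```python
from urllib.parse import urlsplit


def requisites_to_fetch(page_url, links, span_hosts=False):
    """Return the links that would be followed from `page_url`.

    Illustration only: without `span_hosts` (standing in for -H), links
    on a different host name than the page are skipped.
    """
    page_host = urlsplit(page_url).hostname
    kept = []
    for link in links:
        if span_hosts or urlsplit(link).hostname == page_host:
            kept.append(link)
    return kept


if __name__ == "__main__":
    page = "http://en.wikipedia.org/wiki/Herbig-Haro_object"
    # Hypothetical requisite URLs, chosen only to show the two cases.
    links = [
        "http://en.wikipedia.org/style/example.css",
        "http://upload.wikipedia.org/images/example.jpg",
    ]
    print(requisites_to_fetch(page, links))
    print(requisites_to_fetch(page, links, span_hosts=True))
```

With the default setting only the first link survives, because upload.wikipedia.org is a different host name from en.wikipedia.org even though both share a domain.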

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
