Re: Question re server actions

2007-11-06 Thread Micah Cowan

Alan Thomas wrote:
   I admittedly do not know much about web server responses, and I
 have a question about why wget did not retrieve a document. . . .
  
I executed the following wget command:
  
 wget --recursive --level=20 --append-output=wget_log.txt
 --accept=pdf,doc,ppt,xls,zip,tar,gz,mov,avi,mpeg,mpg,wmv --no-parent
 --no-directories --directory-prefix=TEST_AnyLogic_Docs
 http://www.xjtek.com
  
 However, it did not get the PDF document found by clicking on
 this link: http://www.xjtek.com/anylogic/license_agreement.  This URL
 automatically results in a download of a PDF file.
  
 Why?  Is there a wget option that will include this file? 

I believe it's being rejected because its URL doesn't end in a suffix
from your --accept list: it's a PDF file, but the URL doesn't end in .pdf.
The server does use a Content-Disposition header to specify a filename,
but the release version of Wget doesn't honor that header.
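
To make the rejection concrete, here is a rough shell model of the suffix check. It is only a sketch: matches_accept_list is a hypothetical helper, not Wget's code, and it assumes each --accept entry is a plain suffix compared against the tail of the name (wildcard patterns are ignored). The PDF file name is the one reported in the follow-up message of this thread.

```shell
# Hypothetical helper, NOT Wget's implementation: succeeds if the name ends
# with one of the comma-separated entries in the accept list.
# Assumption: entries are plain suffixes (no wildcards).
matches_accept_list() {
  name=$1
  accept_list=$2
  old_ifs=$IFS
  IFS=,
  for suffix in $accept_list; do
    case $name in
      *"$suffix")
        IFS=$old_ifs
        return 0
        ;;
    esac
  done
  IFS=$old_ifs
  return 1
}

accept=pdf,doc,ppt,xls,zip,tar,gz

# The URL as linked has no matching suffix, so this model rejects it.
matches_accept_list "http://www.xjtek.com/anylogic/license_agreement" "$accept" \
  || echo "rejected: license_agreement"

# The file name supplied via Content-Disposition does match.
matches_accept_list "License_AnyLogic_6.x.x.pdf" "$accept" \
  && echo "accepted: License_AnyLogic_6.x.x.pdf"
```

The point of the sketch is only that the decision is made on the name Wget sees, not on the content type the server eventually returns.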

If you use the current development version of Wget and specify -e
content_disposition=on, the file will be downloaded. If you're willing to
try that, see http://wget.addictivecode.org/RepositoryAccess for
information on how to get the current development version of Wget (you
should use the 1.11 repository, not mainline) and on its special build
requirements.

--
HTH,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Bugs! [Re: Question re server actions]

2007-11-06 Thread Micah Cowan

Alan Thomas wrote:
 Thanks.  I unzipped those binaries, but I still have a problem. . . .
 
 I changed the wget command to:
 
 wget --recursive --level=20 --append-output=wget_log.txt
 -econtent_disposition=on --accept=pdf,doc,ppt,xls,zip,tar,gz --no-parent
 --no-directories --directory-prefix=TEST_AnyLogic_Docs
 http://www.xjtek.com
 
 However, the log file shows:
 
 --2007-11-06 21:33:55--  http://www.xjtek.com/
 Resolving www.xjtek.com... 207.228.227.14
 Connecting to www.xjtek.com|207.228.227.14|:80... connected.
 HTTP request sent, awaiting response... 200 OK
 Length: unspecified [text/html]
 --2007-11-06 21:34:11--  http://www.xjtek.com/
 Connecting to www.xjtek.com|207.228.227.14|:80... connected.
 HTTP request sent, awaiting response... 200 OK
 Length: unspecified [text/html]
 Saving to: `TEST_AnyLogic_Docs/index.html'
 
  0K ..  128K=0.08s
 
 2007-11-06 21:34:12 (128 KB/s) - `TEST_AnyLogic_Docs/index.html' saved
 [11091]
 
 Removing TEST_AnyLogic_Docs/index.html since it should be rejected.
 
 FINISHED --2007-11-06 21:34:12--
 Downloaded: 1 files, 11K in 0.08s (128 KB/s)
 
 The version of wget is shown as 1.10+devel.

Congratulations! Looks like you've discovered a bug! :\

And just in time, too, as we're expecting to release 1.11 any day now.

When I try your command with --debug, it looks like Wget thinks all the
links are trying to escape upwards: that is, it thinks that they
violate your --no-parent. You should be able to remove --no-parent
from your command line and it will work; in your case there _are_ no
parents to traverse to, so --no-parent is superfluous anyway.
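
Concretely, the adjusted invocation would look something like the sketch below. It wraps your command, minus --no-parent, in a shell function. The function name fetch_docs and the WGET override variable are hypothetical, added only so the command can be previewed with echo before running it for real.

```shell
# Hypothetical wrapper around the command from this thread, with --no-parent
# removed. WGET is an assumed override variable (not a Wget feature) that
# lets the command line be previewed instead of executed.
fetch_docs() {
  "${WGET:-wget}" --recursive --level=20 \
    --append-output=wget_log.txt \
    -e content_disposition=on \
    --accept=pdf,doc,ppt,xls,zip,tar,gz \
    --no-directories --directory-prefix=TEST_AnyLogic_Docs \
    "$1"
}

# Preview only: prints the arguments in a subshell, nothing is downloaded.
(WGET=echo; fetch_docs http://www.xjtek.com/)
```

Running fetch_docs http://www.xjtek.com/ without the override would invoke the real wget, which needs to be the development build for content_disposition to be recognized.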

I also discovered (what I consider to be) a bug, in that

wget -e content_disposition=on --accept=pdf -r
http://www.xjtek.com/anylogic/license_agreement/

downloads the file to ./License_AnyLogic_6.x.x.pdf, rather than to
www.xjtek.com/file/114/License_AnyLogic_6.x.x.pdf (the dirname for which
matches its URL after redirection).

 Also, I'm not sure why - is required instead of -- in front of the new
 option.

It's not a long option; it's the short option -e, followed by an
argument, content_disposition=on. There is not currently a long-option
version for this. Support for Content-Disposition will be enabled by
default in Wget 1.12, so a long-option probably won't be added (unless
it's to disable the support).
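
Since -e runs its argument as though it were a line in .wgetrc, the same setting can presumably live in the startup file instead. A small sketch, assuming the development build reads content_disposition from wgetrc the same way; enable_content_disposition is a hypothetical helper name, and the target file defaults to ~/.wgetrc:

```shell
# Hypothetical helper: adds "content_disposition = on" to a wgetrc file.
# Assumption: the setting is honored from wgetrc just as it is via -e.
enable_content_disposition() {
  rc_file=${1:-"$HOME/.wgetrc"}
  # Append only if no content_disposition line exists yet.
  if ! grep -q '^[[:space:]]*content_disposition' "$rc_file" 2>/dev/null; then
    printf '%s\n' 'content_disposition = on' >> "$rc_file"
  fi
}

# Usage (modifies the named file, so it is not invoked here):
#   enable_content_disposition            # edits ~/.wgetrc
#   enable_content_disposition ./wgetrc   # edits a file of your choosing
```

With that line in place, the -e content_disposition=on argument should no longer be needed on the command line.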

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
