gzip question

2007-12-19 Thread Christopher Eastwood
 

Does wget automatically decompress gzip compressed files?  Is there a
way to get wget NOT to decompress gzip compressed files, but to download
them as the gzipped file?

 

Thanks,

Christopher



Re: gzip question

2007-12-19 Thread Steven M. Schweda
From: Christopher Eastwood

 Does wget automatically decompress gzip compressed files?

   I don't think so.  Have you any evidence that it does this?  (Wget
version?  OS?  Example with transcript?)

   Is there a
 way to get wget NOT to decompress gzip compressed files, but to download
 them as the gzipped file?

   Just specify the gzip-compressed file, so far as I know.
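
As a minimal sketch (the URL and file name are placeholders), pointing wget
at the compressed file saves it byte-for-byte, still gzipped:

wget http://example.com/files/archive.tar.gz
gzip -t archive.tar.gz   # hypothetical file; confirms it is still gzip data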



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547



RE: gzip question

2007-12-19 Thread Christopher Eastwood

wget --header='Accept-Encoding: gzip, deflate' http://{gzippedcontent}


-Original Message-
From: Steven M. Schweda [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 19, 2007 2:57 PM
To: WGET@sunsite.dk
Cc: Christopher Eastwood
Subject: Re: gzip question

From: Christopher Eastwood

 Does wget automatically decompress gzip compressed files?

   I don't think so.  Have you any evidence that it does this?  (Wget
version?  OS?  Example with transcript?)

   Is there a
 way to get wget NOT to decompress gzip compressed files, but to
download
 them as the gzipped file?

   Just specify the gzip-compressed file, so far as I know.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: gzip question

2007-12-19 Thread Steven M. Schweda
From: Christopher Eastwood

 wget --header='Accept-Encoding: gzip, deflate' http://{gzippedcontent}

   Doctor, it hurts when I do this.
   Don't do that.

   What does it do without --header='Accept-Encoding: gzip, deflate'?

 [...] (Wget version?  OS?  Example with transcript?)

   Still waiting for those data.  Also, when I say "Example", I normally
mean "An actual example", that is, one which can be tested and verified.

   Adding -d to the wget command can also be informative.
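
A sketch of such a run (placeholder URL), keeping the debug output in a log
file for inspection:

wget -d -o wget-debug.log http://example.com/file.tar.gz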

   SMS.


Re: Question about spidering

2007-12-12 Thread Micah Cowan

Srinivasan Palaniappan wrote:
   I am using WGET version 1.10.2, and trying to crawl through a secured
 site (that we are developing for our customer). I noticed two things.
 WGET is not downloading all the binaries on the website: it downloads
 about 30% of them, then skips the rest of the documents. But I don't see
 any log file that shows me some kind of error message saying "unable
 to download" during spidering. I am not sure I am doing the right thing;
 can you let me know from the following .wgetrc file and the command line
 I run?
 
  
 
 .wgetrc
 
 
 exclude_directories =
 /ascp/commerce/catalog,/ascp/commerce/checkout,/ascp/commerce/user,/ascp/commerce/common,/ascp/commerce/javascript,/ascp/commerce/css
 
 
 include_directories = /ascp/commerce,/ascp/commerce/scp/downloads
 
 dir_prefix=\spiderfiles\ascpProd\wget
 
 domains=www.mysite.com
 
 no_parent=on
 
 secure-protocol=SSLv3

  ^^^ This should use an underscore, not a dash.

 wget -r l5 --save-headers --no-check-certificate   https://www.mystie.com
  ^^
-r doesn't take an argument. Perhaps you wanted a -l before the 15?

 In addition, I noticed that the metadata information written to the
 downloaded file has only HTTP as scheme, which is somewhat weird; do you
 know anything about it?

I'm not understanding you here. Do you mean that it said, "https://...:
Unsupported scheme"? In that case, I don't see how it could have
downloaded 30% of anything, as that would mean it wasn't compiled with
support for SSL and HTTPS.

The best way to see what might be going on is to invoke wget with the
--debug flag, probably together with the -o logfile option. That should
help us see what is happening.
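
A sketch of that kind of invocation, based on the command quoted above but
with a placeholder URL:

wget --debug -o wget.log -r -l 5 --save-headers --no-check-certificate https://www.example.com/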

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: Question about spidering

2007-12-12 Thread Micah Cowan

Micah Cowan wrote:
 Srinivasan Palaniappan wrote:
 wget -r l5 --save-headers --no-check-certificate   https://www.mystie.com
   ^^
 -r doesn't take an argument. Perhaps you wanted a -l before the 15?

Or a - before the l5. Curse the visual ambiguity between l and 1!
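
The intended command was presumably along these lines (a sketch, with a
placeholder URL):

wget -r -l 5 --save-headers --no-check-certificate https://www.example.com/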

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Question about spidering

2007-12-11 Thread Srinivasan Palaniappan
Hi,



  I am using WGET version 1.10.2, and trying to crawl through a secured site
(that we are developing for our customer). I noticed two things. WGET is not
downloading all the binaries on the website: it downloads about 30% of them,
then skips the rest of the documents. But I don't see any log file that
shows me some kind of error message saying "unable to download" during
spidering. I am not sure I am doing the right thing; can you let me know from
the following .wgetrc file and the command line I run?



.wgetrc



exclude_directories =
/ascp/commerce/catalog,/ascp/commerce/checkout,/ascp/commerce/user,/ascp/commerce/common,/ascp/commerce/javascript,/ascp/commerce/css

include_directories = /ascp/commerce,/ascp/commerce/scp/downloads

dir_prefix=\spiderfiles\ascpProd\wget

domains=www.mysite.com

no_parent=on

secure-protocol=SSLv3





command line

---

wget -r l5 --save-headers --no-check-certificate   https://www.mystie.com



In addition, I noticed that the metadata information written to the
downloaded file has only HTTP as scheme, which is somewhat weird; do you
know anything about it?



Regards,


Re: Content disposition question

2007-12-10 Thread Hrvoje Niksic
Micah Cowan [EMAIL PROTECTED] writes:

 Actually, the reason it is not enabled by default is that (1) it is
 broken in some respects that need addressing, and (2) as it is currently
 implemented, it involves a significant amount of extra traffic,
 regardless of whether the remote end actually ends up using
 Content-Disposition somewhere.

I'm curious, why is this the case?  I thought the code was refactored
to determine the file name after the headers arrive.  It certainly
looks that way by the output it prints:

{mulj}[~]$ wget www.cnn.com
[...]
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `index.html'   # note: "Saving to" appears only after the HTTP response

Where does the extra traffic come from?

 Note that it is not available at all in any release version of Wget;
 only in the current development versions. We will be releasing Wget 1.11
 very shortly, which will include the --content-disposition
 functionality; however, this functionality is EXPERIMENTAL only. It
 doesn't quite behave properly, and needs some severe adjustments before
 it is appropriate to leave as default.

If it is not ready for general use, we should consider removing it
from NEWS.  If not, it should be properly documented in the manual.  I
am aware that the NEWS entry claims that the feature is experimental,
but why even mention it if it's not ready for general consumption?
Announcing experimental features in NEWS is a good way to make testers
aware of them during the alpha/beta release cycle, but it should be
avoided in production releases of mature software.

 As to breaking old scripts, I'm not really concerned about that (and
 people who read the NEWS file, as anyone relying on previous
 behaviors for Wget should do, would just need to set
 --no-content-disposition, when the time comes that we enable it by
 default).

Agreed.


Re: Content disposition question

2007-12-10 Thread Micah Cowan

Hrvoje Niksic wrote:
 Micah Cowan [EMAIL PROTECTED] writes:
 
 Actually, the reason it is not enabled by default is that (1) it is
 broken in some respects that need addressing, and (2) as it is currently
 implemented, it involves a significant amount of extra traffic,
 regardless of whether the remote end actually ends up using
 Content-Disposition somewhere.
 
 I'm curious, why is this the case?  I thought the code was refactored
 to determine the file name after the headers arrive.  It certainly
 looks that way by the output it prints:
 
 {mulj}[~]$ wget www.cnn.com
 [...]
 HTTP request sent, awaiting response... 200 OK
 Length: unspecified [text/html]
 Saving to: `index.html'   # note: "Saving to" appears only after the HTTP response
 
 Where does the extra traffic come from?

Your example above doesn't set --content-disposition; if you do, there
is an extra HEAD request sent.

As to why this is the case, I believe it was so that we could properly
handle accepts/rejects, whereas we will otherwise usually assume that we
can match accept/reject against the URL itself (we currently do this
improperly for the -nd -r case, still matching using the generated
file name's suffix).

Beyond that, I'm not sure as to why, and it's my intention that it not
be done in 1.12. Removing it for 1.11 is too much trouble, as the
sending-HEAD and sending-GET is not nearly decoupled enough to do it
without risk (and indeed, we were seeing trouble where every time we
fixed one issue with the send-HEAD-first behavior, something else would
break). I want to do some reworking of gethttp and http_loop before I
will feel comfortable in changing how they work.

 If it is not ready for general use, we should consider removing it
 from NEWS.

I had thought of that. The thing that has kept me from it so far is that
 it is a feature that is desired by many people, and for most of them,
it will work (the issues are pretty minor, and mainly corner-case,
except perhaps for the fact that they are apparently always downloaded
to the top directory, and not the one in which the URL was found).

And, if we leave it out of NEWS and documentation, then, when we answer
people who ask "How can I get Wget to respect Content-Disposition
headers?", the natural follow-up will be, "Why isn't this mentioned
anywhere in the documentation?". :)

 If not, it should be properly documented in the manual.

Yes... I should be more specific about its shortcomings.

 I am aware that the NEWS entry claims that the feature is experimental,
 but why even mention it if it's not ready for general consumption?
 Announcing experimental features in NEWS is a good way to make testers
 aware of them during the alpha/beta release cycle, but it should be
 avoided in production releases of mature software.

It's pretty much good enough; it's not where I want it, but it _is_
usable. The extra traffic is really the main reason I don't want it
on-by-default.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: Content disposition question

2007-12-10 Thread Hrvoje Niksic
Micah Cowan [EMAIL PROTECTED] writes:

 I thought the code was refactored to determine the file name after
 the headers arrive.  It certainly looks that way by the output it
 prints:
 
 {mulj}[~]$ wget www.cnn.com
 [...]
 HTTP request sent, awaiting response... 200 OK
 Length: unspecified [text/html]
  Saving to: `index.html'   # note: "Saving to" appears only after the HTTP response
 
 Where does the extra traffic come from?

 Your example above doesn't set --content-disposition;

I'm aware of that, but the above example was supposed to point out the
refactoring that has already taken place, regardless of whether
--content-disposition is specified.  As shown above, Wget always waits
for the headers before determining the file name.  If that is the
case, it would appear that no additional traffic is needed to get
Content-Disposition, Wget simply needs to use the information already
received.

 As to why this is the case, I believe it was so that we could
 properly handle accepts/rejects,

Issuing another request seems to be the wrong way to go about it, but
I haven't thought about it hard enough, so I could be missing a lot of
subtleties.

 I am aware that the NEWS entry claims that the feature is experimental,
 but why even mention it if it's not ready for general consumption?
 Announcing experimental features in NEWS is a good way to make testers
 aware of them during the alpha/beta release cycle, but it should be
 avoided in production releases of mature software.

 It's pretty much good enough; it's not where I want it, but it
 _is_ usable. The extra traffic is really the main reason I don't
 want it on-by-default.

It should IMHO be documented, then.  Even if it's documented as
experimental.


Content disposition question

2007-12-03 Thread Vladimir Niksic
Hi!

I have noticed that wget doesn't automatically use the option 
'--content-disposition'. So what happens is when you download something
from a site that uses content disposition, the resulting file on the
filesystem is not what it should be.

For example, when downloading an Ubuntu torrent from mininova I get:

{uragan}[~/tmp]$ wget http://www.mininova.org/get/946879
--2007-12-03 15:58:46--  http://www.mininova.org/get/946879
Resolving www.mininova.org... 87.233.147.140
Connecting to www.mininova.org|87.233.147.140|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28064 (27K) [application/x-bittorrent]
Saving to: `946879'

100%[]
28,064  87.0K/s   in 0.3s

2007-12-03 15:58:47 (87.0 KB/s) - `946879' saved [28064/28064]

When I use the option --content-disposition:

{uragan}[~/tmp]$ wget --content-disposition
http://www.mininova.org/get/946879
--2007-12-03 15:59:18--  http://www.mininova.org/get/946879
Resolving www.mininova.org... 87.233.147.140
Connecting to www.mininova.org|87.233.147.140|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 0 [application/x-bittorrent]
--2007-12-03 15:59:18--  http://www.mininova.org/get/946879
Connecting to www.mininova.org|87.233.147.140|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28064 (27K) [application/x-bittorrent]
Saving to: `-{mininova.org}- ubuntu-7.10-desktop-i386.iso.torrent'

100%[]
28,064  47.8K/s   in 0.6s

2007-12-03 15:59:19 (47.8 KB/s) - `-{mininova.org}-
ubuntu-7.10-desktop-i386.iso.torrent' saved [28064/28064]


I realize that I could put this option in .wgetrc, but I think that it
would be better if this were the default, because the majority of users are
unaware of this option, and cannot hope to find it unless acquainted
with the inner mechanics of HTTP. Also, it's nearly impossible to find.
I've been googling it and finally managed to dig it up from the
documentation.
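
For instance, the behaviour can be turned on permanently with one line in the
startup file (a sketch of the .wgetrc setting discussed in this thread):

# in ~/.wgetrc
content_disposition = on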



Re: Content disposition question

2007-12-03 Thread Matthias Vill
Hi,

we know this. This was just recently discussed on the mailing list and I
agree with you.
But there are two arguments why this is not the default:
a) It's a quite new feature for wget and therefore would break
compatibility with prior versions; any old script would need to be
rewritten.
b) It's impossible to pre-guess the filename, and thus it is not so well
suited for script usage.

I would like to have this feature enabled by some --interactive switch
(which could include more options and might be easier to find) or, as you
suggested, as the default with a disable switch.

Greetings

Matthias

Vladimir Niksic wrote:
 I have noticed that wget doesn't automatically use the option 
 '--content-disposition'. So what happens is when you download something
 from a site that uses content disposition, the resulting file on the
 filesystem is not what it should be.

 I realize that I could put this option in .wgetrc, but I think that it
 would be better if this were the default, because the majority of users are
 unaware of this option, and cannot hope to find it unless acquainted
 with the inner mechanics of HTTP. Also, it's nearly impossible to find.
 I've been googling it and finally managed to dig it up from the
 documentation.
 


Re: Content disposition question

2007-12-03 Thread Micah Cowan

Matthias Vill wrote:
 Hi,
 
 we know this. This was just recently discussed on the mailing list and I
 agree with you.
 But there are two arguments why this is not the default:
 a) It's a quite new feature for wget and therefore would break
 compatibility with prior versions; any old script would need to be
 rewritten.
 b) It's impossible to pre-guess the filename, and thus it is not so well
 suited for script usage.
 
 I would like to have this feature enabled by some --interactive switch
 (which could include more options and might be easier to find) or, as you
 suggested, as the default with a disable switch.

Actually, the reason it is not enabled by default is that (1) it is
broken in some respects that need addressing, and (2) as it is currently
implemented, it involves a significant amount of extra traffic,
regardless of whether the remote end actually ends up using
Content-Disposition somewhere.

Note that it is not available at all in any release version of Wget;
only in the current development versions. We will be releasing Wget 1.11
very shortly, which will include the --content-disposition
functionality; however, this functionality is EXPERIMENTAL only. It
doesn't quite behave properly, and needs some severe adjustments before
it is appropriate to leave as default.

As to breaking old scripts, I'm not really concerned about that (and
people who read the NEWS file, as anyone relying on previous behaviors
for Wget should do, would just need to set --no-content-disposition,
when the time comes that we enable it by default).

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: Question re server actions

2007-11-06 Thread Micah Cowan

Alan Thomas wrote:
   I admittedly do not know much about web server responses, and I
 have a question about why wget did not retrieve a document. . . .
  
I executed the following wget command:
  
 wget --recursive --level=20 --append-output=wget_log.txt
 --accept=pdf,doc,ppt,xls,zip,tar,gz,mov,avi,mpeg,mpg,wmv --no-parent
 --no-directories --directory-prefix=TEST_AnyLogic_Docs
 http://www.xjtek.com
  
 However, it did not get the PDF document found by clicking on
 this link: http://www.xjtek.com/anylogic/license_agreement.  This URL
 automatically results in a download of a PDF file.
  
 Why?  Is there a wget option that will include this file? 

I believe it's being rejected because it doesn't end in a suffix that's
in your --accept list; it's a PDF file, but its URL doesn't end in .pdf.
It does use Content-Disposition to specify a filename, but the release
version of Wget doesn't acknowledge those.

If you use the current development version of Wget, and specify -e
content_disposition=on, it will download. If you're willing to try
that, you'll need to look at
http://wget.addictivecode.org/RepositoryAccess for information on how to
get the current development version of Wget (you should use the 1.11
repository, not mainline), and special building requirements.

--
HTH,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Bugs! [Re: Question re server actions]

2007-11-06 Thread Micah Cowan

Alan Thomas wrote:
 Thanks.  I unzipped those binaries, but I still have a problem. . . .
 
 I changed the wget command to:
 
 wget --recursive --level=20 --append-output=wget_log.txt -econtent_disposition=on --accept=pdf,doc,ppt,xls,zip,tar,gz --no-parent --no-directories --directory-prefix=TEST_AnyLogic_Docs http://www.xjtek.com
 
 However, the log file shows:
 
 --2007-11-06 21:33:55--  http://www.xjtek.com/
 Resolving www.xjtek.com... 207.228.227.14
 Connecting to www.xjtek.com|207.228.227.14|:80... connected.
 HTTP request sent, awaiting response... 200 OK
 Length: unspecified [text/html]
 --2007-11-06 21:34:11--  http://www.xjtek.com/
 Connecting to www.xjtek.com|207.228.227.14|:80... connected.
 HTTP request sent, awaiting response... 200 OK
 Length: unspecified [text/html]
 Saving to: `TEST_AnyLogic_Docs/index.html'
 
  0K ..  128K=0.08s
 
 2007-11-06 21:34:12 (128 KB/s) - `TEST_AnyLogic_Docs/index.html' saved
 [11091]
 
 Removing TEST_AnyLogic_Docs/index.html since it should be rejected.
 
 FINISHED --2007-11-06 21:34:12--
 Downloaded: 1 files, 11K in 0.08s (128 KB/s)
 
 The version of wget is shown as 1.10+devel.

Congratulations! Looks like you've discovered a bug! :\

And just in time, too, as we're expecting to release 1.11 any day now.

When I try your version with --debug, it looks like it thinks all the
links are trying to escape upwards: that is, it thinks that they
disobey your --no-parent. You should be able to remove the --no-parent
from your command-line, and it will work, as in your case there _are_ no
parents to traverse to, and the --no-parent is superfluous.

I also discovered (what I consider to be) a bug, in that

wget -e content_disposition=on --accept=pdf -r
http://www.xjtek.com/anylogic/license_agreement/

downloads the file to ./License_AnyLogic_6.x.x.pdf, rather than to
www.xjtek.com/file/114/License_AnyLogic_6.x.x.pdf (the dirname for which
matches its URL after redirection).

 Also, I'm not sure why "-" is required vice "--" in front of the new
 option.

It's not a long option; it's the short option -e, followed by an
argument, content_disposition=on. There is not currently a long-option
version for this. Support for Content-Disposition will be enabled by
default in Wget 1.12, so a long-option probably won't be added (unless
it's to disable the support).

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: wget -o question

2007-10-01 Thread Steven M. Schweda
From: Micah Cowan

 But, since any specific transaction is unlikely to take such a long
 time, the spread of the run is easily deduced by the start and end
 times, and, in the unlikely event of multiple days, counting time
 regressions.

   And if the pages in books were all numbered 1, 2, 3, 4, 5, 6, 7, 8,
9, 0, 1, 2, 3, ..., the reader could easily deduce the actual number for
any page, but most folks find it more convenient when all the necessary
data are right there in one place.

   But hey.  You're the boss.

   SMS.


Re: wget -o question

2007-10-01 Thread Micah Cowan

Steven M. Schweda wrote:

 But, since any specific transaction is unlikely to take such a long
 time, the spread of the run is easily deduced by the start and end
 times, and, in the unlikely event of multiple days, counting time
 regressions.
 
And if the pages in books were all numbered 1, 2, 3, 4, 5, 6, 7, 8,
 9, 0, 1, 2, 3, ..., the reader could easily deduce the actual number for
 any page, but most folks find it more convenient when all the necessary
 data are right there in one place.

To my mind, books are much more likely to cross 10-page boundaries
several times than Wget is to cross more than just one
24-hour boundary. And, there's always "date; wget; date"...
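
A sketch of that workaround, assuming a hypothetical urls.txt list of
downloads:

date; wget -o wget.log -i urls.txt; date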

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: wget -o question

2007-10-01 Thread Jim Wright
My usage is counter to your assumptions below.  I run every hour to
connect to 1,000 instruments (1,500 in 12 months) dispersed over the
entire western US and Alaska.  I append log messages for all runs from
a day to a single file.  This is an important debugging tool for us.
We have mostly VSAT and CDMA connections for remote instruments, but many
other variations.  Small bandwidth, large latency, and potentially large
backlogs of data means we can run for a couple days catching up with an
instrument - rare, but it happens.  The current timestamping is a PAIN for
us to automatically parse.  A change as proposed here is very simple, but
would be VERY useful.  Right now, we have 116 gigabytes of wget log files.

Jim


On Sun, 30 Sep 2007, Micah Cowan wrote:

 
 Steven M. Schweda wrote:
  From: Micah Cowan
  
  -  tms = time_str (NULL);
  +  tms = datetime_str (NULL);
  
  Does anyone think there's any general usefulness for this sort of
  thing?
  
 I don't care much, but it seems like a fairly harmless change with
  some benefit.  Of course, I use an OS where a directory listing which
  shows date and time does so using a consistent and constant format,
  independent of the age of a file, so I may be biased.
 
 :)
 
 Though honestly, what this change buys you above simply doing date;
 wget, I don't know. I think maybe I won't bother, at least for now.
 
  Though if I were considering such a change, I'd probably just have wget
  mention the date at the start of its run, rather than repeat it for each
  transaction. Obviously wouldn't be a high-priority change... :)
  
 That sounds reasonable, except for a job which begins shortly before
  midnight.
 
 I considered this, along with the unlikely 24-hour wget run.
 
 But, since any specific transaction is unlikely to take such a long
 time, the spread of the run is easily deduced by the start and end
 times, and, in the unlikely event of multiple days, counting time
 regressions.
 
 --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer...
 http://micah.cowan.name/
 


Re: wget -o question

2007-10-01 Thread Micah Cowan

Jim Wright wrote:
 My usage is counter to your assumptions below.[...]
 A change as proposed here is very simple, but
 would be VERY useful.

Okay. Guess I'm sold, then. :D

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: wget -o question

2007-10-01 Thread Saso Tomat
Micah Cowan micah at cowan.name writes:

 
 
 Jim Wright wrote:
  My usage is counter to your assumptions below.[...]
  A change as proposed here is very simple, but
  would be VERY useful.
 
 Okay. Guess I'm sold, then. :D
 
 --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer...
 http://micah.cowan.name/
 
 


Thank you all for your replies. Yes, it is very much needed. I use wget on WIN OS.
I have a .cmd file that runs wget for several days/weeks/months if needed,
so the date information is very useful.

Thank you.
Saso



Re: wget -o question

2007-09-30 Thread Steven M. Schweda
From: Micah Cowan

  -  tms = time_str (NULL);
  +  tms = datetime_str (NULL);

 Does anyone think there's any general usefulness for this sort of
 thing?

   I don't care much, but it seems like a fairly harmless change with
some benefit.  Of course, I use an OS where a directory listing which
shows date and time does so using a consistent and constant format,
independent of the age of a file, so I may be biased.

 Though if I were considering such a change, I'd probably just have wget
 mention the date at the start of its run, rather than repeat it for each
 transaction. Obviously wouldn't be a high-priority change... :)

   That sounds reasonable, except for a job which begins shortly before
midnight.  I'd say that it makes more sense to do it the same way every
time.  Otherwise, why bother displaying the hour every time, when it
changes so seldom?  Or the minute?  Eleven bytes more per file in the
log doesn't seem to me to be a big price to pay for consistent
simplicity.  Or you could let the victim specify a strptime() format
string, and satisfy everyone.  Personally, I'd just change time_str() to
datetime_str() in a couple of places.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: wget -o question

2007-09-30 Thread Micah Cowan

Steven M. Schweda wrote:
 From: Micah Cowan
 
 -  tms = time_str (NULL);
 +  tms = datetime_str (NULL);
 
 Does anyone think there's any general usefulness for this sort of
 thing?
 
I don't care much, but it seems like a fairly harmless change with
 some benefit.  Of course, I use an OS where a directory listing which
 shows date and time does so using a consistent and constant format,
 independent of the age of a file, so I may be biased.

:)

Though honestly, what this change buys you above simply doing date;
wget, I don't know. I think maybe I won't bother, at least for now.

 Though if I were considering such a change, I'd probably just have wget
 mention the date at the start of its run, rather than repeat it for each
 transaction. Obviously wouldn't be a high-priority change... :)
 
That sounds reasonable, except for a job which begins shortly before
 midnight.

I considered this, along with the unlikely 24-hour wget run.

But, since any specific transaction is unlikely to take such a long
time, the spread of the run is easily deduced by the start and end
times, and, in the unlikely event of multiple days, counting time
regressions.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


wget -o question

2007-09-27 Thread Saso Tomat
Hi all,

I have a question regarding the -o switch:

currently I see that the log file contains the timestamp ONLY. Is it possible
to tell wget to include the date too?

Thank you.
Saso



Question

2007-08-07 Thread Andra Isan
Hi All, 
   
  I am wondering if there is a way that I can download pdf files and organize
them in a directory with Wget, or should I write code for that?

  If I need to write code for that, would you please let me know if there is
any sample code available?
   
  Thanks in advance 

   

Re: Question

2007-08-07 Thread Micah Cowan

Andra Isan wrote:
 I am wondering if there is a way that I can download pdf files and
 organize them in a directory with Wget or should I write a code for that?
  
 If I need to write a code for that, would you please let me know if
 there is any sample code available?

Hello Andra,

I don't think your request is very clear. Certainly you can download PDF
files with Wget. What do you mean by "organize them in a directory"?
What sort of organization do you want? Please be as specific as possible.

--
Thanks,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: Question

2007-08-07 Thread Andra Isan
I have a paper proceeding, and I want to follow a link of that proceeding and go
to a paper link, then follow the paper link and go to an author link, and then
follow the author link, which leads to all the papers that the author has written. I
want to place all these pdf files (papers of one author) into a directory. So,
at the end I have directories of all authors containing the papers that those
authors have written (one directory for each author).
I am not sure if I can do it with Wget or not.

  Please let me know your idea.
  Thanks in advance.



Micah Cowan [EMAIL PROTECTED] wrote:

Andra Isan wrote:
 I am wondering if there is a way that I can download pdf files and
 organize them in a directory with Wget or should I write a code for that?
 
 If I need to write a code for that, would you please let me know if
 there is any sample code available?

Hello Andra,

I don't think your request is very clear. Certainly you can download PDF
files with Wget. What do you mean by organize them in a directory?
What sort of organization do you want? Please be as specific as possible.

--
Thanks,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


   

Re: Question

2007-08-07 Thread Micah Cowan

It seems to me that you can simply start a recursive,
non-parent-traversing fetch (-r -np) of the page with the links, and
you'll end up with the PDF files you want (plus anything else linked to
on that page). If the PDF files are stored in different directories on
the website, they'll be in different directories in the fetch;
otherwise, they won't be, and yeah you'd need to write some script to do
what you need (sorry, no samples available).
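
As a sketch, with a hypothetical author-page URL, and adding -A pdf to keep
only the PDF files:

wget -r -np -A pdf http://example.org/authors/some-author.html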

-Micah

Andra Isan wrote:
 I have a paper proceeding and I want to follow a link of that proceeding
 and go to a paper link, then follow the paper link and go to author link
 and then follow author link which leads to all the paper that the author
 has written. I want to place all these pdf files( papers of one author)
 into a directory. So, at the end I have directories of all authors
 containing papers that those authors have written. (one directory for
 each author) 
 I am not sure if I can do it with Wget or not.
 
 */Micah Cowan [EMAIL PROTECTED]/* wrote:
 
 I don't think your request is very clear. Certainly you can download PDF
 files with Wget. What do you mean by organize them in a directory?
 What sort of organization do you want? Please be as specific as
 possible.


Re: Question about the frame

2007-07-02 Thread Ben Galin


On Jun 26, 2007, at 11:50 PM, Micah Cowan wrote:


After running

  $ wget -H -k -p http://www.fdoxnews.com/

It downloaded all of the relevant files. However, the results were still
not viewable until I edited the link in www.fdoxnews.com/index.html,
replacing the ? with %3F (index.mas%3Fepl=...). Probably, wget
should have done that when converting the links, considering that it
named the file with a ?, but left it literally in the converted link; ?
is a special character for URIs, and cannot be part of filenames unless
they are encoded. I'll make note of that in my buglist.


It appears that this is actually by design.  If -E (--html-extension)
is not specified, `?' will not be replaced with `%3F'.  From src/convert.c:


   We quote ? as %3F to avoid passing part of the file name as the
   parameter when browsing the converted file through HTTP.  However,
   it is safe to do this only when `--html-extension' is turned on.
   This is because converting index.html?foo=bar to
   index.html%3Ffoo=bar would break local browsing, as the latter
   isn't even recognized as an HTML file!  However, converting
   index.html?foo=bar.html to index.html%3Ffoo=bar.html should be
   safe for both local and HTTP-served browsing.

Running

   $ wget -E -H -k -p http://www.fdoxnews.com/

does the right thing.

-Ben



Re: Question about the frame

2007-07-02 Thread Micah Cowan

Ben Galin wrote:
 
 On Jun 26, 2007, at 11:50 PM, Micah Cowan wrote:
 
 After running

   $ wget -H -k -p http://www.fdoxnews.com/

 It downloaded all of the relevant files. However, the results were still
 not viewable until I edited the link in www.fdoxnews.com/index.html,
 replacing the ? with %3F (index.mas%3Fepl=...). Probably, wget
 should have done that when converting the links, considering that it
 named the file with a ?, but left it literally in the converted link; ?
 is a special character for URIs, and cannot be part of filenames unless
 they are encoded. I'll make note of that in my buglist.
 
 It appears that this is actually by design.  If -E (--html-extension) is
 not specified, `?' will not be replaced with `%3F'.  From src/convert.c:
 
We quote ? as %3F to avoid passing part of the file name as the
parameter when browsing the converted file through HTTP.  However,
it is safe to do this only when `--html-extension' is turned on.
This is because converting index.html?foo=bar to
index.html%3Ffoo=bar would break local browsing, as the latter
isn't even recognized as an HTML file!  However, converting
index.html?foo=bar.html to index.html%3Ffoo=bar.html should be
safe for both local and HTTP-served browsing.
 
 Running
 
$ wget -E -H -k -p http://www.fdoxnews.com/
 
 does the right thing.

Okay, I'll remove that item, then. Thanks very much for looking into
that, Ben!

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Question about the frame

2007-06-26 Thread Mishari Al-Mishari

Hi,
I am using the following command:
wget -p url
the url has frames.
the url retrieves a page that has a set of frames, but wget doesn't
retrieve the html pages of the frame urls. Is there any bug, or am I
missing something?

Also the command
wget -r -l 2 url
(url has frames) the above command doesn't retrieve the html pages of
the urls in the frames.

Does wget have any problem with frames in the html page?

I am using version 1.10.2.

Note that I am not subscribed to the mailing list, so please
include my email in the cc.

Thanks!

-mish


Re: Question about the frame

2007-06-26 Thread Micah Cowan
Mishari Al-Mishari wrote:
 Hi,
 I am using the following command:
 wget -p url
 the url has frames.
 the url retrieves a page that has set of frames. But wget doesn't
 retrieve the html pages of the frames urls. Is there any bug or i am
 missing something?

Works fine for me. In fact, if the frames have frames, it'll get those
too. How many nested frames have you?

 Also the command
 wget -r -l 2 url
 (url has frames) the above command doesn't retrieve the html pages of
 the urls in the frames.

These two examples strongly suggest that you have a large (>2) number of
nested frames. wget will only recurse two levels of page-prerequisites
with the -p option.

However, if you can't be specific about the URL you're trying, we can't
be specific about what's going on.

I'd recommend you use the -d (debug) option, redirect the log to a file
(-o wget-log), and check the log for a string like "Not descending
further" (if you're running it in an English locale), which is a good
way to tell if it's run into more nested frames than it is willing to
pursue.
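
A sketch of that kind of run (placeholder URL), followed by a search of the
log:

wget -d -o wget-log -p http://example.com/framed-page.html
grep "Not descending" wget-log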

 I am using version 1.10.2.

Me too. :)

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



RE: Question on wget upload/dload usage

2007-06-18 Thread Tony Lewis
Joe Kopra wrote:

 

 The wget statement looks like:

 

 wget --post-file=serverdata.mup -o postlog -O survey.html

   http://www14.software.ibm.com/webapp/set2/mds/mds

 

--post-file does not work the way you want it to; it expects a text file
that contains something like this:

a=1&b=2

 

and it sends that raw text to the server in a POST request using a
Content-Type of application/x-www-form-urlencoded. If you run it with -d,
you will see something like this:

 

POST /someurl HTTP/1.0

User-Agent: Wget/1.10

Accept: */*

Host: www.exelana.com

Connection: Keep-Alive

Content-Type: application/x-www-form-urlencoded

Content-Length: 7

 

---request end---

[writing POST file data ... done]

 

To post a file as an argument, you need a Content-Type of
multipart/form-data, which wget does not currently support.
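
As a sketch (hypothetical URL and field names), the kind of already-encoded
file that --post-file does expect can be produced and sent like this:

printf 'a=1&b=2' > postdata.txt
wget --post-file=postdata.txt -O response.html http://example.com/someurl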

 

Tony



RE: simple wget question

2007-05-13 Thread Willener, Pat
This is something that is not supported by the http protocol.
If you access the site via ftp://..., then you can use wildcards like *.pdf 

-Original Message-
From: R Kimber [mailto:[EMAIL PROTECTED] 
Sent: Saturday, May 12, 2007 06:43
To: wget@sunsite.dk
Subject: Re: simple wget question

On Thu, 10 May 2007 16:04:41 -0500 (CDT)
Steven M. Schweda wrote:

 From: R Kimber
 
  Yes there's a web page.  I usually know what I want.
 
There's a difference between knowing what you want and being able
 to describe what you want so that it makes sense to someone who does
 not know what you want.

Well I was wondering if wget had a way of allowing me to specify it.

  But won't a recursive get get more than just those files? Indeed,
  won't it get everything at that level? The accept/reject options
  seem to assume you know what's there and can list them to exclude
  them.  I only know what I want. [...]
 
Are you trying to say that you have a list of URLs, and would like
 to use one wget command for all instead of one wget command per URL? 
 Around here:
 
 ALP $ wget -h
 GNU Wget 1.10.2c, a non-interactive network retriever.
 Usage: alp$dka0:[utility]wget.exe;13 [OPTION]... [URL]...
 [...]
 
 That [URL]... was supposed to suggest that you can supply more than
 one URL on the command line.  Subject to possible command-line length
 limitations, this should allow any number of URLs to be specified at
 once.
 
 There's also -i (--input-file=FILE).  No bets, but it looks as
 if you can specify - for FILE, and it'll read the URLs from stdin,
 so you could pipe them in from anything.

Thanks, but my point is I don't know the full URL, just the pattern.

What I'm trying to download is what I might express as:

http://www.stirling.gov.uk/*.pdf

but I guess that's not possible.  I just wondered if it was possible
for wget to filter out everything except *.pdf - i.e. wget would look
at a site, or a directory on a site, and just accept those files that
match a pattern.

- Richard
-- 
Richard Kimber
http://www.psr.keele.ac.uk/


RE: simple wget question

2007-05-13 Thread Willener, Pat
Sorry, I didn't see that Steven has already answered the question. 

-Original Message-
From: Steven M. Schweda [mailto:[EMAIL PROTECTED] 
Sent: Saturday, May 12, 2007 10:05
To: WGET@sunsite.dk
Cc: [EMAIL PROTECTED]
Subject: Re: simple wget question

From: R Kimber

 What I'm trying to download is what I might express as:
 
 http://www.stirling.gov.uk/*.pdf

   At last.

 but I guess that's not possible.

   In general, it's not.  FTP servers often support wildcards.  HTTP
servers do not.  Generally, an HTTP server will not give you a list of
all its files the way an FTP server often will, which is why I asked (so
long ago) "If there's a Web page which has links to all of them, [...]".

   I just wondered if it was possible
 for wget to filter out everything except *.pdf - i.e. wget would look
 at a site, or a directory on a site, and just accept those files that
 match a pattern.

   Wget has options for this, as suggested before (wget -h):

[...]
Recursive accept/reject:
  -A,  --accept=LIST   comma-separated list of accepted extensions.
  -R,  --reject=LIST   comma-separated list of rejected extensions.
[...]

but, like many of us, it's not psychic.  It needs explicit URLs or else
instructions (-r) to follow links which it sees in the pages it sucks
down.  If you don't have a list of the URLs you want, and you don't have
URLs for one or more Web pages which contain links to the items you
want, then you're probably out of luck.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: simple wget question

2007-05-11 Thread R Kimber
On Thu, 10 May 2007 16:04:41 -0500 (CDT)
Steven M. Schweda wrote:

 From: R Kimber
 
  Yes there's a web page.  I usually know what I want.
 
There's a difference between knowing what you want and being able
 to describe what you want so that it makes sense to someone who does
 not know what you want.

Well I was wondering if wget had a way of allowing me to specify it.

  But won't a recursive get get more than just those files? Indeed,
  won't it get everything at that level? The accept/reject options
  seem to assume you know what's there and can list them to exclude
  them.  I only know what I want. [...]
 
Are you trying to say that you have a list of URLs, and would like
 to use one wget command for all instead of one wget command per URL? 
 Around here:
 
 ALP $ wget -h
 GNU Wget 1.10.2c, a non-interactive network retriever.
 Usage: alp$dka0:[utility]wget.exe;13 [OPTION]... [URL]...
 [...]
 
 That [URL]... was supposed to suggest that you can supply more than
 one URL on the command line.  Subject to possible command-line length
 limitations, this should allow any number of URLs to be specified at
 once.
 
 There's also -i (--input-file=FILE).  No bets, but it looks as
 if you can specify - for FILE, and it'll read the URLs from stdin,
 so you could pipe them in from anything.

Thanks, but my point is I don't know the full URL, just the pattern.

What I'm trying to download is what I might express as:

http://www.stirling.gov.uk/*.pdf

but I guess that's not possible.  I just wondered if it was possible
for wget to filter out everything except *.pdf - i.e. wget would look
at a site, or a directory on a site, and just accept those files that
match a pattern.

- Richard
-- 
Richard Kimber
http://www.psr.keele.ac.uk/


Re: simple wget question

2007-05-11 Thread Steven M. Schweda
From: R Kimber

 What I'm trying to download is what I might express as:
 
 http://www.stirling.gov.uk/*.pdf

   At last.

 but I guess that's not possible.

   In general, it's not.  FTP servers often support wildcards.  HTTP
servers do not.  Generally, an HTTP server will not give you a list of
all its files the way an FTP server often will, which is why I asked (so
long ago) "If there's a Web page which has links to all of them, [...]".

   I just wondered if it was possible
 for wget to filter out everything except *.pdf - i.e. wget would look
 at a site, or a directory on a site, and just accept those files that
 match a pattern.

   Wget has options for this, as suggested before (wget -h):

[...]
Recursive accept/reject:
  -A,  --accept=LIST   comma-separated list of accepted extensions.
  -R,  --reject=LIST   comma-separated list of rejected extensions.
[...]

but, like many of us, it's not psychic.  It needs explicit URLs or else
instructions (-r) to follow links which it sees in the pages it sucks
down.  If you don't have a list of the URLs you want, and you don't have
URLs for one or more Web pages which contain links to the items you
want, then you're probably out of luck.
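
For example, starting from a page that links to the PDFs (the root page here
is only a guess), a recursive run restricted to PDF suffixes would look like:

wget -r -l 1 -np -A pdf http://www.stirling.gov.uk/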



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: simple wget question

2007-05-10 Thread R Kimber
On Sun, 6 May 2007 21:44:16 -0500 (CDT)
Steven M. Schweda wrote:

 From: R Kimber
 
  If I have a series of files such as
  
  http://www.stirling.gov.uk/elections07abcd.pdf
  http://www.stirling.gov.uk/elections07efg.pdf
  http://www.stirling.gov.uk/elections07gfead.pdf
   
  etc
  
  is there a single wget command that would download them all, or
  would I need to do each one separately?
 
It depends.  As usual, it might help to know your wget version and
 operating system, but in this case, a more immediate mystery would be
 what you mean by them all, and how one would know which such files
 exist.

GNU Wget 1.10.2, Ubuntu 7.04

If there's a Web page which has links to all of them, then you
 could use a recursive download starting with that page.  Look through
 the output from wget -h, paying particular attention to the sections
 Recursive download and Recursive accept/reject.  If there's no
 such Web page, then how would wget be able to divine the existence of
 these files?

Yes there's a web page.  I usually know what I want.

But won't a recursive get get more than just those files? Indeed, won't
it get everything at that level? The accept/reject options seem to
assume you know what's there and can list them to exclude them.  I only
know what I want.  Not necessarily what I don't want. I did look at the
man page, and came to the tentative conclusion that there wasn't a
way (or at least an efficient way) of doing it, which is why I asked
the question.

- Richard
-- 
Richard Kimber
http://www.psr.keele.ac.uk/


simple wget question

2007-05-06 Thread R Kimber

If I have a series of files such as

http://www.stirling.gov.uk/elections07abcd.pdf
http://www.stirling.gov.uk/elections07efg.pdf
http://www.stirling.gov.uk/elections07gfead.pdf
 
etc

is there a single wget command that would download them all, or would I
need to do each one separately?

Thanks,

- Richard Kimber


Re: simple wget question

2007-05-06 Thread Steven M. Schweda
From: R Kimber

 If I have a series of files such as
 
 http://www.stirling.gov.uk/elections07abcd.pdf
 http://www.stirling.gov.uk/elections07efg.pdf
 http://www.stirling.gov.uk/elections07gfead.pdf
  
 etc
 
 is there a single wget command that would download them all, or would I
 need to do each one separately?

   It depends.  As usual, it might help to know your wget version and
operating system, but in this case, a more immediate mystery would be
what you mean by them all, and how one would know which such files
exist.

   If there's a Web page which has links to all of them, then you could
use a recursive download starting with that page.  Look through the
output from wget -h, paying particular attention to the sections
Recursive download and Recursive accept/reject.  If there's no such
Web page, then how would wget be able to divine the existence of these
files?
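
As a sketch, assuming a hypothetical index page that links to the files, the
accept list can also take a wildcard pattern:

wget -r -l 1 -np -A 'elections07*.pdf' http://www.stirling.gov.uk/elections.html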

   If you're running something older than version 1.10.2, you might try
getting the current released version first.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Question re web link conversions

2007-03-13 Thread Alan Thomas
Steven,

I'm not trying to blame wget, but rather understand what is going on
and perhaps how to correct it.  I am using wget version 1.10.2 and Internet
Explorer 6.0.2800.1106 on Windows 98SE.  However, when I renamed the file,
this problem did not occur.  So, I think it was something to do with the
characters in the filename, which you mentioned.

Thanks, Alan

- Original Message - 
From: Steven M. Schweda [EMAIL PROTECTED]
To: WGET@sunsite.dk
Cc: [EMAIL PROTECTED]
Sent: Tuesday, March 13, 2007 1:23 AM
Subject: Re: Question re web link conversions


 From: Alan Thomas

As usual, wget without a version does not adequately describe the
 wget program you're using, Internet Explorer without a version does
 not adequately describe the Web browser you're using, and I can only
 assume that you're doing all this on some version or other of Windows.

It might help to know which of everything you're using.  (But it
 might not.)

Using GNU Wget 1.10.2c built on VMS Alpha V7.3-2 (wget -V), I had
 no such trouble with either a Mozilla or an old Netscape 3 browser.  (I
 did need to rename the resulting file to something with fewer exotic
 characters before I could get either browser to admit that the file
 existed, but it's hard to see how that could matter much.)

It's not obvious to me how any browser could invent a URL to which to
 go Back, so my first guess is operator error, but it's even less obvious
 to me how anything wget could do could cause this behavior, either.

You might try it with Firefox or any browser with no history which
 might confuse a Back button.  If there's a way to blame wget for this,
 I'll be amazed.  (That has happened before, however.)

 

Steven M. Schweda   [EMAIL PROTECTED]
382 South Warwick Street(+1) 651-699-9818
Saint Paul  MN  55105-2547



Question re web link conversions

2007-03-12 Thread Alan Thomas
I am using the wget command below to get a page from the U.S. Patent 
Office.  This works fine.  However, when I open the resulting local file with 
Internet Explorer (IE), click a link in the file (go to another web site) and 
the click Back, it goes back to the real web address (http:...) vice the local 
file (c:\program files\wget\patents\ . . .).
   Does this have something to do with how wget converts web links?  Is 
there something I should do differently with wget?
I'm not clear on why it would do this.  When I save this site directly 
from IE as an HTML file, it works fine.  (When I click back, it goes back to 
the local file.)   

Thanks, Alan 

wget --convert-links --directory-prefix="C:\Program Files\wget\patents"
--no-clobber
"http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=/netahtml/PTO/search-adv.html&r=0&p=1&f=S&l=50&Query=ttl/software&d=PG01"



Re: Question re web link conversions

2007-03-12 Thread Steven M. Schweda
From: Alan Thomas

   As usual, wget without a version does not adequately describe the
wget program you're using, Internet Explorer without a version does
not adequately describe the Web browser you're using, and I can only
assume that you're doing all this on some version or other of Windows.

   It might help to know which of everything you're using.  (But it
might not.)

   Using GNU Wget 1.10.2c built on VMS Alpha V7.3-2 (wget -V), I had
no such trouble with either a Mozilla or an old Netscape 3 browser.  (I
did need to rename the resulting file to something with fewer exotic
characters before I could get either browser to admit that the file
existed, but it's hard to see how that could matter much.)

   It's not obvious to me how any browser could invent a URL to which to
go Back, so my first guess is operator error, but it's even less obvious
to me how anything wget could do could cause this behavior, either.

   You might try it with Firefox or any browser with no history which
might confuse a Back button.  If there's a way to blame wget for this,
I'll be amazed.  (That has happened before, however.)



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


RE: Newbie Question - DNS Failure

2007-01-22 Thread Terry Babbey
 I installed wget on a HP-UX box using the depot package.

Which depot package?  (Anyone can make a depot package.) 
Depot package came from
http://hpux.connect.org.uk/hppd/hpux/Gnu/wget-1.10.2/

Which wget version (wget -V)? 
1.10.2

Built how?
Installed using swinstall

Running on which HP-UX system type?
RP-5405

OS version?
HP-UX B.11.11

 Resolving www.lambton.on.ca... failed: host nor service provided, or
not
 known.

   First guess:  You have a DNS problem, not a wget problem.  Can any
other program on the system (Web browser, nslookup, ...) resolve names
any better?
Nslookup and ping work wonderfully. Sorry, I should have mentioned that
the first time.

   Second guess:  If DNS works for everyone else, I'd try building wget
(preferably a current version, 1.10.2) from the source, and see if that
makes any difference.  (Who knows what name resolver is linked in with
the program in the depot?)
Started to try that and got some error messages during the build. I may
need to re-investigate.

   Third guess:  Try the ITRC forum for HP-UX, but you'll probably need
more info than this there, too:

   http://forum1.itrc.hp.com/service/forums/familyhome.do?familyId=117
Thanks, I'll check.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547



Re: Newbie Question - DNS Failure

2007-01-22 Thread Steven M. Schweda
From: Terry Babbey

  Built how?
 Installed using swinstall

   How the depot contents were built probably matters more.

 Second guess:  If DNS works for everyone else, I'd try building wget
  (preferably a current version, 1.10.2) from the source, and see if that
  makes any difference.  [...]
 
 Started to try that and got some error messages during the build. I may
 need to re-investigate.

   As usual, it might help if you showed what you did, and what happened
when you did it.  Data like which compiler (and version) could also be
useful.

   On an HP-UX 11.23 Itanium system, starting with my VMS-compatible kit
(http://antinode.org/dec/sw/wget.html, which shouldn't matter much
here), I seemed to have no problems building using the HP C compiler,
other than getting a bunch of warnings related to socket stuff, which
seem to be harmless.  (Built using CC=cc ./configure and make.)

td176 cc -V
cc: HP C/aC++ B3910B A.06.13 [Nov 27 2006]

And I see no obvious name resolution problems:

td176 ./wget http://www.lambton.on.ca
--23:42:04--  http://www.lambton.on.ca/
   = `index.html'
Resolving www.lambton.on.ca... 192.139.190.140
Connecting to www.lambton.on.ca|192.139.190.140|:80... failed: Connection refused.

td176 ./wget -V
GNU Wget 1.10.2c built on hpux11.23.
[...]

   That's on an HP TestDrive system, which is behind a restrictive
firewall, which, I assume, explains the connection problem.  (At least
it got an IP address for the name.)  And it's not the same OS version,
and who knows which patches have been applied to either system?, and so
on.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Newbie Question - DNS Failure

2007-01-20 Thread Steven M. Schweda
From: Terry Babbey

 I installed wget on a HP-UX box using the depot package.

   Great.  Which depot package?  (Anyone can make a depot package.) 
Which wget version (wget -V)?  Built how?  Running on which HP-UX
system type?  OS version?

 Resolving www.lambton.on.ca... failed: host nor service provided, or not
 known.

   First guess:  You have a DNS problem, not a wget problem.  Can any
other program on the system (Web browser, nslookup, ...) resolve names
any better?

   Second guess:  If DNS works for everyone else, I'd try building wget
(preferably a current version, 1.10.2) from the source, and see if that
makes any difference.  (Who knows what name resolver is linked in with
the program in the depot?)

   Third guess:  Try the ITRC forum for HP-UX, but you'll probably need
more info than this there, too:

   http://forums1.itrc.hp.com/service/forums/familyhome.do?familyId=117



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Newbie Question - DNS Failure

2007-01-19 Thread Terry Babbey
I installed wget on a HP-UX box using the depot package.

 

Now when I run wget it will not resolve DNS queries.

 

wget http://192.139.190.140/ works.

wget http://www.lambton.on.ca/ fails with
the following error:

 

# wget http://www.lambton.on.ca

--17:21:22--  http://www.lambton.on.ca/

   = `index.html'

Resolving www.lambton.on.ca... failed: host nor service provided, or not
known.

 

Any help is appreciated.

 

Thanks,

Terry



Terry Babbey - Technical Support Specialist

Information  Educational Technology Department

Lambton College, Sarnia, Ontario, CANADA

 



Re: Question!

2006-11-07 Thread Assistenza Deltasys

At 2006-11-07 02:57, Yan Qing Chen wrote:

Hi wget,

I found a problem when I try to mirror an ftp site using wget. I use it 
with the -m -b parameters. Some files are copied again on every mirror run. 
How should I configure a mirror site?


Thanks  Best Regards,


Hi,
when the modified date reported by the web server's headers is newer than the 
timestamp of your local file, wget will retrieve the page again (this is 
correct: modified files have to be retrieved again). Maybe this is the cause 
of your problem.


If the web server reports a new modified date on every load of the page, even 
when nothing has changed, that is a web server misconfiguration, or maybe an 
intentional configuration, or badly written dynamic pages (asp, php, etc.) 
that don't care about the issue.
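
(A quick way to check what the server actually reports for a given page,
without downloading it; the URL below is only a placeholder:

  wget -S --spider http://www.example.com/page.html

The -S/--server-response option prints the response headers, including
Last-Modified, so you can see whether the date really changes between runs.)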


HTH, Andrea




Question!

2006-11-06 Thread Yan Qing Chen

Hi wget,

I found a problem when I try to
mirror an ftp site using wget. I use it with the -m -b parameters.
Some files are copied again on every mirror run. How should I configure
a mirror site?

Thanks  Best Regards,

Yan Qing Chen(陈延庆)
Tivoli China Development(IBM CSDL)
Internet Email: [EMAIL PROTECTED]
Address: Haohai Building 3F, ShangDi 5th Street , HaiDian District, BEIJING,
100085, CHINA


Err, which list? [was: Re: wget question (connect multiple times)]

2006-10-18 Thread Morgan Read
Tony Lewis wrote:
 A) This is the list for reporting bugs. Questions should go to
 wget@sunsite.dk

Err, I posted Qs to wget@sunsite.dk and they come via this list - is there a
mix-up here?  Perhaps why I never get any answers;)

(If there's any one else listening to this list and holding back on giving me
some fantastic bit of info that'll make my life for ever better because this is
a bug list and not a question list - please feel free to email me off-list:)

M.
-- 
Morgan Read
NEW ZEALAND
mailto:mstuffATreadDOTorgDOTnz

fedora: Freedom Forever!
http://fedoraproject.org/wiki/Overview

By choosing not to ship any proprietary or binary drivers, Fedora does differ
from other distributions. ...
Quote: Max Spevik
   http://interviews.slashdot.org/article.pl?sid=06/08/17/177220





RE: Err, which list? [was: Re: wget question (connect multiple times)]

2006-10-18 Thread Willener, Pat
If you are not on the distribution list, you can read the archive at
http://www.mail-archive.com/wget@sunsite.dk/ 

-Original Message-
From: Morgan Read [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 18, 2006 4:23 PM
To: Tony Lewis
Cc: wget@sunsite.dk
Subject: Err, which list? [was: Re: wget question (connect multiple times)]

Tony Lewis wrote:
 A) This is the list for reporting bugs. Questions should go to
 wget@sunsite.dk

Err, I posted Qs to wget@sunsite.dk and they come via this list - is there a
mix-up here?  Perhaps why I never get any answers;)

(If there's any one else listening to this list and holding back on giving me
some fantastic bit of info that'll make my life for ever better because this is
a bug list and not a question list - please feel free to email me off-list:)

M.
-- 
Morgan Read
NEW ZEALAND
mailto:mstuffATreadDOTorgDOTnz

fedora: Freedom Forever!
http://fedoraproject.org/wiki/Overview

By choosing not to ship any proprietary or binary drivers, Fedora does differ
from other distributions. ...
Quote: Max Spevik
   http://interviews.slashdot.org/article.pl?sid=06/08/17/177220



Re: wget question (connect multiple times)

2006-10-18 Thread Hrvoje Niksic
Tony Lewis [EMAIL PROTECTED] writes:

 A) This is the list for reporting bugs. Questions should go to
 wget@sunsite.dk

For what it's worth, [EMAIL PROTECTED] is simply redirected to
[EMAIL PROTECTED]  It is still useful to have a separate address for
bug reports, for at least two reasons.  One, the mailing list could
theoretically move to another location; and two, at some point we
might decide to stop redirecting bug reports to the public mailing
list.  Neither of these is likely to happen any time soon, as far as I
know, though.


wget question (connect multiple times)

2006-10-17 Thread t u
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

hi,
I hope it is okay to drop a question here.

I recently found that if wget downloads one file, my download speed will
be Y, but if wget downloads two separate files (from the same server,
doesn't matter), the download speed for each of the files will be Y (so
my network speed will go up to 2 x Y).

So my question is, can I make wget download the same file multiple
times simultaneously? In a way, it would run as multiple processes and
download parts of the file at the same time, speeding up the download.

Hope I could explain my question, sorry about the bad english.

Thanks

PS. Please consider this as an enhancement request if wget cannot get a
file by downloading parts of it simultaneously.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.2.2 (GNU/Linux)

iD8DBQFFNV4YLM1JzWwJYEYRAsEEAJ9FTx+hURJD5VudhbN2f7Iight3AACcDa6f
tO3WuBYygfKLA2Pis8Fbcos=
=7kNq
-END PGP SIGNATURE-


RE: wget question (connect multiple times)

2006-10-17 Thread Tony Lewis
A) This is the list for reporting bugs. Questions should go to
wget@sunsite.dk

B) wget does not support downloading a file multiple times simultaneously

C) The decreased per-file download time you're seeing is (probably) because
wget is reusing its connection to the server to download the second file. It
takes some time to set up a connection to the server regardless of whether
you're downloading one byte or one gigabyte of data. For small files, the
set up time can be a significant part of the overall download time.
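
For example (illustrative URLs), listing both files in a single invocation
lets wget reuse one keep-alive connection where the server allows it:

  wget http://www.example.com/one.zip http://www.example.com/two.zip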

Hope that helps!

Tony
-Original Message-
From: t u [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 17, 2006 3:50 PM
To: [EMAIL PROTECTED]
Subject: wget question (connect multiple times)

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

hi,
I hope it is okay to drop a question here.

I recently found that if wget downloads one file, my download speed will be
Y, but if wget downloads two separate files (from the same server, doesn't
matter), the download speed for each of the files will be Y (so my network
speed will go up to 2 x Y).

So my question is, can I make wget download the same file multiple times
simultaneously? In a way, it would run as multiple processes and download
parts of the file at the same time, speeding up the download.

Hope I could explain my question, sorry about the bad english.

Thanks

PS. Please consider this as an enhancement request if wget cannot get a file
by downloading parts of it simultaneously.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.2.2 (GNU/Linux)

iD8DBQFFNV4YLM1JzWwJYEYRAsEEAJ9FTx+hURJD5VudhbN2f7Iight3AACcDa6f
tO3WuBYygfKLA2Pis8Fbcos=
=7kNq
-END PGP SIGNATURE-



Re: wget question (connect multiple times)

2006-10-17 Thread t u
 -Original Message- From: t u [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, October 17, 2006 3:50 PM To: [EMAIL PROTECTED] Subject:
 wget question (connect multiple times)
 
 hi, I hope it is okay to drop a question here.
 
 I recently found that if wget downloads one file, my download speed
 will be Y, but if wget downloads two separate files (from the same
 server, doesn't matter), the download speed for each of the files
 will be Y (so my network speed will go up to 2 x Y).
 
 So my question is, can I make wget download the same file multiple
 times simultaneously? In a way, it would run as multiple processes
 and download parts of the file at the same time, speeding up the
 download.
 
 Hope I could explain my question, sorry about the bad english.
 
 Thanks
 
 PS. Please consider this as an enhancement request if wget cannot get
 a file by downloading parts of it simultaneously.

 Tony Lewis wrote:
 A) This is the list for reporting bugs. Questions should go to
 wget@sunsite.dk
 
 B) wget does not support downloading a file multiple times simultaneously
 
 C) The decreased per-file download time you're seeing is (probably) because
 wget is reusing its connection to the server to download the second file. It
 takes some time to set up a connection to the server regardless of whether
 you're downloading one byte or one gigabyte of data. For small files, the
 set up time can be a significant part of the overall download time.
 
 Hope that helps!
 
 Tony

Just to make sure this is received as a feature request because of (B),
it would be nice to enable wget to download parts of files
simultaneously as multiple processes. Example:
wget --option-to-multiple-download=3 file.ext
wget1 downloads the first 1/3 of the file
wget2 downloads the second 1/3 of the file
wget3 downloads the third 1/3 of the file.
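
(For what it's worth, a rough way to approximate this today is to use a tool
that already supports HTTP byte ranges, e.g. curl with -r. A sketch only,
assuming the server honors Range requests; the URL and offsets are
placeholders:

  curl -s -r 0-9999999         -o part1 http://example.com/file.ext &
  curl -s -r 10000000-19999999 -o part2 http://example.com/file.ext &
  curl -s -r 20000000-         -o part3 http://example.com/file.ext &
  wait
  cat part1 part2 part3 > file.ext
)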

Thanks for the reply, it helped.
sincerely.

PS. As a response to (A), I sent my message to bug-wget because I wanted
this to be considered as a feature request if it wasn't already
implemented. Also, I did not see wget@sunsite.dk at
http://www.gnu.org/software/wget/index.html#mailinglists. It only lists
bug-wget and wget-patches.


RE: wget question (connect multiple times)

2006-10-17 Thread Doug Kaufman
On Tue, 17 Oct 2006, Tony Lewis wrote:

 A) This is the list for reporting bugs. Questions should go to
 wget@sunsite.dk

I had always understood that bug-wget was just an alias for the
regular wget mailing list. Has this changed recently?
 Doug

-- 
Doug Kaufman
Internet: [EMAIL PROTECTED]



Question / Suggestion for wget

2006-10-13 Thread Mitch Silverstein

If -O output file and -N are both specified, it seems like there should be some 
mode where
the tests for noclobber apply to the output file, not the filename that exists 
on the remote machine.

So, if I run
# wget -N http://www.gnu.org/graphics/gnu-head-banner.png -O foo
and then
# wget -N http://www.gnu.org/graphics/gnu-head-banner.png -O foo
the second wget would not clobber and re-get the file.

Similarly, it seems odd that
# wget http://www.gnu.org/graphics/gnu-head-banner.png
and then
# wget -N http://www.gnu.org/graphics/gnu-head-banner.png -O foo
refuses to write the file named foo.

I realize there are already lots of options and the interactions can be pretty 
confusing, but I think
what I'm asking for would be of general usefulness. Maybe I'm sadistic, but -NO 
amuses me as a way to
turn on this behavior. Perhaps just --no-clobber-output-document would be saner.

Thanks for your consideration,
Mitch





Re: Question / Suggestion for wget

2006-10-13 Thread Steven M. Schweda
From: Mitch Silverstein

 If -O output file and -N are both specified [...]

   When -O foo is specified, it's not a suggestion for a file name to
be used later if needed.  Instead, wget opens the output file (foo)
before it does anything else.  Thus, it's always a newly created file,
and hence tends to be newer than any file existing on any server
(whose date-time is set correctly).

   -O has its uses, but it makes no sense to combine it with -N. 
Remember, too, that wget allows more than one URL to be specified on a
command line, so multiple URLs may be associated with a single -O
output file.  What sense does -N make then?

   It might make some sense to create some positional option which would
allow a URL-specific output file, like, say, -OO, to be used so:

  wget http://a.b.c/d.e -OO not_dd.e http://g.h.i/j.k -OO not_j.k

but I don't know if the existing command-line parser could handle that. 
Alternatively, some other notation could be adopted, like, say,
file=URL, to be used so:

  wget not_dd.e=http://a.b.c/d.e not_j.k=http://g.h.i/j.k

   But that's not what -O does, and that's why you're (or your
expectations are) doomed.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


question with wget 1.10.2 for windows

2006-08-17 Thread Savage, Ken

Thanks for the program first off. This might be a big help for me.

What I'm trying to do is pull .aspx pages off of a company's website as .html
files and save them locally. I also need the images and CSS to be converted
for local use as well.

I can't figure out the proper command to do this. Also, it's starting at a
file within a subfolder within the site, for example:
http://www.example.com/press/release.aspx

Thanks for any help

Ken Savage, Web Developer  Tel. 978-947-2888
Kronos Incorporated  297 Billerica Road  Chelmsford, MA 01824
www.kronos.com










RE: question with wget 1.10.2 for windows

2006-08-17 Thread Sandhu, Ranjit



try wget -r -np http://www.example.com/press/release.aspx
and then write a script to change all the .aspx files' 
extensions to .html (you can't get the server-side code BTW, only the 
HTML that is generated)
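
A sketch of both approaches (using the URL from the original question; the
-E/--html-extension option saves pages that come back as text/html with an
.html suffix, and -p/-k pull in and relink the images and css):

  wget -r -np -p -k -E http://www.example.com/press/release.aspx

or, renaming by hand afterwards:

  find www.example.com -name '*.aspx' | while read f; do
      mv "$f" "${f%.aspx}.html"
  done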

Ranjit Sandhu
SRA

From: Savage, Ken [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 17, 2006 3:46 PM
To: [EMAIL PROTECTED]
Subject: question with wget 1.10.2 for windows


Thanks for the program first off. This might be a big help for me.

What I'm trying to do is pull .aspx pages off of a company's website as .html
files and save them locally. I also need the images and CSS to be converted
for local use as well.

I can't figure out the proper command to do this. Also, it's starting at a
file within a subfolder within the site, for example:
http://www.example.com/press/release.aspx

Thanks for any help



Ken Savage, Web Developer  Tel. 978-947-2888
Kronos Incorporated  297 Billerica Road  Chelmsford, MA 01824
www.kronos.com



Suggestion/Question

2006-02-26 Thread Markus Raab

Hallo,

yesterday I came across wget and I find it a very useful program. I am 
mirroring a big site, more precisely a forum. Because it is a forum, under 
each post you have the action quote. Since that forum has 20,000 posts it 
would download everything with action=quote as well, so I rejected it with 
R=*action=quote*. It works as documented in the manual: the files aren't 
stored, but they are downloaded anyway and deleted right after downloading. 
Why can't wget skip these files (or rather URLs)? That would make downloading 
much faster, and the site admin would also be happy because he has less 
traffic. If wget has to fetch these files to ensure that it doesn't miss any 
links, then a switch would be useful to turn this behaviour off manually, 
for when the user knows that he doesn't need them or the documents below 
them. In a forum, e.g., it is absolutely clear that you can skip analysing 
these files because they won't link to any further documents.
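
(For reference, the rejection described above as a full command line; the
forum URL is illustrative. As noted, wget still downloads rejected HTML pages
so it can scan them for links, and only deletes them afterwards:

  wget -r -np -R '*action=quote*' http://forum.example.com/
)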

Thanks for your answer.

Markus


Re: wget output question

2005-12-01 Thread Steven M. Schweda
 I do get the full Internet address in the download if I use -k or
 --convert-links, but not if I use it with -O 

   Ah.  Right you are.  Looks like a bug to me.  Wget/1.10.2a1 (VMS
Alpha V7.3-2) says this without -O:

08:53:42 (51.00 MB/s) - `index.html' saved [2674]

Converting index.html... 0-14
Converted 1 files in 0.232 seconds.

and this with -O:

08:54:06 (297.15 KB/s) - `test.html' saved [2674]

test.html: file currently locked by another user  [Sounds VMS-specific, yes?]
Converting test.html... nothing to do.
Converted 1 files in 0.039 seconds.


   The message from Wget 1.9.1a was less informative:

08:57:13 (297.11 KB/s) - `test.html' saved [2674]

: no such file or directory
Converting ... nothing to do.
Converted 1 files in 0.00 seconds.


   Without looking at the code, I'd say that someone is calling the
conversion code before closing the -O output file.  As a user could
specify multiple URLs with a single -O output file, it may be
difficult to make this work in the same way it would without -O, so a
normal download followed by a quick rename (mv) might be your best hope,
at least in the short term.
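
   For the example in this thread, that work-around would look something like
this (a sketch, not a fix):

  wget -k http://www.google.com/
  mv index.html test.html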



   Steven M. Schweda   (+1) 651-699-9818
   382 South Warwick Street[EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Re: wget output question

2005-12-01 Thread Jon Berry

Steven M. Schweda wrote:


I do get the full Internet address in the download if I use -k or
--convert-links, but not if I use it with -O 
   



  Ah.  Right you are.  Looks like a bug to me.  


Is the developer available to confirm this?


  Without looking at the code, I'd say that someone is calling the
conversion code before closing the -O output file.  As a user could
specify multiple URLs with a single -O output file, it may be
difficult to make this work in the same way it would without -O, so a
normal download followed by a quick rename (mv) might be your best hope,
at least in the short term.
 


Yeah, that's the only thing I've been able to come up with as well.

FYI, I tried piping the output to another wget statement and also
redirecting to a file, but I pretty much ended up with the same result.

If anyone has another suggestion, I'm all ears.  :)

Jon


wget output question

2005-11-30 Thread Jon Berry

I'm trying to use wget to do the following:

1.  retrieve a single page
2.  convert the links in the retrieved page to their full, absolute 
addresses.

3.  save the page with a file name that I specify

I thought this would do it:

wget -k -O test.html http://www.google.com

However, it doesn't convert the links - it just saves the file as 
test.html.


What's the correct syntax to use?

Thanks,

Jon

wget version 1.9


Re: wget output question

2005-11-30 Thread Steven M. Schweda
 1.  retrieve a single page

   That worked.

 2. convert the links in the retrieved page to their full, absolute
 addresses.

   My wget -h output (Wget 1.10.2a1) says:
  -k,  --convert-links  make links in downloaded HTML point to local files.

Wget 1.9.1e says:

  -k,  --convert-links  convert non-relative links to relative.

Not anything about converting relative links to absolute.  I don't see
an option to do this automatically.

 3.  save the page with a file name that I specify

   That worked.  That's two out of three.

   Why would you want this result?



   Steven M. Schweda   (+1) 651-699-9818
   382 South Warwick Street[EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Re: wget output question

2005-11-30 Thread Jon Berry

Steven M. Schweda wrote:


Not anything about converting relative links to absolute.  I don't see
an option to do this automatically.
 


From the wget man page for --convert-links:

 ...if a linked file was downloaded, the link will refer to its local
 name; if it was not downloaded, the link will refer to its full
 Internet address rather than presenting a broken link...


I do get the full Internet address in the download if I use -k or 
--convert-links, but not if I use it with -O


 


3.  save the page with a file name that I specify
  
  Why would you want this result?
   


It's complicated, but the original file name is this long ass URL  that
contains multiple parameters which I don't need.  I just need a simple
filename like test.html.

I can probably write a script to rename the files, but I'm trying to
understand why wget won't allow me to do this.

Jon


Wget Log Question

2005-10-20 Thread dxu
I am trying to use Wget to get all the web pages of the IP Phones.
If I use the default verbose log option, the log gives me too much unneeded
information:
wget -t 1 -i phones_104.txt -O test.txt -o log.txt
If I add -nv option, the log files looks fine:
20:14:23 URL:http://10.104.110.10/NetworkConfiguration [6458] -
output_104.txt [1]
20:14:23 URL:http://10.104.110.11/NetworkConfiguration [6458] -
output_104.txt [1]
..
But it only gives me the logs for registered phones' web pages, i.e. the web
pages that could be opened. It does not give me the logs for unregistered
phones' web pages, whose pages could not be opened. So the following logs are
lost in the non-verbose case:
-23:21:40--  http://10.104.104.8/NetworkConfiguration
   = `test.txt'
Connecting to 10.104.104.8:80... failed: Connection timed out.
Giving up.

Those unregistered phone logs are very useful to me. Which options could give me
the non-verbose logs for those failed connections?

Thanks
Dennis








Re: A mirror question

2005-09-12 Thread Mauro Tortonesi
At 10:18 on Thursday, 1 September 2005, Pär-Ola Nilsson wrote:
 Hi!

 Is it possible to get wget to delete files that has disappeared at the
 remote ftp-host during --mirror?

not at the moment, but we might consider adding it to 2.0.

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


wget Mailing List question

2005-08-26 Thread Jonathan
Would it be possible (and is anyone else interested) to have the subject 
line of messages posted to this list prefixed with '[wget]'?


I belong to several development mailing lists that utilize this feature so 
that distributed messages do not get removed by spam filters, or deleted by 
recipients because they have no idea who sent the message.


Often the subject line does not indicate that the message relates to wget. 
For example,  I almost deleted a message with the subject Honor --datadir 
because it looked like spam.  If the subject line read  [wget] 
Honor --datadir it would be much easier to deal with.


Is anyone else interested in this idea?  Is it feasible?

Jonathan 





Re: wget Mailing List question

2005-08-26 Thread Daniel Stenberg

On Fri, 26 Aug 2005, Jonathan wrote:

Would it be possible (and is anyone else interested) to have the subject 
line of messages posted to this list prefixed with '[wget]'?


Please don't. Subject real estate is precious and limited enough as it is. I 
find subject prefixes highly disturbing.


There is already plenty of info in the headers of each single mail to allow 
them to get filtered accurately without this being needed.


For example: X-Mailing-List: wget@sunsite.dk

--
 -=- Daniel Stenberg -=- http://daniel.haxx.se -=-
  ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol


Re: wget Mailing List question

2005-08-26 Thread Hrvoje Niksic
Jonathan [EMAIL PROTECTED] writes:

 Would it be possible (and is anyone else interested) to have the
 subject line of messages posted to this list prefixed with '[wget]'?

I am against munging subject lines of mail messages.  The mailing list
software provides headers such as `Mailing-List' and `X-Mailing-List'
which can be used for better and more reliable filtering.


Re: Question

2005-08-09 Thread Hrvoje Niksic
Mauro Tortonesi [EMAIL PROTECTED] writes:

 On Saturday 09 July 2005 10:34 am, Abdurrahman ÇARKACIOĞLU wrote:
 MS Internet Explorer can save a web page as a whole. That means all the
 images,

 Tables, can be saved as a file. It is called as Web Archieve, single file
 (*.mht).

 Does it possible for wget ?

 not at the moment, but it's a planned feature for wget 2.0.

Really?  I've never heard of a .mht web archive, it seems a
Windows-only thing.


Re: Question

2005-08-09 Thread Frank McCown
While the MHT format is not extremely popular yet, I'm betting it will 
continue to grow in popularity.  It encapsulates an entire web page and 
graphics, javascripts, style sheets, etc into a single text file.  This 
makes it much easier to email and store.


See RFC 2557 for more info:
http://www.faqs.org/rfcs/rfc2557.html

It is currently supported by Netscape and Mozilla Thunderbird.

Frank


Hrvoje Niksic wrote:

Mauro Tortonesi [EMAIL PROTECTED] writes:



On Saturday 09 July 2005 10:34 am, Abdurrahman ÇARKACIOĞLU wrote:


MS Internet Explorer can save a web page as a whole. That means all the
images,

Tables, can be saved as a file. It is called as Web Archieve, single file
(*.mht).

Does it possible for wget ?


not at the moment, but it's a planned feature for wget 2.0.



Really?  I've never heard of a .mht web archive, it seems a
Windows-only thing.


--
Frank McCown
Old Dominion University
http://www.cs.odu.edu/~fmccown


Re: Question

2005-08-09 Thread Mauro Tortonesi
On Tuesday 09 August 2005 04:37 am, Hrvoje Niksic wrote:
 Mauro Tortonesi [EMAIL PROTECTED] writes:
  On Saturday 09 July 2005 10:34 am, Abdurrahman ÇARKACIOĞLU wrote:
  MS Internet Explorer can save a web page as a whole. That means all the
  images,
 
  Tables, can be saved as a file. It is called as Web Archieve, single
  file (*.mht).
 
  Does it possible for wget ?
 
  not at the moment, but it's a planned feature for wget 2.0.

 Really?  I've never heard of a .mht web archive, it seems a
 Windows-only thing.

oops, my fault. i was in a hurry and i misunderstood what Abdurrahman was 
asking. what i wanted to say is that we talked about supporting the same html 
file download mode of firefox, in which you save all the related files in a 
directory with the same name as the document you downloaded. i think that 
would be nice. sorry for the misunderstanding.

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
Institute for Human  Machine Cognition  http://www.ihmc.us
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: Question

2005-08-09 Thread Hrvoje Niksic
Mauro Tortonesi [EMAIL PROTECTED] writes:

 oops, my fault. i was in a hurry and i misunderstood what
 Abdurrahman was asking. what i wanted to say is that we talked about
 supporting the same html file download mode of firefox, in which you
 save all the related files in a directory with the same name of the
 document you donwloaded. i think that would be nice. sorry for the
 misunderstanding.

No problem.  Once wget -r/-p is taught to parse links on the fly
instead of expecting to find them in fixed on-disk locations, writing
to MHT should be easy.  It seems to be a MIME-like format that builds
on the existing concept of multipart/related messages.

Instead of converting links to local files, we'd convert them to
identifiers (free-form strings) defined with content-id.


Re: Question

2005-08-08 Thread Mauro Tortonesi
On Saturday 09 July 2005 10:34 am, Abdurrahman ÇARKACIOĞLU wrote:
 MS Internet Explorer can save a web page as a whole. That means all the
 images,

 Tables, can be saved as a file. It is called as Web Archieve, single file
 (*.mht).

 Does it possible for wget ?

not at the moment, but it's a planned feature for wget 2.0.

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
Institute for Human  Machine Cognition  http://www.ihmc.us
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Question

2005-07-09 Thread Abdurrahman ÇARKACIOĞLU









MS Internet Explorer can save a web page as a whole. That means all the
images and tables can be saved as a single file. It is called a Web Archive,
single file (*.mht).

Is it possible for wget?








wget Question/Suggestion

2005-05-20 Thread Mark Anderson
Is there an option, or could you add one if there isn't,
to specify that I want wget to write the downloaded html
file, or whatever, to stdout so I can pipe it into some
filters in a script?


Re: wget Question/Suggestion

2005-05-20 Thread Hrvoje Niksic
Mark Anderson [EMAIL PROTECTED] writes:

 Is there an option, or could you add one if there isn't, to specify
 that I want wget to write the downloaded html file, or whatever, to
 stdout so I can pipe it into some filters in a script?

Yes, use `-O -'.
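
For example, to feed a page straight into a filter (URL and filter are only
placeholders):

  wget -q -O - http://www.example.com/ | grep -i '<title>'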


question

2005-05-20 Thread Василевский Сергей
I use wget 1.9.1.
In IE6.0 the page loads OK,
but wget returns this (is it a bug, a timeout, or ...?):

16:59:59 (9.17 KB/s) - Read error at byte 31472 (Operation timed out).Retrying.

--16:59:59--  http://www.nirgos.com/d.htm
  (try: 2) = `/p5/poisk/spider/resource/www.nirgos.com/d.htm'
Connecting to www.nirgos.com[217.16.25.57]:80... connected.
HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable
17:00:02 ERROR 416: Requested Range Not Satisfiable.



Re: question

2005-05-20 Thread Hrvoje Niksic
  [EMAIL PROTECTED] writes:

 I use wget 1.9.1
 In IE6.0 page load OK, 
 but wget return (It's a bug or timeout or ...?)

Thanks for the report.  The reported timeout might or might not be
incorrect.  Wget 1.9.1 on Windows has a known bug of misrepresenting
error codes (this has been fixed in 1.10, which is now in beta).

The 416 error is presumably caused by Wget requesting an
impossible range, but it is hard to be sure without access to the
debug output.  If you can repeat this problem, please mail us the
output with the `-d' option.


Re: newbie question

2005-04-14 Thread Jens Rösner
Hi Alan!

As the URL starts with https, it is a secure server. 
You will need to log in to this server in order to download stuff.
See the manual for info how to do that (I have no experience with it).

Good luck
Jens (just another user)


  I am having trouble getting the files I want using a wildcard
 specifier (-A option = accept list).  The following command works fine to
get an
 individual file:
 
 wget

https://164.224.25.30/FY06.nsf/($reload)/85256F8A00606A1585256F900040A32F/$FILE/160RDTEN_FY06PB.pdf
 
 However, I cannot get all PDF files this command: 
 
 wget -A *.pdf

https://164.224.25.30/FY06.nsf/($reload)/85256F8A00606A1585256F900040A32F/$FILE/
 
 Instead, I get:
 
 Connecting to 164.224.25.30:443 . . . connected.
 HTTP request sent, awaiting response . . . 400 Bad Request
 15:57:52  ERROR 400: Bad Request.
 
I also tried this command without success:
 
 wget

https://164.224.25.30/FY06.nsf/($reload)/85256F8A00606A1585256F900040A32F/$FILE/*.pdf
 
 Instead, I get:
 
 HTTP request sent, awaiting response . . . 404 Bad Request
 15:57:52  ERROR 404: Bad Request.
 
  I read through the manual but am still having trouble.  What am I
 doing wrong?
 
 Thanks, Alan
 
 
 

-- 


RE: newbie question

2005-04-14 Thread Tony Lewis
Alan Thomas wrote:

 I am having trouble getting the files I want using a wildcard specifier...

There are no options on the command line for what you're attempting to do.

Neither wget nor the server you're contacting understand *.pdf in a URI.
In the case of wget, it is designed to read web pages (HTML files) and then
collect a list of resources that are referenced in those pages, which it
then retrieves. In the case of the web server, it is designed to return
individual objects on request (X.pdf or Y.pdf, but not *.pdf). Some web
servers will return a list of files if you specify a directory, but you
already tried that in your first use case.

Try coming at this from a different direction. If you were going to manually
download every PDF from that directory, how would YOU figure out the names
of each one? Is there a web page that contains a list somewhere? If so,
point wget there.
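
For example, if such an index page exists (the URL here is purely
illustrative), something like this would fetch every PDF it links to:

  wget -r -l1 -np -A pdf http://www.example.com/budget/index.html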

Hope that helps.

Tony

PS) Jens was mistaken when he said that https requires you to log into the
server. Some servers may require authentication before returning information
over a secure (https) channel, but that is not a given.




Re: newbie question

2005-04-14 Thread Hrvoje Niksic
Alan Thomas [EMAIL PROTECTED] writes:

   I am having trouble getting the files I want using a wildcard
 specifier (-A option = accept list).  The following command works fine to
 get an individual file:
  
 wget
 https://164.224.25.30/FY06.nsf/($reload)/85256F8A00606A1585256F900040A32F/$FILE/160RDTEN_FY06PB.pdf
  
 However, I cannot get all PDF files this command:
  
 wget -A *.pdf
 https://164.224.25.30/FY06.nsf/($reload)/85256F8A00606A1585256F900040A32F/$FILE/
  
 Instead, I get:
  
 Connecting to 164.224.25.30:443 . . . connected.
 HTTP request sent, awaiting response . . . 400 Bad Request
 15:57:52  ERROR 400: Bad Request.

Does that URL work with a browser?  What version of Wget are you
using?

Using -d will provide a full log of what Wget is doing, as well as the
responses it is getting.  You can mail the log here, but please be
sure it doesn't contain sensitive information (if applicable).  This
list is public and has public archives.

Please note that you also need -r (or even better -r -l1) for -A
to work the way you want it.
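
For example, combined with the URL from your message (quoted so the shell
leaves the parentheses and $ signs alone, and assuming the directory URL
itself returns an index page listing the PDFs):

  wget -r -l1 -np -A '*.pdf' 'https://164.224.25.30/FY06.nsf/($reload)/85256F8A00606A1585256F900040A32F/$FILE/'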


Re: newbie question

2005-04-14 Thread Hrvoje Niksic
Tony Lewis [EMAIL PROTECTED] writes:

 PS) Jens was mistaken when he said that https requires you to log
 into the server. Some servers may require authentication before
 returning information over a secure (https) channel, but that is not
 a given.

That is true.  HTTPS provides encrypted communication between the
client and the server, but it doesn't always imply authentication.


Re: newbie question

2005-04-14 Thread Jens Rösner
Hi! 

Yes, I see now, I misread Alan's original post. 
I thought he would not even be able to download the single .pdf. 
Don't know why, as he clearly said it works getting a single pdf.

Sorry for the confusion! 
Jens

 Tony Lewis [EMAIL PROTECTED] writes:
 
  PS) Jens was mistaken when he said that https requires you to log
  into the server. Some servers may require authentication before
  returning information over a secure (https) channel, but that is not
  a given.
 
 That is true.  HTTPS provides encrypted communication between the
 client and the server, but it doesn't always imply authentication.
 

-- 


Re: [unclassified] Re: newbie question

2005-04-14 Thread Alan Thomas
 I got the wgetgui program, and used it successfully.  The commands were
very much like this one.  Thanks, Alan

- Original Message - 
From: Technology Freak [EMAIL PROTECTED]
To: Alan Thomas [EMAIL PROTECTED]
Sent: Thursday, April 14, 2005 10:12 AM
Subject: [unclassified] Re: newbie question


 Alan,

 You could try something like this

 wget -r -d -l1 -H -t1 -nd -N -np -A pdf URL

 On Wed, 13 Apr 2005, Alan Thomas wrote:

  Date: Wed, 13 Apr 2005 16:02:40 -0400
  From: Alan Thomas [EMAIL PROTECTED]
  To: wget@sunsite.dk
  Subject: newbie question
 
  I am having trouble getting the files I want using a wildcard
specifier (-A option = accept list).  The following command works fine to
get an individual file:
 
  wget
https://164.224.25.30/FY06.nsf/($reload)/85256F8A00606A1585256F900040A32F/$FILE/160RDTEN_FY06PB.pdf
 
  However, I cannot get all PDF files this command:
 
  wget -A *.pdf
https://164.224.25.30/FY06.nsf/($reload)/85256F8A00606A1585256F900040A32F/$FILE/

 --- TekPhreak [EMAIL PROTECTED]
  http://www.tekphreak.com




--continue question

2005-02-10 Thread Schneider, Kenneth (MLCI)
i am using wget to retrieve files from a somewhat unstable ftp server. often
i kill and restart wget with the --continue option. i use perl to manage the
progress of wget and on bad days wget may be restarted 40, 50 or 60 times
before the complete file is retrieved.

the problem is that sometimes the file ends up containing a section of
garbage.

i am suspicious that data is not being flushed when i kill wget and/or wget
continues on from the wrong spot.

does this behavior seem possible? can you recommend a fix or workaround?

i am using wget 1.9.1 and running on windows xp.

thanks
--ken


wget: question about tag

2005-02-02 Thread Normand Savard
Hi,
I have a question about wget.  Is it possible to download attribute values 
other than the hardcoded ones?  For example I have the 
following html code:

...
<applet name="RosaApplet" archive="./rosa/rosa.jar" code="Rosa2000"
 width="400" height="300" MAYSCRIPT>
  <param name="TB_POSITION" value="right">
  <param name="TB_ALIGN" value="top">
  <param name="IMG_URL" value="/ms_tmp/1107184599245591.gif">
  <param name="INP_FORM_NAME" value="main">

...
I want to retrieve the image referenced by <param name="IMG_URL">.
Thanks.
Norm


RE: wget: question about tag

2005-02-02 Thread Tony Lewis
 
Normand Savard wrote:

 I have a question about wget.  Is it possible to download attribute
 values other than the hardcoded ones?

No, at least not in the existing versions of wget. I have not heard that
anyone is working on such an enhancement.




wget: question about tag

2005-01-31 Thread Normand Savard
Hi,
I have a question about wget.  Is it possible to download attribute 
values other than the hardcoded ones?  For example I have the 
following html code:

...
<applet name="RosaApplet" archive="./rosa/rosa.jar" code="Rosa2000" width="400"
 height="300" MAYSCRIPT>
  <param name="TB_POSITION" value="right">
  <param name="TB_ALIGN" value="top">
  <param name="IMG_URL" value="/ms_tmp/1107184599245591.gif">
  <param name="INP_FORM_NAME" value="main">
...
I want to retrieve the image referenced by <param name="IMG_URL">.
Thanks.
Norm


wget operational question

2004-10-01 Thread Jeff Holicky
Probably insane question but - is there a way with wget to download the
output (as text) and NOT the HTML code?
 
I have a site I want and they are BOLDING the first few letters - and I just
want the name without the html tags.  So  a straight text output would
suffice.
thanks
 
 
e.g. with explorer - I bring up the documentation for wget and then save as
text - it then saves a text file of the html - but no formatting no html
code etc etc
 
GNU Wget ManualGNU Wget
The noninteractive downloading utility
Updated for Wget 1.8.1, December 2001
by Hrvoje [EMAIL PROTECTED]'{c} and the developers
 
 
Table of Contents
Overview 
Invoking 
URL Format 
Option Syntax 
Basic Startup Options 




Re: wget operational question

2004-10-01 Thread Jim Wright
% wget -q -O - http://www.gnu.org/software/wget/manual/wget-1.8.1/html_mono/wget.html 
| html2text | head -15
** GNU Wget **
* The noninteractive downloading utility *
* Updated for Wget 1.8.1, December 2001 *
 by Hrvoje [EMAIL PROTECTED]'{c} and the developers
===
** Table of Contents **
* Overview
* Invoking
  o URL Format
  o Option Syntax
  o Basic Startup Options
  o Logging and Input File Options
  o Download Options
  o Directory Options
  o HTTP Options


Of course, sounds like you are using windows; no idea if any of this
will work there.

Jim



On Fri, 1 Oct 2004, Jeff Holicky wrote:

 Probably insane question but - is there a way with wget to download the
 output (as text) and NOT the HTML code?
  
 I have a site I want and they are BOLDING the first few letters - and I just
 want the name without the html tags.  So  a straight text output would
 suffice.
 thanks
  
  
 e.g. with explorer - I bring up the documentation for wget and then save as
 text - it then saves a text file of the html - but no formatting no html
 code etc etc
  
 GNU Wget ManualGNU Wget
 The noninteractive downloading utility
 Updated for Wget 1.8.1, December 2001
 by Hrvoje [EMAIL PROTECTED]'{c} and the developers
  
  
 Table of Contents
 Overview 
 Invoking 
 URL Format 
 Option Syntax 
 Basic Startup Options 
 
 
 


RE: wget operational question

2004-10-01 Thread Jeff Holicky
Thanks Jim.

Yes but the command line version.

I was thinking a few steps ahead I suppose.  The browser apps have an
htm2txt type converter built in - technically.  I was sort of thinking
wget should as well.  Was reviewing the docs of this plus cURL (am I allowed
to utter that 4 letter word here?)

FYI to all: (Windows tools - such as the one described)
http://users.erols.com/waynesof/bruce.htm

Thanks Jim - you brought me safely down to the lowest level of where I need
to be.

Cheers!
Jeff

(If we discuss GUI then yes the number of options is limited but in the
command line world there is typically plenty across all platforms)

-Original Message-
From: Jim Wright [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 01, 2004 07:23 PM
To: Jeff Holicky
Cc: [EMAIL PROTECTED]
Subject: Re: wget operational question

% wget -q -O -
http://www.gnu.org/software/wget/manual/wget-1.8.1/html_mono/wget.html |
html2text | head -15
** GNU Wget **
* The noninteractive downloading utility *
* Updated for Wget 1.8.1, December 2001 *
 by Hrvoje [EMAIL PROTECTED]'{c} and the developers

===
** Table of Contents **
* Overview
* Invoking
  o URL Format
  o Option Syntax
  o Basic Startup Options
  o Logging and Input File Options
  o Download Options
  o Directory Options
  o HTTP Options


Of course, sounds like you are using windows; no idea if any of this will
work there.

Jim



On Fri, 1 Oct 2004, Jeff Holicky wrote:

 Probably insane question but - is there a way with wget to download 
 the output (as text) and NOT the HTML code?
  
 I have a site I want and they are BOLDING the first few letters - and 
 I just want the name without the html tags.  So  a straight text 
 output would suffice.
 thanks
  
  
 e.g. with explorer - I bring up the documentation for wget and then 
 save as text - it then saves a text file of the html - but no 
 formatting no html code etc etc
  
 GNU Wget ManualGNU Wget
 The noninteractive downloading utility Updated for Wget 1.8.1, 
 December 2001 by Hrvoje [EMAIL PROTECTED]'{c} and the developers
  
  
 Table of Contents
 Overview
 Invoking
 URL Format
 Option Syntax
 Basic Startup Options
 
 
 




question on wget via http proxy

2004-07-12 Thread Malte Schünemann
Hello,

I am sitting behind a http proxy and need to access the internet through this channel. 
In most cases this works fine - but there are certain FTP server sites that I can only 
access via browser or wget. This also is no problem - as long as I need to retrieve 
data. 

Problems come up as soon as I need to upload data - this seems to be possible only via 
Netscape 4. None of the tools I tried (including gftp, kbear, lftp) help out. E.g. 
using gftp I can access ftp.suse.com - but not these sites. As the browser is rather 
unreliable in this respect I would like to use another tool.

Problem sites are 
  testcase.boulder.ibm.com
  ftp.software.ibm.com

Since wget is able to obtain directory listings / retrieve data from there, it should be 
possible to also upload data (the browser is able to as well). What is so special 
about wget that it is able to perform this task? If I knew, maybe I could find a 
solution to this problem.

I am running Linux SuSE 9.0, kernel 2.4.26, wget-1.8.2-301. I have set the env variables
  http_proxy
  ftp_proxy
which make the connection work fine with wget.

Any idea ?
Thank you

Malte




Re: question on wget via http proxy

2004-07-12 Thread Tony Lewis
Malte Schünemann wrote:

 Since wget is able to obtain directory listings / retrieve data from
 there, it should be possible to also upload data

Then it would be wput. :-)

 What is so special about wget that it is able to perform this task?

You can learn a LOT about how wget is communicating with the target site by
using the --debug argument.
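
For example, writing the whole exchange to a file for inspection (host name
taken from your report):

  wget -d -o wget-debug.log ftp://testcase.boulder.ibm.com/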

Hope that helps a little.

Tony



question about wget use or (possible) new feature

2004-06-12 Thread Dog's Empire



 Hello,
Have tried to use wget to download forum pages.
But the point is wget downloads all such links:
site.com/forum?topic=5&way_to_show=1stway
site.com/forum?topic=5&way_to_show=2ndway
and so on...
The point is that all these links have the same contents, but a different
way to show it.
Is there any way to control "parameters" in a link? To filter them, etc.,
something like --reject for filenames?
Thank you.
Best regards.
Olga Lav
http://www.dogsempire.com


Question: How do I get wget to get past a form based authenticati on?

2004-04-19 Thread Bettinger, Imelda
I am trying to use wget to spider our company web site to be able to save
copies of the site periodically.

We moved from web based authentication to form based last year and I can't
figure out how to get wget to get past the authentication. Most of our
content is behind the authentication. 

If wget won't work for my task, any other suggestions?


Thanks.


Imelda Bettinger
IT eTechnology
AMVESCAP PLC
* 713.214.4669 Direct
* [EMAIL PROTECTED]






Re: Question: How do I get wget to get past a form based authenticati on?

2004-04-19 Thread Hrvoje Niksic
Bettinger, Imelda [EMAIL PROTECTED] writes:

 We moved from web based authentication to form based last year and I
 can't figure out how to get wget to get past the authenication. Most
 of our content is behind the authentication.

By form based authentication I assume you mean that you enter your
credentials in a web form, after which the browser session is
authenticated?

In that case, the authentication information is really carried by the
cookie.  To get Wget to send it, specify `--load-cookies' on the
cookie file exported by the browser, such as Mozilla's cookies.txt.

This is explained in the manual under the `--load-cookies' option.
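
A sketch (file names, URLs and form field names below are only placeholders):

  # after logging in with the browser and exporting its cookies.txt:
  wget --load-cookies cookies.txt -r -np http://intranet.example.com/

  # or, if your Wget has --post-data and --keep-session-cookies, let it
  # submit the login form itself and keep the session cookie for the run:
  wget --save-cookies cookies.txt --keep-session-cookies \
       --post-data 'username=USER&password=PASS' \
       http://intranet.example.com/login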


