wget suggestion
There needs to be a way to tell wget to reject all domains EXCEPT those that are accepted. This should include subdomains. I.e., I just want to download www.mydomain.com and cache.mydomain.com. I thought the --domains option would work this way, but it doesn't.
Re: wget suggestion
From: Robert La Ferla

> There needs to be a way to tell wget to reject all domains EXCEPT those that are accepted. This should include subdomains. I.e., I just want to download www.mydomain.com and cache.mydomain.com. I thought the --domains option would work this way, but it doesn't.

Can you provide any evidence that it doesn't? Useful info might include the wget version, your OS and version, the command you used, and the results you got. Adding -d to the command often reveals more than not using it. A real example is usually more useful than a fictional example. If you can't exhibit the actual failure and explain how to reproduce it, you might do better with a psychic hot-line, as most of us are not skilled in remote viewing.

Steven M. Schweda    [EMAIL PROTECTED]
382 South Warwick Street    (+1) 651-699-9818
Saint Paul MN 55105-2547
Re: wget suggestion
GNU Wget 1.10.2

Capture this sub-site and not the rest of the site, so that you can view it locally, i.e. just www.boston.com and cache.boston.com:

http://www.boston.com/ae/food/gallery/cheap_eats/

On May 3, 2007, at 10:34 PM, Steven M. Schweda wrote:

> From: Robert La Ferla
>
>> There needs to be a way to tell wget to reject all domains EXCEPT those that are accepted. This should include subdomains. I.e., I just want to download www.mydomain.com and cache.mydomain.com. I thought the --domains option would work this way, but it doesn't.
>
> Can you provide any evidence that it doesn't? Useful info might include the wget version, your OS and version, the command you used, and the results you got. Adding -d to the command often reveals more than not using it. A real example is usually more useful than a fictional example. If you can't exhibit the actual failure and explain how to reproduce it, you might do better with a psychic hot-line, as most of us are not skilled in remote viewing.
>
> Steven M. Schweda    [EMAIL PROTECTED]
> 382 South Warwick Street    (+1) 651-699-9818
> Saint Paul MN 55105-2547
Re: wget suggestion
From: Robert La Ferla

> GNU Wget 1.10.2

Ok. Running on what?

> Capture this sub-site and not the rest of the site, so that you can view it locally, i.e. just www.boston.com and cache.boston.com: http://www.boston.com/ae/food/gallery/cheap_eats/

What is a sub-site? Do you mean this page, or this page and all the pages to which it links, excluding off-site pages, or what? I have a better idea. Read this again:

> Can you provide any evidence that it doesn't? Useful info might include the wget version, your OS and version, the command you used, and the results you got. Adding -d to the command often reveals more than not using it. A real example is usually more useful than a fictional example. If you can't exhibit the actual failure and explain how to reproduce it, you might do better with a psychic hot-line, as most of us are not skilled in remote viewing.

You might also consider phrasing your demands as polite requests in future. Phrases like "I would like to learn how to" or "Can you explain how to" can be useful for this. Even better would be: "I tried this command <insert command here>, and I got this result <insert result here>, but I was expecting something more like this <insert expected result here>, and I definitely didn't expect this <insert undesirable result here>."

Steven M. Schweda    [EMAIL PROTECTED]
382 South Warwick Street    (+1) 651-699-9818
Saint Paul MN 55105-2547
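For the record, a sketch of the invocation that ought to do what was asked for here, using the boston.com example; note that --domains/-D only takes effect once -H enables host spanning at all, which may be why it appeared not to work (untested, so verify with -d):

    wget -r -l inf -p -np -H -D www.boston.com,cache.boston.com \
         http://www.boston.com/ae/food/gallery/cheap_eats/

-D does suffix matching on host names, so the list above also admits deeper subdomains of the two names; whether -np still confines the crawl once a second host is involved is worth checking in the debug output.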
feature suggestion - make an option to use the system date instead of multiple version numbers
Hi. I use wget to download the same file at regular intervals (a price list). And I caught myself renaming files programmatically (in several projects, even) after they were downloaded by wget and named file.1, file.2, etc. Why I need this: one day I take the downloaded files from downlddir/ and move them into backupdir/. Two days later I move downlddir/* to backupdir/* again. They would overwrite those already in backupdir/, because wget restarted numbering when it found an empty downlddir/ on day 1. I guess it would be relatively easy and quite useful to add an option to name files file.20070426142800, file.20070426142955, ... instead of just numbers. Thank you; wget is an excellent tool anyway.

[EMAIL PROTECTED]
SDF Public Access UNIX System - http://sdf.lonestar.org
Re: feature suggestion - make an option to use the system date instead of multiple version numbers
From: Alvydas

> I guess it would be relatively easy and quite useful to add an option to name files file.20070426142800, file.20070426142955, ... instead of just numbers.

The relevant code is in src/utils.c: unique_name(), and should be easy enough to change. On a fast system, however, one-second resolution (or multiple users) could lead to non-unique names, so it would be wise to do something a little more like the existing code, but with a date-time string added in.

Steven M. Schweda    [EMAIL PROTECTED]
382 South Warwick Street    (+1) 651-699-9818
Saint Paul MN 55105-2547
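Until something like that exists, the renaming can be done at download time from the calling script (a minimal workaround, assuming a POSIX shell; the URL and name are illustrative):

    wget -O "pricelist.$(date +%Y%m%d%H%M%S)" http://example.com/pricelist.csv

This sidesteps the file.1, file.2 numbering entirely, at the cost of losing wget's own uniqueness check if two runs start within the same second.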
Suggestion: wget -r should fetch images declared in CSS
Hi there, wget -r is very useful to slurp and archive entire web sites; I use it all the time when working with other web designers remotely. However, images declared in CSS rules are ignored by the robot if they are not seen elsewhere in the HTML pages. Therefore, to mirror the site properly, such images must be fetched manually with separate commands after poking into each individual CSS file -- a cumbersome and error-prone process. As web designers increasingly rely on background images in CSS to cleanly separate presentation from content, wget ought to accommodate this feature. Happy Friday the 13th, JFG
Re: Feature suggestion for WGET
From: Daniel Clarke - JAS Worldwide

> I'd like to suggest a feature for WGET: the ability to download a file and then delete it afterwards.

Assuming that you'd like to delete it on the FTP server, and not locally, the basics of this seem pretty easy to add:

0. Documentation.
1. Some kind of command-line option to control the new source-delete feature (or whatever you decide to call it).
2. src/ftp-basic.c: Add a new function, ftp_dele() (very nearly ftp_retr() converted to send DELE instead of RETR, and to expect a 2xx success response instead of a 1xx).
3. src/ftp.h: Add a function prototype for ftp_dele().
4. src/ftp.c: In getftp(), if ftp_retr() succeeds and the new source-delete option is enabled, call the new ftp_dele().
5. src/ftp.c: Add a bunch of new debug and error message code to deal with ftp_dele() activity and failures.

I've done steps 2, 3, and 4 in my experimental code, and the basic functionality seems to be there. If anyone is eager to do the whole job and wants to see my rough code, just let me know.

Steven M. Schweda    [EMAIL PROTECTED]
382 South Warwick Street    (+1) 651-699-9818
Saint Paul MN 55105-2547
Feature suggestion for WGET
Hi, I'd like to suggest a feature for WGET: the ability to download a file and then delete it afterwards. At the moment I use this tool as part of a batch script that downloads all the waiting files from a remote server using wget, then quits wget and uses Windows' FTP.exe to delete all the files. This is not ideal, because if the process is interrupted (e.g. the connection is severed after downloading 99% of the files), I have to re-download all the files again. It also runs the risk of deleting new files placed on the server between getting the first and second file listing. It would be helpful if wget could delete each file immediately after downloading it. Thanks, Daniel

Daniel Clarke MSci ARCS
Senior Application Developer
JAS Worldwide Management, LLC
Global Headquarters, Atlanta, USA
Cell/mobile: +1 4045181127
Office: +1 4042558230 ext 3023 *** note new office number
[EMAIL PROTECTED]
Skype: jas-jasww-danielclarke
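Pending a source-delete option, a per-file approximation with stock tools (a rough sketch only: host, account, and filelist.txt are placeholders, and error handling is minimal):

    for f in $(cat filelist.txt); do
        # delete on the server only after this file's own download succeeds
        wget "ftp://user:password@ftp.example.com/outbox/$f" || continue
        printf 'user user password\ndelete outbox/%s\nquit\n' "$f" | ftp -n ftp.example.com
    done

Because each file is deleted only after its own download succeeds, an interrupted run leaves the undownloaded files on the server, and files that arrive mid-run are simply never touched.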
wget - feature suggestion
First, thanks for this great tool. Perhaps the following features would be helpful (for me they are):

- --strict-level: download data only from the given depth level.
- -I (include) should override -np: at the moment wget doesn't accept the include directory when it is at a higher level and -np is set.
- Filtering options for downloaded files: minimum and maximum file size would be very helpful.

Please cc to my address.
wget - feature suggestion - md5/sha1 signatures on downloaded files
Hello, How about adding an option to display the md5 and/or sha1 signatures of files that wget downloads? These signatures can be calculated in real time as each file is downloaded, and so would not require much extra I/O or CPU, but having the signatures shown right away would help people to verify files easily and quickly. Keep up the good work on wget! Alvin
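In the meantime this is easy to script around wget, even computed on the fly as suggested (sketch; URL illustrative):

    wget -qO- http://example.com/release.tar.gz | tee release.tar.gz | sha1sum

tee writes the file to disk while sha1sum digests the same stream, so the signature really is calculated in real time during the download, with no second pass over the file.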
Suggestion
Hello, as far as I can see, wget always prints the final data-transfer speed in autodetected units. I think it would be useful (and I guess also simple to add) to have an option which would tell wget to always print the speed in, for example, bytes per second, so that it is always nicely parsable no matter what the transfer-speed range is. Otherwise it is necessary to parse the K and M characters too and do some conditionals... it's just not nice. Thanks, Nejc
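For illustration, the conditional parsing such an option would remove, with $speed standing in for the figure wget printed (a hypothetical variable; this relies on awk coercing a leading number, so "123.45K" evaluates as 123.45):

    # normalize a wget speed figure like 123.45K or 1.2M to bytes per second
    echo "$speed" | awk '/K$/ { print $1 * 1024; next }
                         /M$/ { print $1 * 1048576; next }
                         { print $1 }'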
wget logging suggestion
I notice that the logging output that wget provides only includes a time stamp, but not the date. So when using -a to append output to a log file, the time of execution is logged, but you have no idea on which dates it ran. Seems very odd. This applies to both the -nv and -v options.

Bruce Holm
Lattice Semiconductor
[EMAIL PROTECTED]
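A stopgap until wget logs dates itself is to stamp the log from the calling script before each run (sketch; file names illustrative):

    echo "=== run started: $(date) ===" >> mirror.log
    wget -nv -a mirror.log http://example.com/status.html

Since -a appends rather than truncates, the date markers and wget's timestamped lines interleave in order across runs.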
Question / Suggestion for wget
If -O output file and -N are both specified, it seems like there should be some mode where the tests for noclobber apply to the output file, not the filename that exists on the remote machine. So, if I run

# wget -N http://www.gnu.org/graphics/gnu-head-banner.png -O foo

and then

# wget -N http://www.gnu.org/graphics/gnu-head-banner.png -O foo

the second wget would not clobber and re-get the file. Similarly, it seems odd that

# wget http://www.gnu.org/graphics/gnu-head-banner.png

and then

# wget -N http://www.gnu.org/graphics/gnu-head-banner.png -O foo

refuses to write the file named foo. I realize there are already lots of options and the interactions can be pretty confusing, but I think what I'm asking for would be of general usefulness. Maybe I'm sadistic, but -NO amuses me as a way to turn on this behavior. Perhaps just --no-clobber-output-document would be saner. Thanks for your consideration, Mitch
Re: Question / Suggestion for wget
From: Mitch Silverstein

> If -O output file and -N are both specified [...]

When -O foo is specified, it's not a suggestion for a file name to be used later if needed. Instead, wget opens the output file (foo) before it does anything else. Thus, it's always a newly created file, and hence tends to be newer than any file existing on any server (whose date-time is set correctly). -O has its uses, but it makes no sense to combine it with -N. Remember, too, that wget allows more than one URL to be specified on a command line, so multiple URLs may be associated with a single -O output file. What sense does -N make then? It might make some sense to create some positional option which would allow a URL-specific output file, like, say, -OO, to be used so:

wget http://a.b.c/d.e -OO not_dd.e http://g.h.i/j.k -OO not_j.k

but I don't know if the existing command-line parser could handle that. Alternatively, some other notation could be adopted, like, say, file=URL, to be used so:

wget not_dd.e=http://a.b.c/d.e not_j.k=http://g.h.i/j.k

But that's not what -O does, and that's why you (or your expectations) are doomed.

Steven M. Schweda    [EMAIL PROTECTED]
382 South Warwick Street    (+1) 651-699-9818
Saint Paul MN 55105-2547
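Given that, one way to approximate the asked-for behaviour today is to let -N manage the server-named file and expose the fixed name separately (sketch, using the example above; 'foo' ends up as a hard link to whatever was last fetched):

    wget -N http://www.gnu.org/graphics/gnu-head-banner.png &&
        ln -f gnu-head-banner.png foo

The timestamp comparison then runs against a file whose date -N can meaningfully test.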
Re: Feature suggestion: change detection for wget -c
John McCabe-Dansted wrote:

>> Wget has no way of verifying that the local file is really a valid prefix of the remote file.
> Couldn't wget redownload the last 4 bytes (or so) of the file? For a few bytes per file we could detect changes to almost all compressed files and the majority of uncompressed files.

Reliable detection of changes in the resource to be downloaded would be a very interesting feature. But do you really think that checking the last X (< 100) bytes would be enough to be reasonably sure the resource was (not) modified? What about resources which are updated by appending information, such as log files?

Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi    http://www.tortonesi.com
University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool    http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux    http://www.deepspace6.net
Ferrara Linux User Group    http://www.ferrara.linux.it
Re: Feature suggestion: change detection for wget -c
On 9/15/06, Mauro Tortonesi [EMAIL PROTECTED] wrote:

> reliable detection of changes in the resource to be downloaded would be a very interesting feature. but do you really think that checking the last X (< 100) bytes would be enough to be reasonably sure the resource was (not) modified? what about resources which are updated by appending information, such as log files?

In terms of corruption prevention, wget -c is safe if the resources are updated only by appending. Two weaknesses I can think of are logs with fixed-width repetitive messages, e.g.

12:05 Disks not mirrored
12:10 Disks not mirrored

Then if we did a wget -c on the new log file

11:40 Disks not mirrored
11:45 Disks not mirrored
11:50 Disks not mirrored

we would get an invalid log file. However, I imagine most log files have at least a few variable-length messages, so this technique would work on a majority of log files (well over 50%). Another weakness would be uncompressed database files... However, I suspect that comparing the last 4 bytes would catch 90% of the real-world snafus. I can't verify this without doing a survey of wget users, but I can say that this would have caught 100% of my own snafus. There are two problems common enough to be mentioned in the man page: proxies that append "transfer interrupted" to the end of failed downloads, and inappropriate use of wget -c -r. Checking the last 4 bytes would catch ~100% of cases of "transfer interrupted" being appended. If wget acts recursively on a directory (wget -c -r), there are many more opportunities for corruption to be detected.

John C. McCabe-Dansted
PhD Student
University of Western Australia
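For concreteness, a userspace sketch of the proposed check. It leans on HTTP Range support, and as another thread in this archive notes, some wget versions balk at the 206 reply to a hand-rolled Range header, so treat it as a proof of the idea rather than a robust tool (URL and file name illustrative):

    size=$(wc -c < file.log)
    wget -qO- --header="Range: bytes=$((size - 4))-$((size - 1))" \
        http://example.com/file.log > last4.remote
    tail -c 4 file.log > last4.local
    # resume only if the bytes at the local end offset are unchanged on the server
    cmp -s last4.remote last4.local && wget -c http://example.com/file.log

If the four bytes at the local file's end offset differ on the server, the remote file was not merely appended to, and -c would produce a corrupt result.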
Feature suggestion: change detection for wget -c
> Wget has no way of verifying that the local file is really a valid prefix of the remote file.

Couldn't wget redownload the last 4 bytes (or so) of the file? For a few bytes per file we could detect changes to almost all compressed files and the majority of uncompressed files.

John C. McCabe-Dansted
PhD Student
University of Western Australia
Re: Suggestion
Kumar Varanasi wrote:

> Hello there, I am using WGET in my system to download http files. I see that there is no option to download the file faster with multiple connections to the server. Are you planning on a multi-threaded version of WGET to make downloads much faster?

No, there is no plan to implement parallel download at the moment. However, please notice that it is highly unlikely that opening more than one connection to the same server will speed up the download process. Parallel download makes sense only when more than one server is involved.

Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi    http://www.tortonesi.com
University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool    http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux    http://www.deepspace6.net
Ferrara Linux User Group    http://www.ferrara.linux.it
Suggestion for modification to wget
A lot of packages to build Linux distributions for the embedded world rely on "wget". Typically they are based on a Makefile and a configuration file included by the Makefile. If a package is to be built, then the package is downloaded to a directory somewhere. The Makefile will try to extract the package from this directory first, and if it does not exist it will try to download the package from an Internet site using wget. It is a quite common problem that the file does not exist, because a new version is available and the original file has been moved to another location. wget would be significantly improved if it would try one or more alternate locations for the package. When wget reads the configuration file(s), ~/.wgetrc.xml, this could contain information about alternate sites. Ideally, these files should contain information about:

* a list of files containing associations between package names and where they can be found, either on ftp or http sites or on the local disk;
* web sites from where new lists of associations can be downloaded.

In addition there should be the files containing the associations... In order to avoid having to rewrite a lot of scripts, there should ideally not be a switch in the wget "download" command which indicates this. It is better if the configuration is retrieved, and the fact that it is there is enough for wget to try. Something like

wget -O http://site.mydomain.com/retreive.php?file=package-version.tar.bz?try=4 | wget -

would of course work if someone set up such a site, but a local solution is better. It would not be a bad idea if wget could report all files at the failing site which are similar to the one requested. "wget --switch site/package-*.tar.gz" should give a list of all packages that look like package-*.tar.gz/tar.bz2. wget --spider somehow does this, but results in a lot of extra unnecessary info; I would just like to get the filenames. I do not subscribe to the wget mailing list, so please reply to "ulf at atmel dot com". Best Regards, Ulf Samuelsson
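The Makefile side of this is at least easy to script today; the suggestion's real value is moving the mirror knowledge into wget's own configuration. A sketch of the status quo (package name and mirror URLs are placeholders):

    # try each candidate location in turn until one download succeeds
    for url in \
        http://primary.example.com/dist/package-1.0.tar.bz2 \
        ftp://mirror.example.org/dist/package-1.0.tar.bz2
    do
        wget "$url" && break
    done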
Suggestion/Question
Hello, yesterday I came across wget and I find it a very useful program. I am mirroring a big site, more precisely a forum. Because it is a forum, under each post you have the action "quote". Because that forum has 20,000 posts, wget would download everything with action=quote, so I rejected them with -R "*action=quote*". It works as documented in the manual: the files aren't stored, but they are downloaded anyway and deleted right after downloading. Why can't wget skip these files, resp. URLs? That would make downloading much faster, and the site admin would also be happy because he gets less traffic. If wget has to fetch these files to ensure that it doesn't miss anything while downloading, then a switch would be useful to turn this behaviour off manually, if the user knows that he doesn't need them or the deeper documents. In a forum, e.g., it is absolutely clear that you can skip analysing these files, because they won't link to any further documents. Thanks for your answer. Markus
Suggestion for documentation
It may be useful to add a paragraph to the manual which lets users know they can use the --debug option to see why certain URLs are not followed (rejected) by wget. It would be especially useful to mention this in 9.1 Robot Exclusion. Something like this: "If you wish to see which URLs are blocked by the robots.txt while wget is crawling, use the --debug option. You will see 2 lines that describe why the URL is being rejected:

Rejecting path /abc/bar.html because of rule `/abc'.
Not following http://foo.org/abc/bar.html because robots.txt forbids it."

Thanks, Frank
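To make the suggested paragraph concrete, the debug output can be filtered down to just those rejection lines (sketch):

    wget -d -r http://foo.org/ 2>&1 | grep -E 'Rejecting path|robots.txt forbids'

which leaves exactly the two lines per blocked URL quoted above.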
A bug or suggestion
I saw that the option "-k, --convert-links" makes the links point at the root directory, not at the directory where you downloaded the pages. For example: if I download a page whose URL is www.pageexample.com, the pages I download go in there. But if I use that option, the links in the pages will point to the root directory. For example: if I download into /home and there is a link to www.pageexample.com/test/index.htm, the link should point to /home/www.pageexample.com/test/index.htm, but it points to /www.pageexample.com/test/index.htm. Well, I haven't tested it on Linux yet, but this problem occurs on cygwin (the root directory becomes the partition where the program is installed, like C:). Thank you for your attention, Conrado
Re: Suggestion for manpage clarification (re --progress)
Bonjourno! :-)

Sigh. Was hoping that someone who wrote the original man page format might already have expertise in that area. It's just arcana (obscure knowledge, not necessarily hard to learn or use, just not widely known). Are you saying that you wrote the original, but aren't familiar with the even more arcane tbl input language for tables? Not that I or anyone would _expect_ knowledge of one or the other -- both are somewhat obscure source formats (even though widely used for manpages) these days... (*sigh*)... the sacrifices we make in the, not unappreciated, grandfathering of the old ways... :-)

- Linda

p.s. - I think this posting was meant to go to wget@sunsite.dk; as such, am responding to it there...

Mauro Tortonesi wrote:

> Alle 22:03, sabato 27 agosto 2005, hai scritto:
>
>> Being a computer geek, I tend to like things organized in tables so options stand out. I took the time to rewrite the text for the --progress section of the manpage, as it was always difficult for me to find the values and differences for the different subtags. Looking at --progress=type, it doesn't quickly stand out what the possible values are, nor that there are optional subtags. I tended more toward a BNF-type specification, but the central change is making the style types stand out. So even if you don't like the exact wording, I do think the table format presents the style options more clearly (i.e. they stand out quickly). Note: the output of man was used as a template for the changes, so this isn't directly applicable as a patch. I hope that isn't a block to the change, as it seems simple enough, but I don't currently have a subversion source tree setup, nor do I know manpage source syntax by memory (not a frequently used source language ;^) ):
>>
>> --progress=style
>>
>> Legal styles are bar[:force] and dot[:dotsize]. The bar style is used by default. It draws an ASCII progress bar graphic (a.k.a. thermometer display) indicating the status of retrieval. If the output is not a TTY, the dot style will be used. To force bar usage when output is not a TTY, use the :force tag (i.e. --progress=bar:force).
>>
>> The dot style traces the retrieval by printing dots on the screen, each dot representing a fixed amount of downloaded data. An optional dotsize tag can be specified to change the amount of downloaded data per dot, grouping and line as follows (K = 1024 bytes; M = 1024 KBytes):
>>
>>             size per   size per   dots per   dots per
>> dotsize     dot        line       group      line
>> -------     --------   --------   --------   --------
>> default     1K         50K        10         50
>> binary      8K         384K       16         48
>> mega        64K        3M         8          48
>>
>> default is used if no dotsize tag is specified. Note that you can set per-user defaults using the progress command in .wgetrc. Note: specifying an option on the command line overrides .wgetrc settings.
>
> i like this change, but there is a small problem. all the documentation of wget is generated from the same texinfo sources, so in order to support tex documentation formats we'll have to include something like:
>
> @ifnottex
> your ascii graphics
> @end
> @iftex
> a real tex table
> @end
>
> in wget.texi. i've never used tex. anybody knows how to create tables in tex?
>
> Aequam memento rebus in arduis servare mentem...
>
> Mauro Tortonesi    http://www.tortonesi.com
> University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
> GNU Wget - HTTP/FTP file retrieval tool    http://www.gnu.org/software/wget
> Deep Space 6 - IPv6 for Linux    http://www.deepspace6.net
> Ferrara Linux User Group    http://www.ferrara.linux.it
[Suggestion] Suppress host header?
This is no bug, but we encountered a situation where a server insists on an accurate FQDN in the Host header, or no header at all. When we have to access the server from outside the NAT firewall using port forwarding, wget cannot retrieve the file. If there were an option to suppress the Host header altogether, this problem could be solved. Yuan Liu
Suggestion for manpage clarification (re --progress)
Being a computer geek, I tend to like things organized in tables so options stand out. I took the time to rewrite the text for the --progress section of the manpage, as it was always difficult for me to find the values and differences for the different subtags. Looking at --progress=type, it doesn't quickly stand out what the possible values are, nor that there are optional subtags. I tended more toward a BNF-type specification, but the central change is making the style types stand out. So even if you don't like the exact wording, I do think the table format presents the style options more clearly (i.e. they stand out quickly). Note: the output of man was used as a template for the changes, so this isn't directly applicable as a patch. I hope that isn't a block to the change, as it seems simple enough, but I don't currently have a subversion source tree setup, nor do I know manpage source syntax by memory (not a frequently used source language ;^) ):

--progress=style

Legal styles are bar[:force] and dot[:dotsize]. The bar style is used by default. It draws an ASCII progress bar graphic (a.k.a. thermometer display) indicating the status of retrieval. If the output is not a TTY, the dot style will be used. To force bar usage when output is not a TTY, use the :force tag (i.e. --progress=bar:force).

The dot style traces the retrieval by printing dots on the screen, each dot representing a fixed amount of downloaded data. An optional dotsize tag can be specified to change the amount of downloaded data per dot, grouping and line as follows (K = 1024 bytes; M = 1024 KBytes):

            size per   size per   dots per   dots per
dotsize     dot        line       group      line
-------     --------   --------   --------   --------
default     1K         50K        10         50
binary      8K         384K       16         48
mega        64K        3M         8          48

default is used if no dotsize tag is specified. Note that you can set per-user defaults using the progress command in .wgetrc. Note: specifying an option on the command line overrides .wgetrc settings.
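The .wgetrc hook mentioned at the end looks like this in practice (a per-user default, with the command line still winning):

    # ~/.wgetrc
    progress = dot:mega

    # one-off override on the command line
    wget --progress=bar:force http://example.com/big.iso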
Re: A suggestion for configure.in
Hello, On Fri, Aug 26, 2005 at 02:07:16PM +0200, Hrvoje Niksic wrote:

> I've applied a slightly modified version of this patch, thanks. [...] I used elif instead.

Thank you, also for correcting my mistake. Actually, I wasn't aware of the fact that shell elif is portable. (I checked, and it appears in the autoconf source several times, so it really is portable.) Thank you. Stepan
Re: A suggestion for configure.in
Stepan Kasal [EMAIL PROTECTED] writes:

> 1) I removed the AC_DEFINEs of the symbols HAVE_GNUTLS and HAVE_OPENSSL. AC_LIB_HAVE_LINKFLAGS defines HAVE_LIBGNUTLS and HAVE_LIBSSL, which can be used instead. wget.h was fixed to expect these symbols. (You might think your defines are more aptly named, but they are used only once, in wget.h.)

You're right. While I do prefer the old names, it's not that big a deal and it doesn't make sense to needlessly duplicate the defines.

> 2) Was it intentional that --without-ssl doesn't switch off OpenSSL autodetection? I hope it wasn't.

Definitely not.

> 3) Explicit --with-ssl=gnutls should fail if libgnutls is not found. If the user explicitly asked for it, we shouldn't silently ignore the request if we cannot fulfill it. And likewise with ./configure --with-ssl.

Agreed.

> (I know this is not common practice (yet), but I believe it's according to common sense.

Wget 1.10 did this. The feature got lost when moving to AC_LIB_HAVE_LINKFLAGS.
A suggestion for configure.in
Hello, attached please find a patch with several suggestions. (I'm not sending it to wget-patches, as I'm not sure all the suggestions will be welcome.)

1) I removed the AC_DEFINEs of the symbols HAVE_GNUTLS and HAVE_OPENSSL. AC_LIB_HAVE_LINKFLAGS defines HAVE_LIBGNUTLS and HAVE_LIBSSL, which can be used instead. wget.h was fixed to expect these symbols. (You might think your defines are more aptly named, but they are used only once, in wget.h.)

2) Was it intentional that --without-ssl doesn't switch off OpenSSL autodetection? I hope it wasn't.

3) Explicit --with-ssl=gnutls should fail if libgnutls is not found. If the user explicitly asked for it, we shouldn't silently ignore the request if we cannot fulfill it. And likewise with ./configure --with-ssl. (I know this is not common practice (yet), but I believe it's according to common sense. This is discussed in the CVS version of the Autoconf manual; you can get it from savannah.)

4) A typo in a comment (a spare dnl).

All these issues are resolved by the combined patch, attached to this mail. Please cc the replies to me, I'm not subscribed. Regards, Stepan Kasal

Index: configure.in
===
--- configure.in    (revision 2062)
+++ configure.in    (working copy)
@@ -248,15 +248,15 @@
   if test x$LIBGNUTLS != x
   then
     AC_MSG_NOTICE([compiling in support for SSL via GnuTLS])
-    AC_DEFINE([HAVE_GNUTLS], 1,
-      [Define if support for the GnuTLS library is being compiled in.])
     SSL_OBJ='gnutls.o'
+  else
+    AC_MSG_ERROR([--with-ssl=gnutls was given, but GNUTLS is not available.])
   fi
-else
+else if test x$with_ssl != xno; then
   dnl As of this writing (OpenSSL 0.9.6), the libcrypto shared library
   dnl doesn't record its dependency on libdl, so we need to make sure
   dnl -ldl ends up in LIBS on systems that have it.  Most OSes use
-  dnl dlopen(), but HP-UX uses dnl shl_load().
+  dnl dlopen(), but HP-UX uses shl_load().
   AC_CHECK_LIB(dl, dlopen, [], [
     AC_CHECK_LIB(dl, shl_load)
   ])
@@ -274,9 +274,10 @@
   if test x$LIBSSL != x
   then
     AC_MSG_NOTICE([compiling in support for SSL via OpenSSL])
-    AC_DEFINE([HAVE_OPENSSL], 1,
-      [Define if support for the OpenSSL library is being compiled in.])
     SSL_OBJ='openssl.o'
+  else if -n $with_ssl
+  then
+    AC_MSG_ERROR([--with-ssl was given, but OpenSSL is not available.])
   fi
 fi

Index: src/wget.h
===
--- src/wget.h    (revision 2062)
+++ src/wget.h    (working copy)
@@ -40,7 +40,8 @@
 # define NDEBUG
 #endif

-#if defined HAVE_OPENSSL || defined HAVE_GNUTLS
+/* Is OpenSSL or GNUTLS available? */
+#if defined HAVE_LIBSSL || defined HAVE_LIBGNUTLS
 # define HAVE_SSL
 #endif
Re: Suggestion
Matthew J Harms [EMAIL PROTECTED] writes: I'm sure you've already had this suggested, and I don't know if it will work, due to the complexity of the suggestion, but is there a way you could implement the capability of wget to download any file that meets a criteria yet use wildcards (i.e. * or ?) to fill in the blanks. You can use wget -rl1 URL -A 200506*.exe. The problem is that you must have a URL that lists all the available files in HTML form. If you don't have such a URL, it's impossible to guess which files the server may contain. (Unlike FTP, HTTP doesn't support producing directory listings.) I'm not sure if wget even has the capability right now to do it? If the problem is what I described above, no generic downloading agent has the capability to do it.
Suggestion
I'm sure you've already had this suggested, and I don't know if it will work, due to the complexity of the suggestion, but is there a way you could implement the capability for wget to download any file that meets a criterion, using wildcards (i.e. * or ?) to fill in the blanks? For example, I'm trying to download the latest Intelligent Updater antivirus definitions from Symantec's website, but I don't want to have to visit the site to figure out what the actual file name is in order to download it. So if I were able to use wildcards I would put something like

wget http://definitions.symantec.com/defs/200506*.exe -N

and leave it up to wget to get only the latest file from the server. Now I don't know if that's possible, because I don't think (but I could be wrong) wget has the ability to preview the site and find out what files it does have, in order to download the latest. Or, even if it could download every file newer than the last one in the folder, wget would still download the files to the specified folder. I've tried to access the definitions site by itself, but there is nothing to be seen. I'm not sure if wget even has the capability right now to do it? I've looked through the help that comes with it, but there is nothing mentioning what I've suggested. I'll keep looking around on the inet to see if anyone else has figured it out. Thanks, Matt
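Applying the -A answer from the reply above to this exact case would look something like the following; untested, and it only helps if the defs/ URL actually serves an HTML listing of the files:

    wget -r -l1 -np -N -A '200506*.exe' http://definitions.symantec.com/defs/

If the server offers no listing page at all, as the last paragraph suggests, no HTTP client can discover the newest filename this way.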
wget Question/Suggestion
Is there an option, or could you add one if there isn't, to specify that I want wget to write the downloaded html file, or whatever, to stdout so I can pipe it into some filters in a script?
Re: wget Question/Suggestion
Mark Anderson [EMAIL PROTECTED] writes: Is there an option, or could you add one if there isn't, to specify that I want wget to write the downloaded html file, or whatever, to stdout so I can pipe it into some filters in a script? Yes, use `-O -'.
Re: suggestion
Stephen Leaf [EMAIL PROTECTED] writes:

> parameter option --stdout: this option would print the file being downloaded directly to stdout, which would also mean that _only_ the file's content is printed: no errors, no verbosity. Usefulness? wget --stdout http://server.com/file.bz2 | bzcat > file

Note that you can emulate the proposed `--stdout' by specifying `-qO-'.
suggestion
parameter option --stdout: this option would print the file being downloaded directly to stdout, which would also mean that _only_ the file's content is printed: no errors, no verbosity. Usefulness?

wget --stdout http://server.com/file.bz2 | bzcat > file
Suggestion regarding size
Hello all, Would it be possible to specify a minimum size for files to retrieve? Please add me in the CC list of your replies as I'm not a subscriber. Thanks, Baptiste
suggestion for wget
Hi there :) It would be nice to have 2 or more downloads at the same time, because some files are big and the host limits the speed... thanks :) Sorin
Re: suggestion for wget
On Sat, Feb 05, 2005 at 02:04:26PM +0200, Sorin wrote: hi there ::) the would be ok to have 2 or more downloads in the same time because some files are big and the host limits the speeed... You could use a multithreaded download manager (example: d4x). Many of these packages use wget as a backend. You could also use the screen utility to run many wgets concurrently, or just background them in the current shell (but your screen will become a mess ... ) -- Ryan Underwood, [EMAIL PROTECTED]
Re: Suggestion, --range
Hello Robert, On Thursday, September 30, 2004 at 6:36:43 PM +0200, Robert Thomson wrote:

> It would be really advantageous if wget had a --range command line argument, that would download a range of bytes of a file, if the server supports it.

You could try the feature patch posted by Rodrigo S. Wanderley last year on the wget mailing list. He did the work, and nobody gave feedback :-\. See [EMAIL PROTECTED]. Bye! Alain.

When you want to reply to a mailing list, please avoid doing so from a digest. This often builds incorrect references and breaks threads.
Suggestion, --range
G'day, It would be really advantageous if wget had a --range command line argument, that would download a range of bytes of a file, if the server supports it. I've tried adding it with --header 'Range: bytes=from-to' but wget has a problem with the 206 return code, and I can't see a way around that on the command line. An alternative might be an --allow-returncode=206 option. ;) Downloading partial files is really useful when you have a small USB key and a large ISO. ;) Thanks, Rob.
Re: Suggestion to add a switch on timestamps
david-zhan [EMAIL PROTECTED] writes:

> WGET is popular FTP software for UNIX. But after files have been downloaded for the first time, WGET always uses the date and time matching those on the remote server for the downloaded files. If WGET is executed in a temporary directory in which files are deleted according to their dates, files created seven days ago will be deleted automatically as soon as they are finished. I suggest that an option on timestamps be added to WGET such that users can use the current date and time for newly downloaded files.

Can't you simply use `touch *' to update the timestamps?
Suggestion to add a switch on timestamps
Suggestion to add a switch on timestamps. Dear Sir/Madam: WGET is popular FTP software for UNIX. But after files have been downloaded for the first time, WGET always uses the date and time matching those on the remote server for the downloaded files. If WGET is executed in a temporary directory in which files are deleted according to their dates, files created seven days ago will be deleted automatically as soon as they are finished. I suggest that an option on timestamps be added to WGET such that users can use the current date and time for newly downloaded files. Thank you for your kind attention.
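Pending such an option, the `touch *' answer above drops straight into the cleanup scenario described (sketch; paths and file names are illustrative):

    wget -P /tmp/incoming ftp://ftp.example.com/reports/daily.dat &&
        touch /tmp/incoming/daily.dat
    # the age-based purge now counts from download time, not server time
    find /tmp/incoming -type f -mtime +7 -exec rm -f {} \;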
doc suggestion
Please put in the wget docs, in at least 2 places: the rc file used by wget under Windows is actually wgetrc (no prefixed period), not .wgetrc. I could not find this info in the docs, and only figured it out by experimentation. Chuck

Freezone Freeware: http://freezone.darksoft.co.nz http://chuckr.freeshell.org 1000+ programs in 40+ categories. Links to 500+ free Delphi controls in 20+ categories! Mirrors: http://www.bsdg.org/resources/ http://chuckr.bravepages.com http://groups.yahoo.com/group/DelphiOpenSource/files/Links/
wget Suggestion: ability to scan ports BESIDE #80, (like 443) Anyway Thanks for WGET!
Re: wget Suggestion: ability to scan ports BESIDE #80, (like 443) Anyway Thanks for WGET!
- Original Message - From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Sunday, December 07, 2003 8:04 AM Subject: wget Suggestion: ability to scan ports BESIDE #80, (like 443) Anyway Thanks for WGET! What's wrong with wget https://www.somesite.com ?
suggestion: rethink retrial and abort behavior
Hi, if I observed correctly, wget behaves this way: errors are classified into two classes, critical and non-critical. When a non-critical error (e.g. a timeout) occurs, wget retries, continuing at the byte where the last transmission stopped (if configured that way). If a critical error (like access denied or file not found) occurs, wget stops. I have now experienced, for the very first time, wget OVERWRITING a partly retrieved file because of a bug. It was a pain, because I had already been waiting 1:30 hours to download the first half of 600 MB. Wget tried to continue, but the server answered with the remaining file size (I believe), not the complete file size, so wget got confused and restarted. No, sorry, I do not have the logfile any more, but I can get the link (it was a filefront download). This makes me think about another behaviour:

- wget may be forced to always retry; since yesterday and in the past I have experienced many false stops because of a bad server or connection. Probably it should delay a retry in the critical case.
- if wget decides a continuation is not possible due to a server limitation, it should NOT delete the file but create a diff, if that seems appropriate (is the diff command able to work on binary files?).

Jan
suggestion
it would be great if there was a flag that could be used with -q that would only give output if there was an error. i use wget a lot in pcs: johnjosephbachir.org/pcs thanks! john
Re: suggestion
Is -nv (non-verbose) an improvement?

$ wget -nv www.johnjosephbachir.org/
12:50:57 URL:http://www.johnjosephbachir.org/ [3053/3053] -> index.html [1]
$ wget -nv www.johnjosephbachir.org/m
http://www.johnjosephbachir.org/m:
12:51:02 ERROR 404: Not Found.

But if you're not satisfied, you could use shell redirection and the tail command:

$ wget -nv www.johnjosephbachir.org/m 2>&1 > /dev/null | tail +2

You could use the return value on error to echo whatever you want:

$ wget -q www.johnjosephbachir.org/m || echo Error
Error

On Fri, 12 Sep 2003, John Joseph Bachir wrote:

> it would be great if there was a flag that could be used with -q that would only give output if there was an error. i use wget a lot in pcs: johnjosephbachir.org/pcs thanks! john
Re: suggestion
Great, thanks for the suggestions. Yeah, I am looking for something that will be absolutely quiet when there is no error, but I have been using -nv in the meantime. john

On Fri, 12 Sep 2003, Aaron S. Hawley wrote:

| Is -nv (non-verbose) an improvement?
|
| $ wget -nv www.johnjosephbachir.org/
| 12:50:57 URL:http://www.johnjosephbachir.org/ [3053/3053] -> index.html [1]
| $ wget -nv www.johnjosephbachir.org/m
| http://www.johnjosephbachir.org/m:
| 12:51:02 ERROR 404: Not Found.
|
| But if you're not satisfied, you could use shell redirection and the tail command:
|
| $ wget -nv www.johnjosephbachir.org/m 2>&1 > /dev/null | tail +2
|
| You could use the return value on error to echo whatever you want:
| $ wget -q www.johnjosephbachir.org/m || echo Error
| Error
|
| On Fri, 12 Sep 2003, John Joseph Bachir wrote:
|
| > it would be great if there was a flag that could be used with -q that
| > would only give output if there was an error.
| > i use wget a lot in pcs: johnjosephbachir.org/pcs
| > thanks! john
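Another way to get exactly quiet-on-success with the current flags is to capture -nv's output and release it only when wget fails (sketch):

    out=$(wget -nv http://www.johnjosephbachir.org/ 2>&1) || printf '%s\n' "$out"

Nothing is printed on success; on failure the saved message, including the ERROR 404 line, is emitted.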
suggestion
Dear Sirs, thanks for WGet, it's a great tool. I would very much appreciate one more option: the possibility to get an http page using the POST method instead of GET. Cheers, Roman
Re: suggestion
it's available in the CVS version.. information at: http://www.gnu.org/software/wget/ On Tue, 17 Jun 2003, Roman Dusek wrote: Dear Sirs, thanks for WGet, it's a great tool. I would very appreciate one more option: a possibility to get http page using POST method instead of GET. Cheers, Roman -- Women do two-thirds of the work for five percent of the world's income.
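For reference, the POST support in question surfaced as the --post-data and --post-file options; usage looks something like this (URL and field names are illustrative):

    wget --post-data 'user=roman&lang=cz' http://example.com/cgi-bin/form.cgi

--post-file sends the contents of a file as the request body instead of an inline string.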
feature suggestion -- download small files only
WGET could download only certain files based on file type. Usually the purpose is to avoid wasting time on unrelated files. Actually we don't care much about small files, such as 1K text files. Can we just limit the file size? For example, just take those less than 1M. Generally we get the real file size before the downloading starts, except for files returned by CGI, which could give a wrong Content-Length. Just my idea; I don't know whether the current version 1.8 has already implemented it in some way.
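Absent such an option, the advertised size can be checked before committing to the transfer (sketch; as the mail notes, CGI output may misreport Content-Length, and some responses omit the header entirely):

    len=$(wget --spider -S http://example.com/paper.ps 2>&1 |
          awk '/Content-Length:/ { print $2; exit }')
    # fetch only if a size was advertised and it is at most 1M
    [ -n "$len" ] && [ "$len" -le 1048576 ] && wget http://example.com/paper.ps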
A suggestion for `man wget'
Hello, This is not a bug, but could you please add in the manual, after the sentence "The proxy is on by default if the appropriate environmental variable is defined.", that this variable is called http_proxy. It is not easy to guess. Yours, U. Elias
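For the record, the usage being documented (the proxy host is a placeholder):

    export http_proxy=http://proxy.example.com:8080/
    wget http://www.gnu.org/

ftp_proxy works the same way for FTP URLs, and no_proxy lists hosts to reach directly.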
suggestion
could you add to wget an option to force line feeds to be 0D0A ?
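Until such an option exists, the conversion is easy to bolt on afterwards (sketch; GNU sed assumed for the \r escape, and only sensible for LF-terminated text files):

    wget -q http://example.com/notes.txt && sed -i 's/$/\r/' notes.txt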
Re: [Req #1764] Suggestion: Anonymous rsync access to the wget CVS tree.
Rsync would be very convenient. You've got my vote on that one. Erlend Aasland

On 09/23/02 10:01, Lars Chr. Hausmann wrote:

> Max == Max Bowsher [EMAIL PROTECTED] writes:
>
> Max> As a dial-up user, I find it extremely useful to have access to
> Max> the full range of cvs functionality whilst offline. Some other
> Max> projects provide read-only rsync access to the CVS repository,
> Max> which allows a local copy of the repository to be made, not just
> Max> a checkout of a particular version.
>
> Max> Since access to xemacs cvs on sunsite.dk is already provided in
> Max> this manner, perhaps it would be possible for wget, as well?
>
> Wget-list: do you want us to set this up?
>
> /LCH
> SunSITE.dk Staff http://SunSITE.dk
Re: [Req #1764] Suggestion: Anonymous rsync access to the wget CVStree.
Max == Max Bowsher [EMAIL PROTECTED] writes:

Max> As a dial-up user, I find it extremely useful to have access to
Max> the full range of cvs functionality whilst offline. Some other
Max> projects provide read-only rsync access to the CVS repository,
Max> which allows a local copy of the repository to be made, not just
Max> a checkout of a particular version.

Max> Since access to xemacs cvs on sunsite.dk is already provided in
Max> this manner, perhaps it would be possible for wget, as well?

Wget-list: do you want us to set this up?

/LCH
SunSITE.dk Staff http://SunSITE.dk
Suggestion: Anonymous rsync access to the wget CVS tree.
As a dial-up user, I find it extremely useful to have access to the full range of cvs functionality whilst offline. Some other projects provide read-only rsync access to the CVS repository, which allows a local copy of the repository to be made, not just a checkout of a particular version. Since access to xemacs cvs on sunsite.dk is already provided in this manner, perhaps it would be possible for wget, as well? Thanks, Max.
Suggestion: Anonymous rsync access to the CVS tree.
As a dial-up user, I find it extremely useful to have access to the full range of cvs functionality whilst offline. Some other projects provide read-only rsync access to the CVS repository, which allows a local copy of the repository to be made, not just a checkout of a particular version. Since access to xemacs cvs on sunsite.dk is already provided in this manner, perhaps it would be possible for wget, as well? Thank you, Max.
Re: Suggestion
Hello Danny, Wednesday, July 17, 2002, 9:19:10 PM, you wrote:

DL> interrupt the downloading of a certain file or even a DL branch when
DL> downloading a directory tree recursively.

For one file: stop a download with Ctrl-C, and resume it with:

wget -c http://pwet/file_you_were_downloading

For a recursive download, the -N option downloads files only if they're newer or if they don't exist in the current folder, but it doesn't resume the big file of the tree you were downloading. Current solution:

wget -c http://pwet/tree/the_big_file

and then

wget -rN http://pwet/tree/

DL> http://www.abcd.com/ -> index.html -> docs.html -> snapshots.html
DL> [0-4]?

wget -r --max-depth=1 http://pwet/tree/

This will only retrieve one level (only index.html in your example).

Best regards, Ced    mailto:[EMAIL PROTECTED]
Re: timestamping ( another suggestion)
DCA> This isn't a bug, but the offer of a new feature. The timestamping
DCA> feature doesn't quite work for us, as we don't keep just the latest
DCA> view of a website and we don't want to copy all those files around for
DCA> each update.

Which brings me to mention two features I've been meaning to suggest for ages. Probably it means changing some basic things in the core of wget, I don't know; I'm no programmer. Maybe it has been thought about already and it was decided otherwise. But why does wget have to rename the last file it fetches when it finds another one with the same name? Why isn't the previous file already there renamed to .1, .2 and so on if more files are present? IMO this would be a major advantage for mirroring sites with timestamping *and* keeping the old files (which may not be wanted to be discarded) *and* keeping the links between newer and older unchanged files intact. Hm?

The other thing is more or less ripped from the Windows DL manager FlashGet (but why not). Wouldn't it be useful if wget retrieved a file under a temporary name, for instance with the extension .wg! or something, and renamed it back to the original name after finishing? Two advantages IMO: First, you can easily see at which point a download broke (so you don't have to look for a file by date or size or something in a whole lot of them). The other is the possibility to resume a broken download with the option -nc (so the already downloaded files aren't looked up again). Wget needn't check a lot and could determine by the file extension that this is the one file where it has to continue. Do I make sense? Sorry, only raw ideas.

Brix
Re[2]: timestamping ( another suggestion)
> The other thing is more or less ripped from the Windows DL manager FlashGet (but why not). Wouldn't it be useful if wget retrieved a file under a temporary name, for instance with the extension .wg! or something, and renamed it back to the original name after finishing? Two advantages IMO: First, you can easily see at which point a download broke (so you don't have to look for a file by date or size or something in a whole lot of them). The other is the possibility to resume a broken download with the option -nc (so the already downloaded files aren't looked up again). Wget needn't check a lot and could determine by the file extension that this is the one file where it has to continue.

TL> wget needs to remember a LOT more than simply the last file that was being downloaded. It needs to remember all the files it has looked at, the files that have been downloaded, the files that are in the queue to be downloaded, the command line and .wgetrc options, etc.

TL> With some clever planning by someone who knows the internals of the program really well, it might be possible for wget to create a resumption file with the state of the download, but I'm guessing that is a huge task.

Well, I said I don't know what it takes and whether it makes sense programming-wise. And actually I thought it wasn't about wget remembering more. If it creates a resumption file, then when the broken download has to be repeated it no-clobbers all the complete downloads (no remembering), doesn't find the current incomplete one (because of the extension), starts to download (again with the resumption extension), finds there is one when it tries to write, and decides to continue that file at the right point. Well, the conventional way of finding the broken file, deleting it and starting again with -nc works too, of course. :-)

Brix
Re: New suggestion.
On Monday 08 April 2002 19:18, you wrote:

> Ivan Buttinoni [EMAIL PROTECTED] writes:
>
>> Again I send a suggestion, this time quite easy. I hope it's not already implemented, else I'm sorry in advance. It would be nice if wget could use regexps to evaluate what to accept/refuse to download. The regexp should work on the whole URL and/or filename and/or hostname and/or CGI argument. Sometimes I find the apache directory-sorting links useless, e.g.: .../?N=A .../?M=D Here follows a hypothesis for the above example: wget -r -l0 --reg-exclude '[A-Z]=[AD]$' http://
>
> The problem with regexps is that their use would make Wget dependent on a regexp library. To make matters worse, regexp libraries come in all shapes and sizes, with incompatible APIs and implementing incompatible dialects of regexps. I'm staying away from regexps as long as I possibly can.

OK, a lot of regexp implementations exist, and consequently a lot of dialects, but don't forget the _GNU regexp_ library (http://www.gnu.org/directory/rx.html)! And how difficult would it be to enable regexps at compile time (e.g. ./configure --with-gnuregexp)?

Ciao
Ivan

BWARE TECHNOLOGIES - http://www.bware.it/
Via S.Gregorio, 3, Milano 20124 Italy - Phone: +39 02 2779181 Fax: +39 02 27791828 GSM: +39 335 1280432
Re: New suggestion.
Ivan Buttinoni [EMAIL PROTECTED] writes:

> Again I send a suggestion, this time quite easy. I hope it's not already implemented, else I'm sorry in advance. It would be nice if wget could use regexps to evaluate what to accept/refuse to download. The regexp should work on the whole URL and/or filename and/or hostname and/or CGI argument. Sometimes I find the apache directory-sorting links useless, e.g.: .../?N=A .../?M=D Here follows a hypothesis for the above example: wget -r -l0 --reg-exclude '[A-Z]=[AD]$' http://

The problem with regexps is that their use would make Wget dependent on a regexp library. To make matters worse, regexp libraries come in all shapes and sizes, with incompatible APIs and implementing incompatible dialects of regexps. I'm staying away from regexps as long as I possibly can.
Re: [Feature suggestion] SMIL support
On Tue, Mar 19, 2002 at 12:06:48AM +0100, Fabrice Bauzac wrote: Maybe there is an easy way of saying hey, SMIL files are like HTML to wget? There's an option to set the recognized tag set for html docs. Maybe some trickery with that, plus --force-html, might do the trick. -- AlanE When the going gets tough, the weird turn pro. - HST
Re: -H suggestion
[EMAIL PROTECTED] writes:

> Funny you mention this. When I first heard about -p (1.7?) I thought exactly that it would default to [spanning hosts to retrieve page requisites]. I think it would be really useful if the page requisites could be wherever they want. I mean, -p is already ignoring -np (since 1.8?), which I think is also very useful.

Since 1.8.1. I considered it a bit more dangerous to allow downloading from just any host if the user has not allowed it explicitly. For example, maybe the user doesn't want to load the banner ads? Or maybe he does? Either way, I was presented with a user interface problem. I couldn't quite figure out how to arrange the options to allow for three cases:

* -p gets stuff from this host only, including requisites.
* -p gets stuff from this host only, but requisites may span hosts.
* everything may span hosts.

Fred's suggestion raises the bar, because to implement it we'd need a set of options to juggle the different download depths depending on whether you're referring to the starting host or to the other hosts.

>>> The -i switch provides for a file listing the URLs to be downloaded. Please provide for a list file for URLs to be avoided when -H is enabled.
>> URLs to be avoided? Given that a URL can be named in more than one way, this might be hard to do.
> Sorry, but does --reject-host (or similar, I don't have the docs here ATM) not exactly do this?

The existing rejection switches reject on the basis of host name and on the basis of file name. There is no switch to disallow downloading a specific URL.
Re: -H suggestion
Hi! Once again I think this doesn't belong on the bug list, but there you go:

> I've toyed with the idea of making a flag to allow `-p' to span hosts even when normal download doesn't.

Funny you mention this. When I first heard about -p (1.7?) I thought exactly that it would default to that behaviour. I think it would be really useful if the page requisites could be wherever they want. I mean, -p is already ignoring -np (since 1.8?), which I think is also very useful.

>> The -i switch provides for a file listing the URLs to be downloaded. Please provide for a list file for URLs to be avoided when -H is enabled.
> URLs to be avoided? Given that a URL can be named in more than one way, this might be hard to do.

Sorry, but does --reject-host (or similar, I don't have the docs here ATM) not exactly do this? I may well be missing the point here, but with disallowing hosts and dirs you should be able to do this. Or is the problem to load the lists from an external file? Then please ignore my comment, I have no experience in this. CU Jens
-H suggestion
WGET suggestion: The -H switch/option sets host-spanning. Please provide a way to specify a different limit on recursion levels for files retrieved from foreign hosts. -r -l0 -H2, for example, would allow unlimited recursion levels on the target host, but only 2 [additional] levels when a file is being retrieved from a foreign host. Second suggestion: The -i switch provides for a file listing the URLs to be downloaded. Please provide for a list file of URLs to be avoided when -H is enabled. Thanks for listening. And thanks for a marvelous product. Fred Holmes [EMAIL PROTECTED]
Suggestion on job size
It would be nice to have some way to limit the total size of any job, and have it exit gracefully upon reaching that size, by completing the -k -K process upon termination, so that what one has downloaded is useful. A switch that would set the total size of all downloads --total-size=600MB would terminate the run when the total bytes downloaded reached 600 MB, and process the -k -K. What one had already downloaded would then be properly linked for viewing. Probably more difficult would be a way of terminating the run manually (Ctrl-break??), but then being able to run the -k -K process on the already-downloaded files. Fred Holmes
Re: Suggestion on job size
Hi Fred! First, I think this belongs on the normal wget list rather than here, as I cannot see a bug. Sorry to the bug trackers; I am posting to the normal wget list and cc-ing Fred, hope that is OK. To your first request: -Q (quota) should do precisely what you want. I used it with -k and it worked very well. Or am I missing your point here? Your second wish is, AFAIK, not possible now. Maybe in the future wget could write its record of downloaded files into the appropriate directory; after exiting wget, that file could then be used to process all the files mentioned in it. Just an idea; I would not normally think this is an often-requested option. HOWEVER: -K works (if I understand it correctly) on the fly, deciding as it runs whether the server file is newer, whether a previously converted file exists, and what to do. So only -k would work after the download, right? CU Jens http://www.JensRoesner.de/wgetgui/ It would be nice to have some way to limit the total size of any job, and have it exit gracefully upon reaching that size by completing the -k -K processing at termination, so that what one has downloaded is usable. A switch setting the total size of all downloads, say --total-size=600MB, would terminate the run when the total bytes downloaded reached 600 MB, and then process the -k -K. What one had already downloaded would then be properly linked for viewing. Probably more difficult would be a way of terminating the run manually (Ctrl-Break??), but then still being able to run the -k -K processing on the already-downloaded files. Fred Holmes
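A minimal sketch of the quota approach (the URL is a placeholder; wget finishes the file in progress, so the cutoff is approximate):

  # stop retrieving once roughly 600 MB have been downloaded;
  # -k then converts links in whatever was fetched
  wget -r -k -Q600m http://www.example.com/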
suggestion
I'm using wget for a watcher script that I run to monitor some servers and was thinking that it'd be handy to be able to have the HTTP response code (200, 404, etc.) as the return value on exit. Currently having it return 0 for OK and 1 for not OK is fine, but I can see some instances in the future where I might want to have the HTTP response code instead. Anyway, just a thought; some assembly required, batteries not included, your mileage may vary. --mikej -=- mike jackson [EMAIL PROTECTED]
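Until something like that exists, one possible wrapper (a sketch; the URL is a placeholder, and the header format -S prints may vary by version):

  # -S prints the server's response headers on stderr; pull out the last
  # status code seen and use it as the script's exit code
  code=$(wget -S -O /dev/null http://www.example.com/ 2>&1 | awk '/^ *HTTP\//{c=$2} END{print c}')
  exit "${code:-1}"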
wget suggestion
Hi, just a suggestion. I'm using wget 1.6. When using FTP, add an option to download files with the same permissions they have on the server. Cheers Michiel --
Re: suggestion
Jerome Lapous [EMAIL PROTECTED] writes: One option that could be interesting is to print the download result on standard output instead of to a file. It would avoid permission problems when the same shell is used by multiple users. Have you tried `-q -O -'?
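For example (placeholder URL):

  # -q silences the progress chatter, -O - writes the document to stdout,
  # so nothing is written to disk and the output can be piped anywhere
  wget -q -O - http://www.example.com/list.txt | grep pattern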
wget suggestion...
One small suggestion for a possible later release: a mask for all files, e.g. wget -m http://localhost/*.txt. Other than that, all's good =) Regards.. Total K http://www.oc32.cjb.net ~ OC32 Home http://www.digiserv.cjb.net ~ Home of [Total K] http://www.digitaldisorder.cjb.net/php/download.php?sec=pgpsub_sec=f=totalk.asc ~ PGP Key --- If mini skirts get any higher, said the Fairy to the Gnome, We'll have two more cheeks to powder, and a few more hairs to comb.
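The existing accept list already gets close to this; a sketch with a placeholder host:

  # -A restricts a recursive retrieval to file names matching the patterns
  wget -r -l1 -A '*.txt' http://localhost/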
Re: wget suggestion
[EMAIL PROTECTED] wrote: hiya! i'd like to have wget forking into background as default (via .wgetrc) but sometimes, eg. in shell scripts, i need wget to stay in foreground, so the script knows when the file is completely downloaded (well, after wget exits =) is it possible to implement such a feature? thanks in advance, wget rocks! greets, alex You can get wget running in the background by adding `&' at the end, i.e. wget http://somewhere/file.txt & If you don't add `&', wget will run in the foreground; you can then still press `ctrl+z' and type `bg' to send it to the background, or simply close the terminal in which wget is running (that will also send wget to the background, and all messages will go to the `wget-log' log file)... well, all this is written somewhere in the docs, I'm sure :) P! Vladi. -- Vladi Belperchinov-Shabanski [EMAIL PROTECTED] [EMAIL PROTECTED] Personal home page at http://www.biscom.net/~cade DataMax Ltd. http://www.datamax.bg Too many hopes and dreams won't see the light...
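For instance (placeholder URL; the `-b' switch exists only in newer wgets, so treat that line as version-dependent):

  wget http://somewhere/file.txt &   # the shell puts it in the background
  wait $!                            # a script can still block until it finishes
  wget -b http://somewhere/file.txt  # wget backgrounds itself, logging to wget-log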
wget suggestion
hiya! i'd like to have wget forking into background as default (via .wgetrc) but sometimes, eg. in shell scripts, i need wget to stay in foreground, so the script knows when the file is completely downloaded (well, after wget exits =) is it possible to implement such a feature? thanks in advance, wget rocks! greets, alex
RE: suggestion
Something like that has been suggested already, but is not yet implemented (at least not in the official source, which is at 1.7, btw). By any chance, are you using a proxy? Some of those (braindead, IMHO) insert a string like "connection interrupted" at the end of a failed download. If you can get a good copy of a ruined file, try comparing the two in order to understand exactly where they differ, and try to match the differences against the offsets at which wget had to continue the download (from wget -v output). Heiko -- -- PREVINET S.p.A. [EMAIL PROTECTED] -- Via Ferretto, 1 ph x39-041-5907073 -- I-31021 Mogliano V.to (TV) fax x39-041-5907087 -- ITALY -Original Message- From: Luis Yanes [mailto:[EMAIL PROTECTED]] Sent: Wednesday, July 11, 2001 2:24 AM To: [EMAIL PROTECTED] Subject: suggestion Dear team. First let me thank you for such a great utility. After retrieving huge ISO files with wget 1.53 and finding them unusable due to checksum failures, I thought about a possible enhancement for wget. I think the most probable transmission errors will occur just before a disconnect event, giving the last few bytes the greatest chance of becoming corrupt, even over TCP, although I don't have any measurements to back up this claim. Reviewing the wget docs, I haven't found anything related to this. When using the -c, --continue option for HTTP or FTP, requesting a few overlapping bytes could solve this potential problem. A default overlap of 256 bytes to 1K would have an insignificant impact on data throughput and may save a huge file from being trashed. Allow me to suggest the following syntax: --overlap overlap (default bytes) while continuing a download --overlap=BYTES overlap BYTES while continuing a broken download --ignore-overlap if the overlap segments differ, use the new/old data (if they differ, well, we are in trouble; this would need more discussion). I haven't any experience with GNU software development, but if you don't find this interesting enough to work on, I could try to make a patch against the latest wget code for your review, to include in the distribution. -- 73's de Luis mail: [EMAIL PROTECTED] Ampr: eb7gwl.ampr.org http://www.terra.es/personal2/melus0/ - PCBs for Homebrewed Hardware
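The comparison step might look like this, assuming you have both copies side by side:

  # cmp reports the offset of the first differing byte; match that
  # against the resume offsets shown in the wget -v output
  cmp good.iso ruined.iso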
suggestion
Dear team. First let me thank you for such a great utility. After retrieving huge ISO files with wget 1.53 and finding them unusable due to checksum failures, I thought about a possible enhancement for wget. I think the most probable transmission errors will occur just before a disconnect event, giving the last few bytes the greatest chance of becoming corrupt, even over TCP, although I don't have any measurements to back up this claim. Reviewing the wget docs, I haven't found anything related to this. When using the -c, --continue option for HTTP or FTP, requesting a few overlapping bytes could solve this potential problem. A default overlap of 256 bytes to 1K would have an insignificant impact on data throughput and may save a huge file from being trashed. Allow me to suggest the following syntax: --overlap overlap (default bytes) while continuing a download --overlap=BYTES overlap BYTES while continuing a broken download --ignore-overlap if the overlap segments differ, use the new/old data (if they differ, well, we are in trouble; this would need more discussion). I haven't any experience with GNU software development, but if you don't find this interesting enough to work on, I could try to make a patch against the latest wget code for your review, to include in the distribution. -- 73's de Luis mail: [EMAIL PROTECTED] Ampr: eb7gwl.ampr.org http://www.terra.es/personal2/melus0/ - PCBs for Homebrewed Hardware
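A rough sketch of the proposed overlap check, done by hand with existing tools (names are placeholders; assumes a server that honors Range requests, GNU stat, and a wget with --header support):

  url=http://www.example.com/big.iso
  size=$(stat -c%s partial.iso)
  start=$((size - 1024))
  # re-fetch the last 1K of what we already have
  wget -q -O tail.new --header="Range: bytes=$start-$((size - 1))" "$url"
  # resume with -c only if the local tail matches what the server sends
  if tail -c 1024 partial.iso | cmp -s - tail.new; then
    wget -c "$url"
  else
    echo "tail differs: local file corrupt or changed on server"
  fi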
Suggestion...
I have wget v1.5.3; I don't know if this is the current version, but if so, is there any possibility of a future version that translates HTML to text files, as Netscape is (usually) able to do? It would be nice to be able to retrieve a text version of a web page from a script, with something lacking the bloat of Netscape. If there's a more recent version that already has this ability, I'd appreciate a pointer to it. Thanks. Charlie [EMAIL PROTECTED]
Re: Suggestion...
On Wed, Jul 04, 2001 at 01:42:02PM -0600, Charlie Sorsby wrote: I have wget v1.5.3; I don't know if this is the current version, but if so, is there any possibility of a future version that translates HTML to text files, as Netscape is (usually) able to do? It would be nice to be able to retrieve a text version of a web page from a script, with something lacking the bloat of Netscape. It would be even nicer to avoid bloat in wget altogether by taking the revolutionary step of using an external conversion script. If there's a more recent version that already has this ability, I'd appreciate a pointer to it. Thanks. Charlie [EMAIL PROTECTED] -- Always hardwire the explosives -- Fiona Dexter quoting Monkey, J. Gregory Keyes, Dark Genesis
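Such a script is short; one possible version, assuming lynx (or html2text) is installed and using a placeholder URL:

  # fetch to stdout and let an external tool do the HTML-to-text rendering
  wget -q -O - http://www.example.com/page.html | lynx -dump -stdin > page.txt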
Suggestion
.. or better, a question?!? Hi, sorry for the bad English in advance :-) I have a problem and I hope you can help me. I have tried to download some files from an FTP server using an input file. The command I used looks like this: wget -i file, where the file looks like this: ftp://user:[EMAIL PROTECTED]/path1/file1 ftp://user:[EMAIL PROTECTED]/path2/file2 The list is much longer. I don't want to use the -r option, because I don't need all the files. My problem is that wget makes a new login for each file, but the files are all on the same server. The login on the FTP server takes quite a long time, and that's why I want to ask what I must do so that wget logs in just once and gets all the files. I downloaded your new 1.7 version and tried to do what I want with the --base option, but that doesn't work, or I don't know how. I hope you can help me, because I would like to use wget, which does a good job in all other cases. Greets Jan Thonemann
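One possible workaround, untested here, is a single recursive run restricted to just the directories and names you need, so the server is contacted (and the login performed) only once per run; the paths and names below stand in for the real list:

  # -I limits recursion to the listed directories, -A to the listed names,
  # -nH avoids creating a host-name directory
  wget -r -nH -I /path1,/path2 -A 'file1,file2' 'ftp://user:password@server/'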
WGET suggestion
Hello, I'm using wget and prefer it to a number of GUI programs. It seems to me, though, that style sheets (CSS files) aren't downloaded. Is this true, or am I doing something wrong? If it is true, I would suggest that style sheets also be retrieved by wget. Regards, Michael -- Michael Widowitz [EMAIL PROTECTED] http://widowitz.com - last update 22.4.2001 http://astraxa.net
Re: WGET suggestion
Quoting Michael Widowitz ([EMAIL PROTECTED]): I'm using wget and prefer it to a number of GUI programs. It seems to me, though, that style sheets (CSS files) aren't downloaded. Is this true, or am I doing something wrong? If it is true, I would suggest that style sheets also be retrieved by wget. Michael, which version of wget do you use? I guess (but maybe I'm mistaken) that versions 1.6 and upwards do download CSS when doing recursive traversal (or --page-requisites). -- jan +-- Jan Prikryl | vr|vis center for virtual reality and visualisation [EMAIL PROTECTED] | http://www.vrvis.at +--
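For example (placeholder URL), with a sufficiently recent wget:

  # -p fetches everything needed to display the page (inlined images,
  # style sheets, and so on); -k rewrites the links for local viewing
  wget -p -k http://www.example.com/index.html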
Re: suggestion for wget
Quoting Jonathan Nichols ([EMAIL PROTECTED]): i have a suggestion for the wget program. would it be possible to have a command line option that, when invoked, would tell wget to preserve the modification date when transferring the file? i guess that `-N' (or `--timestamping') is what you're looking for. -- jan +-- Jan Prikryl | vr|vis center for virtual reality and visualisation [EMAIL PROTECTED] | http://www.vrvis.at +--
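For example (placeholder URL):

  # -N turns on timestamping: the saved file gets the remote modification
  # time, and a re-run skips the download unless the remote copy is newer
  wget -N ftp://ftp.example.com/pub/file.tar.gz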
suggestion for wget
hello, i have a suggestion for the wget program. would it be possible to have a command line option that, when invoked, would tell wget to preserve the modification date when transferring the file? the modification time would then reflect the last time the file was modified on the remote machine, as opposed to the last time it was modified on the local machine. i know that the cp command has this option (-p). is this reasonable/possible for wget? thanks, jon
RE: SUGGESTION: rollback like GetRight
I suggest two parameters: - rollback-size - rollback-check-size where 0 <= rollback-check-size <= rollback-size. The first determines the beginning of the range (filesize - rollback-size) and the second the check (wget should check the range [filesize - rollback-size, filesize - rollback-size + rollback-check-size)). freddy77 Hrvoje Niksic [EMAIL PROTECTED] writes: Daniel Stenberg [EMAIL PROTECTED] writes: Could you elaborate on this and describe in what way, theoretically, the errors would sneak into the destination file? By a silly proxy inserting a "transfer interrupted" string when the transfer between the proxy and the actual server gets interrupted. How awful. Okay, I added this to the TODO. I imagine it won't get done until someone with one of those broken proxies sends in a patch to implement it, though. --- Dan Harkless | To help prevent SPAM contamination, GNU Wget co-maintainer | please do not mention this email http://sunsite.dk/wget/ | address in Usenet posts -- thank you.
Re: SUGGESTION: rollback like GetRight
Quoting ZIGLIO Frediano ([EMAIL PROTECTED]): I suggest two parameters: - rollback-size - rollback-check-size where 0 <= rollback-check-size <= rollback-size. The first determines the beginning of the range (filesize - rollback-size) and the second the check (wget should check the range [filesize - rollback-size, filesize - rollback-size + rollback-check-size)). My understanding of the rollback problem is that there are some broken proxies that add some additional text garbage after the connection has timed out, for example. Then, with `--rollback-size=NUM', after timing out wget would cut the last NUM bytes off the file and try to resume the download. Could you elaborate more on the situation where something like `--rollback-check-size' would be needed? What should be checked there? -- jan +-- Jan Prikryl | vr|vis center for virtual reality and visualisation [EMAIL PROTECTED] | http://www.vrvis.at +--
RE: SUGGESTION: rollback like GetRight
Rollback is useful mainly for checking that the file has not changed: you check (compare) the downloaded data against your local file. freddy77 Quoting ZIGLIO Frediano ([EMAIL PROTECTED]): I suggest two parameters: - rollback-size - rollback-check-size where 0 <= rollback-check-size <= rollback-size. The first determines the beginning of the range (filesize - rollback-size) and the second the check (wget should check the range [filesize - rollback-size, filesize - rollback-size + rollback-check-size)). My understanding of the rollback problem is that there are some broken proxies that add some additional text garbage after the connection has timed out, for example. Then, with `--rollback-size=NUM', after timing out wget would cut the last NUM bytes off the file and try to resume the download. Could you elaborate more on the situation where something like `--rollback-check-size' would be needed? What should be checked there? -- jan +-- Jan Prikryl | vr|vis center for virtual reality and visualisation [EMAIL PROTECTED] | http://www.vrvis.at +--
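In shell arithmetic, the two proposed knobs (hypothetical, never implemented in wget) would relate like this (GNU stat assumed):

  rollback_size=1024        # how far back the resume point is moved
  rollback_check_size=256   # how much of that region gets verified
  size=$(stat -c%s file.iso)
  start=$((size - rollback_size))
  echo "re-request from byte $start; verify bytes $start..$((start + rollback_check_size - 1))"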