--spider parameter
Hi, I use Wget with the --spider parameter to check page state. I am looking for a way to get back only the numeric server response code (200 if OK, 404 if missing, ...), but I have not found a simple way. So I tried writing the result to a file and parsing it, but there is no standard output format for each response. Could you add a parameter to get only the number, or standardize the response (no carriage return, and a number for an authorization failure)? Thanks for your work ;)
not downloading at all, help
Hello. What goes wrong in the following? (I will read replies from the list archives.)

% wget http://www.maqamworld.com/
--16:59:21-- http://www.maqamworld.com:80/
           => `index.html'
Connecting to www.maqamworld.com:80... connected!
HTTP request sent, awaiting response... 503 Unknown site
16:59:21 ERROR 503: Unknown site.

Regards,
Juhana
Re: --spider parameter
Some sort of URL reporting facility is on the unspoken TODO list.

http://www.mail-archive.com/[EMAIL PROTECTED]/msg05282.html

/a

On Wed, 11 Feb 2004, Olivier SOW wrote:

> Hi, I use Wget with the --spider parameter to check page state. I am
> looking for a way to get back only the numeric server response code
> (200 if OK, 404 if missing, ...), but I have not found a simple way.
> So I tried writing the result to a file and parsing it, but there is
> no standard output format for each response. Could you add a parameter
> to get only the number, or standardize the response (no carriage
> return, and a number for an authorization failure)? Thanks for your
> work ;)
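Until such a reporting facility exists, one workaround for the --spider question in this thread is to parse wget's log. The following is only a minimal sketch in Python, not a wget feature. It assumes the log contains a line of the form "HTTP request sent, awaiting response... 503 Unknown site", as in the transcript quoted elsewhere in this digest; the exact wording may differ between wget versions and locales. The function name status_codes is made up for this illustration.

```python
import re
from typing import List

# Assumed log format, taken from the wget transcript quoted in this digest:
#   "HTTP request sent, awaiting response... 503 Unknown site"
# The three-digit number after "awaiting response..." is the HTTP status.
_STATUS_RE = re.compile(r"awaiting response\.\.\.\s*(\d{3})\b")


def status_codes(wget_log: str) -> List[int]:
    """Return every HTTP status code found in a wget log, in order."""
    return [int(match.group(1)) for match in _STATUS_RE.finditer(wget_log)]


if __name__ == "__main__":
    sample = (
        "Connecting to www.maqamworld.com:80... connected!\n"
        "HTTP request sent, awaiting response... 503 Unknown site\n"
    )
    print(status_codes(sample))
```

To feed it real data, the log has to be captured first: wget writes its log to standard error by default, or to a file when given -o logfile.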
Regex matching of url
Hello,

Here are two new options to accept or reject a URL with a regular expression:

--regex-accept
--regex-reject

I have wrapped the code in an #ifdef conditional to make it optional, and I plan to use an autoconf macro to detect whether the libc regex is usable.

Do you find this patch useful?

Nicolas.

ChangeLog:
* configure.in: Check for regex feature.

doc/ChangeLog:
* wget.info (Recursive Accept/Reject Options): Document `--regex-accept' and `--regex-reject'.
(Url-Based Limits): Ditto.
(Wgetrc Commands): Ditto.

src/ChangeLog:
* init.c: New options `--regex-accept' and `--regex-reject'.
* main.c: Ditto.
* options.h: Ditto.
* recur.c (download_child_p): Take opt.regex_accept and opt.regex_reject into account.
* utils.c (regex_match): New function.
(regex_accurl): Ditto.
(free_regex_vec): Ditto.
(append_regex_vec): Ditto.

Index: configure.in
===
RCS file: /pack/anoncvs/wget/configure.in,v
retrieving revision 1.73
diff -u -r1.73 configure.in
--- configure.in 2003/11/26 22:46:13 1.73
+++ configure.in 2004/02/08 22:42:03
@@ -74,6 +74,13 @@
 test x${ENABLE_DEBUG} = xyes && AC_DEFINE([ENABLE_DEBUG], 1,
 [Define if you want the debug output support compiled in.])
+AC_ARG_ENABLE(regex,
+[ --disable-regex disable support for regular expression url
+ matching],
+ENABLE_REGEX=$enableval, ENABLE_REGEX=yes)
+test x${ENABLE_REGEX} = xyes && AC_DEFINE([ENABLE_REGEX], 1,
+ [Define if you want the regex support compiled in.])
+
 wget_need_md5=no
 case ${USE_OPIE}${USE_DIGEST} in
Index: doc/wget.texi
===
RCS file: /pack/anoncvs/wget/doc/wget.texi,v
retrieving revision 1.97
diff -u -r1.97 wget.texi
--- doc/wget.texi 2004/02/08 10:50:13 1.97
+++ doc/wget.texi 2004/02/08 22:42:08
@@ -1575,6 +1575,13 @@
 download (@pxref{Directory-Based Limits} for more details.) Elements of @var{list} may contain wildcards.
[EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]
+Specify a regular expression used to accept or reject urls. Each use of these
+options add a regular expression to the corresponding list. To be accepted,
+an url must match any expression of the accept list and none of the reject
+list.
+
 @item -np
 @item --no-parent
 Do not ever ascend to the parent directory when retrieving recursively.
@@ -1672,6 +1679,7 @@
 * Spanning Hosts:: (Un)limiting retrieval based on host name.
 * Types of Files:: Getting only certain files.
 * Directory-Based Limits:: Getting only certain directories.
+* Url-Based Limits:: Getting only certain urls.
 * Relative Links:: Follow relative links only.
 * FTP Links:: Following FTP links.
 @end menu
@@ -1873,6 +1881,37 @@
 intelligent fashion.
 @end table
[EMAIL PROTECTED] Url-Based Limits
[EMAIL PROTECTED] Url-Based Limits
[EMAIL PROTECTED] url-based limits
+
+Some website require clever rules to decide if a file must be downloaded or
+not. For example, when every information is included in the request part of an
+url. In such cases, directory or file type limits are not powerfull enough.
+
+Wget offers two options to deal with this problem. Each option
+description lists a long name and the equivalent command in @file{.wgetrc}.
+
[EMAIL PROTECTED] accept urls
[EMAIL PROTECTED] urls, accept
[EMAIL PROTECTED] @samp
[EMAIL PROTECTED] --regex-accept @var{regex}
[EMAIL PROTECTED] regex_accept = @var{regex}
+The argument to @samp{--regex-accept} is a regular expression, like ones used
+by grep. This expression is added to a list of acceptable url patterns. To be
+accepted, an url must match any pattern in the list.
+
+
+
[EMAIL PROTECTED] reject urls
[EMAIL PROTECTED] urls, reject
[EMAIL PROTECTED] --regex-reject @var{regex}
[EMAIL PROTECTED] regex_reject = @var{regex}
+The @samp{--regex-reject} option works the same way as @samp{--regex-accept}, only
+its logic is the reverse; Wget will download all urls @emph{except} the
+ones matching any pattern in the list.
[EMAIL PROTECTED] table
+
 @node Relative Links
 @section Relative Links
 @cindex relative links
@@ -2416,6 +2455,10 @@
 Set HTTP @samp{Referer:} header just like @samp{--referer}. (Note it was the folks who wrote the @sc{http} spec who got the spelling of ``referrer'' wrong.)
+
[EMAIL PROTECTED] regex_accept/regex_reject = @var{string}
+Same as @samp{--regex-accept}/@samp{--regex-reject} (@pxref{Url-Based
+Limits}).
 @item quiet = on/off
 Quiet mode---the same as @samp{-q}.
Index: src/init.c
===
RCS file: /pack/anoncvs/wget/src/init.c,v
retrieving revision 1.91
diff -u -r1.91 init.c
--- src/init.c 2003/12/14 13:35:27 1.91
+++ src/init.c 2004/02/08 22:42:09
@@ -85,6 +85,9 @@
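The rule stated in the patch documentation ("an url must match any expression of the accept list and none of the reject list") can be sketched as below. This is an illustration in Python, not the patch's C code. The function name url_accepted is hypothetical, Python's re syntax is not the same as the grep-style expressions the patch mentions, and treating an empty accept list as "accept everything" is an assumption that the visible part of the patch does not confirm.

```python
import re
from typing import Iterable


def url_accepted(url: str,
                 accept: Iterable[str] = (),
                 reject: Iterable[str] = ()) -> bool:
    """Apply accept/reject regex lists to a URL.

    A URL is rejected if it matches any reject pattern. Otherwise it is
    accepted if it matches at least one accept pattern.
    """
    if any(re.search(pattern, url) for pattern in reject):
        return False
    accept_list = list(accept)
    if not accept_list:
        # Assumption: with no accept patterns, nothing is filtered out.
        return True
    return any(re.search(pattern, url) for pattern in accept_list)


if __name__ == "__main__":
    url = "http://example.com/view.php?id=3&action=edit"
    print(url_accepted(url, accept=[r"id=\d+"], reject=[r"action=edit"]))
```

Checking the reject list first keeps the two lists independent: a reject pattern always wins, whatever the accept list contains.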
Re: Regex matching of url
I don't know how I got bcc'd on this continuing dialog, but I would appreciate the removal of my address. Thanks/ B

Joseph P. Bachant, Policy Coordinator
MO Dept. of Conservation
PO Box 180, Jefferson City, MO 65102-0180
Office: 573/ 751-4115 x 3596
Fax: 573/ 526-4495
e-mail: [EMAIL PROTECTED]

Nicolas Schodet [EMAIL PROTECTED] 02/11/04 10:31AM

Hello,

Here are two new options to accept or reject a URL with a regular expression:

--regex-accept
--regex-reject

I have wrapped the code in an #ifdef conditional to make it optional, and I plan to use an autoconf macro to detect whether the libc regex is usable.

Do you find this patch useful?

Nicolas.

ChangeLog:
* configure.in: Check for regex feature.

doc/ChangeLog:
* wget.info (Recursive Accept/Reject Options): Document `--regex-accept' and `--regex-reject'.
(Url-Based Limits): Ditto.
(Wgetrc Commands): Ditto.

src/ChangeLog:
* init.c: New options `--regex-accept' and `--regex-reject'.
* main.c: Ditto.
* options.h: Ditto.
* recur.c (download_child_p): Take opt.regex_accept and opt.regex_reject into account.
* utils.c (regex_match): New function.
(regex_accurl): Ditto.
(free_regex_vec): Ditto.
(append_regex_vec): Ditto.
RE: hi
Hi, this is Terence's spam blocker. Apparently this is the first time he's getting email from this reply-to email address. Just follow the link and answer the simple question to verify you are a human not a spam-bot and I'll get the message. [was getting 1000 spam a day; now, zippo!]. Thanks, Terence http://knowspam.net/v/[EMAIL PROTECTED][EMAIL PROTECTED] Thanks!