date:20040211

--spider parameter

2004-02-11 Thread Olivier SOW

Hi,

I use Wget to check the state of pages with the --spider parameter.
I am looking for a way to get back only the numeric server response (200 if OK,
404 if missing, ...),
but I haven't found a simple way.
So I tried writing the result to a file and parsing it, but there is no standard
output format for each response.

Could you add a parameter that returns only the number, or standardize the response
(no carriage return, and a number for a failed authorization)?

Thanks for your work ;)

not downloading at all, help

2004-02-11 Thread Juhana Sadeharju

Hello.

What goes wrong in the following? (I will read replies from the list
archives.)

  % wget http://www.maqamworld.com/

  --16:59:21--  http://www.maqamworld.com:80/
             => `index.html'
  Connecting to www.maqamworld.com:80... connected!
  HTTP request sent, awaiting response... 503 Unknown site
  16:59:21 ERROR 503: Unknown site.

Regards,
Juhana

Re: --spider parameter

2004-02-11 Thread Aaron S. Hawley

Some sort of URL reporting facility is on the unspoken TODO list.
http://www.mail-archive.com/[EMAIL PROTECTED]/msg05282.html

/a

On Wed, 11 Feb 2004, Olivier SOW wrote:

 Hi,

 I use Wget to check the state of pages with the --spider parameter.
 I am looking for a way to get back only the numeric server response (200 if OK,
 404 if missing, ...),
 but I haven't found a simple way.
 So I tried writing the result to a file and parsing it, but there is no standard
 output format for each response.

 Could you add a parameter that returns only the number, or standardize the response
 (no carriage return, and a number for a failed authorization)?

 Thanks for your work ;)
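
Until such a reporting facility exists, one workaround is to capture Wget's log and pull the status code out of the "awaiting response..." line. The following is a minimal Python sketch, not part of Wget. It assumes the log format seen in the sample output posted elsewhere in this archive, which may differ between Wget versions and locales; `STATUS_LINE` and `status_codes` are hypothetical names chosen for illustration.

```python
import re

# Matches the line Wget logs after sending a request, for example:
#   HTTP request sent, awaiting response... 503 Unknown site
# Assumption: the log uses this English wording.
STATUS_LINE = re.compile(r"awaiting response\.\.\. (\d{3})\b")


def status_codes(wget_log):
    """Return every HTTP status code found in a captured Wget log, in order.

    A list is returned because one run can log several responses
    (for example when a redirect is followed).
    """
    return [int(code) for code in STATUS_LINE.findall(wget_log)]


sample_log = """\
Connecting to www.maqamworld.com:80... connected!
HTTP request sent, awaiting response... 503 Unknown site
16:59:21 ERROR 503: Unknown site.
"""

print(status_codes(sample_log))  # prints [503]
```

Only the "awaiting response..." line is matched, so the trailing "ERROR 503" line does not produce a second entry for the same response.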

Regex matching of url

2004-02-11 Thread Nicolas Schodet

Hello,

Here are two new options to accept or reject a URL with a regular
expression:

--regex-accept
--regex-reject

I have wrapped the code in #ifdef conditionals to make it optional, and I
plan to use an autoconf macro to detect whether the libc regex is usable.

Do you find this patch useful?

Nicolas.



ChangeLog:

* configure.in: Check for regex feature.

doc/ChangeLog:

* wget.info (Recursive Accept/Reject Options): Document
`--regex-accept' and `--regex-reject'.
(Url-Based Limits): Ditto.
(Wgetrc Commands): Ditto.

src/ChangeLog:

* init.c: New options `--regex-accept' and `--regex-reject'.

* main.c: Ditto.

* options.h: Ditto.

* recur.c (download_child_p): Take opt.regex_accept and
opt.regex_reject into account.

* utils.c (regex_match): New function.
(regex_accurl): Ditto.
(free_regex_vec): Ditto.
(append_regex_vec): Ditto.


Index: configure.in
===================================================================
RCS file: /pack/anoncvs/wget/configure.in,v
retrieving revision 1.73
diff -u -r1.73 configure.in
--- configure.in    2003/11/26 22:46:13 1.73
+++ configure.in    2004/02/08 22:42:03
@@ -74,6 +74,13 @@
 test x${ENABLE_DEBUG} = xyes && AC_DEFINE([ENABLE_DEBUG], 1,
    [Define if you want the debug output support compiled in.])
 
+AC_ARG_ENABLE(regex,
+[  --disable-regex disable support for regular expression url
+  matching],
+ENABLE_REGEX=$enableval, ENABLE_REGEX=yes)
+test x${ENABLE_REGEX} = xyes && AC_DEFINE([ENABLE_REGEX], 1,
+   [Define if you want the regex support compiled in.])
+
 wget_need_md5=no
 
 case ${USE_OPIE}${USE_DIGEST} in
Index: doc/wget.texi
===================================================================
RCS file: /pack/anoncvs/wget/doc/wget.texi,v
retrieving revision 1.97
diff -u -r1.97 wget.texi
--- doc/wget.texi   2004/02/08 10:50:13 1.97
+++ doc/wget.texi   2004/02/08 22:42:08
@@ -1575,6 +1575,13 @@
 download (@pxref{Directory-Based Limits} for more details.)  Elements of
 @var{list} may contain wildcards.
 
[EMAIL PROTECTED] [EMAIL PROTECTED]
[EMAIL PROTECTED] [EMAIL PROTECTED]
+Specify a regular expression used to accept or reject urls.  Each use of these
+options add a regular expression to the corresponding list.  To be accepted,
+an url must match any expression of the accept list and none of the reject
+list.
+
 @item -np
 @item --no-parent
 Do not ever ascend to the parent directory when retrieving recursively.
@@ -1672,6 +1679,7 @@
 * Spanning Hosts:: (Un)limiting retrieval based on host name.
 * Types of Files:: Getting only certain files.
 * Directory-Based Limits:: Getting only certain directories.
+* Url-Based Limits::   Getting only certain urls.
 * Relative Links:: Follow relative links only.
 * FTP Links::  Following FTP links.
 @end menu
@@ -1873,6 +1881,37 @@
 intelligent fashion.
 @end table
 
+@node Url-Based Limits
+@section Url-Based Limits
+@cindex url-based limits
+
+Some website require clever rules to decide if a file must be downloaded or
+not. For example, when every information is included in the request part of an
+url. In such cases, directory or file type limits are not powerfull enough.
+
+Wget offers two options to deal with this problem.  Each option
+description lists a long name and the equivalent command in @file{.wgetrc}.
+
+@cindex accept urls
+@cindex urls, accept
+@table @samp
+@item --regex-accept @var{regex}
+@itemx regex_accept = @var{regex}
+The argument to @samp{--regex-accept} is a regular expression, like ones used
+by grep. This expression is added to a list of acceptable url patterns. To be
+accepted, an url must match any pattern in the list.
+
+
+
+@cindex reject urls
+@cindex urls, reject
+@item --regex-reject @var{regex}
+@itemx regex_reject = @var{regex}
+The @samp{--regex-reject} option works the same way as @samp{--regex-accept}, only
+its logic is the reverse; Wget will download all urls @emph{except} the
+ones matching any pattern in the list.
+@end table
+
 @node Relative Links
 @section Relative Links
 @cindex relative links
@@ -2416,6 +2455,10 @@
 Set HTTP @samp{Referer:} header just like @samp{--referer}.  (Note it
 was the folks who wrote the @sc{http} spec who got the spelling of
 ``referrer'' wrong.)
+
+@item regex_accept/regex_reject = @var{string}
+Same as @samp{--regex-accept}/@samp{--regex-reject} (@pxref{Url-Based
+Limits}).
 
 @item quiet = on/off
 Quiet mode---the same as @samp{-q}.
Index: src/init.c
===================================================================
RCS file: /pack/anoncvs/wget/src/init.c,v
retrieving revision 1.91
diff -u -r1.91 init.c
--- src/init.c  2003/12/14 13:35:27 1.91
+++ src/init.c  2004/02/08 22:42:09
@@ -85,6 +85,9 @@
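
The accept/reject rule stated in the patch documentation (a URL must match any expression of the accept list and none of the reject list) can be sketched in a few lines. This is an illustration in Python, not the patch's C code: `url_allowed` is a hypothetical name, the example URLs and patterns are made up, and Python's `re` syntax is not identical to the libc regex syntax the patch relies on. Treating an empty accept list as "accept everything" is an assumption; the patch text does not say what happens in that case.

```python
import re


def url_allowed(url, accept_patterns, reject_patterns):
    """Return True if the URL passes the accept and reject lists.

    Rule from the patch documentation: the URL must match at least one
    accept pattern and no reject pattern.
    Assumption (not stated in the patch): an empty accept list accepts
    every URL, so only the reject list applies.
    """
    # re.search matches anywhere in the string, like grep does by default.
    accepted = (not accept_patterns
                or any(re.search(p, url) for p in accept_patterns))
    rejected = any(re.search(p, url) for p in reject_patterns)
    return accepted and not rejected


# Hypothetical patterns: keep numbered pages, skip edit links.
accept = [r"\?page=[0-9]+"]
reject = [r"action=edit"]

print(url_allowed("http://example.com/index.php?page=3", accept, reject))              # True
print(url_allowed("http://example.com/index.php?page=3&action=edit", accept, reject))  # False
print(url_allowed("http://example.com/about.html", accept, reject))                    # False
```

The reject list takes priority here: a URL that matches both lists is skipped, which follows the "none of the reject list" wording.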

Re: Regex matching of url

2004-02-11 Thread Joseph Bachant

I don't know how I got bcc'd on this continuing dialog, but I would
appreciate the removal of my address.  Thanks/ B

Joseph P. Bachant, Policy Coordinator
MO Dept. of Conservation
PO Box 180, Jefferson City, MO 65102-0180
Office: 573/ 751-4115 x 3596
Fax: 573/ 526-4495
e-mail: [EMAIL PROTECTED]

 Nicolas Schodet [EMAIL PROTECTED] 02/11/04 10:31AM 
Hello,

Here are two new options to accept or reject a URL with a regular
expression:

--regex-accept
--regex-reject

I have wrapped the code in #ifdef conditionals to make it optional, and I
plan to use an autoconf macro to detect whether the libc regex is
usable.

Do you find this patch useful?

Nicolas.




RE: hi

2004-02-11 Thread parrt

Hi, this is Terence's spam blocker.  Apparently this is the first time he's getting 
email from this reply-to email address.  Just follow the link and answer the simple 
question to verify you are a human not a spam-bot and I'll get the message.  [was 
getting 1000 spam a day; now, zippo!].

Thanks,
Terence



http://knowspam.net/v/[EMAIL PROTECTED][EMAIL PROTECTED]


Thanks!
