Re: pointer to form generation / wget & cURL?

2002-05-14 Thread Daniel Stenberg

On Tue, 14 May 2002, Amy Rupp wrote:

> I'm having trouble constructing the arguments (or a file as an input) to
> wget and cURL that contains values for GET and POST methods... for example,
> if I want to feed a date into a newspaper's archive in order to retrieve an
> article, I have to pass that value to the CGI method embedded in the page.
> I've seen the cURL manpage but I haven't been able to do it yet.
>
> If anyone has a pointer to some more extensive examples of how to formulate
> the arguments to cURL and the beta wget, I'd be very interested.

Since you're asking this on the wget mailing list, and not the curl one, I
expect that you want to see something about wget doing this. I am not aware
of any such docs or explanations on how to use the (upcoming) wget POST
feature.

For general information about how to automate HTTP scripting and things to
think about, there is lots of advice here (curl-related docs):
http://curl.haxx.se/docs/httpscripting.shtml

To extract information from an HTML page that contains a form, I once wrote a
little perl script that does the job: http://curl.haxx.se/programs/formfind.txt
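
As a rough illustration of the idea (this is not formfind itself, and the
sample HTML, field names and CGI path are invented for the example), a form's
action, method and named fields can be pulled out with Python's standard
html.parser module:

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collect each form's action, method and named input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form> seen so far
        self._current = None  # the form we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action", ""),
                "method": (attrs.get("method") or "get").upper(),
                "fields": {},
            }
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current is not None:
            name = attrs.get("name")
            if name:  # unnamed controls (e.g. a bare submit button) are not sent
                self._current["fields"][name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


# Hypothetical newspaper archive form, made up for this sketch.
SAMPLE_HTML = """
<form action="/cgi-bin/archive.cgi" method="post">
  <input type="text" name="date" value="">
  <input type="hidden" name="section" value="news">
  <input type="submit" value="Search">
</form>
"""


def find_forms(html):
    """Parse an HTML string and return the list of forms found in it."""
    parser = FormFinder()
    parser.feed(html)
    parser.close()
    return parser.forms


forms = find_forms(SAMPLE_HTML)
for form in forms:
    print(form["method"], form["action"], form["fields"])
```

The field names it reports are the ones to fill in and send, for example
with curl's -d option for a POST form.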

If that is not enough, I suggest that you show us the HTML of the form you
want to fill in and the command lines you've tried.

Good luck!

-- 
Daniel Stenberg -- curl groks URLs -- http://curl.haxx.se/




pointer to form generation / wget & cURL?

2002-05-14 Thread Amy Rupp

I'm having trouble constructing the arguments (or a file as an input)
to wget and cURL that contains values for GET and POST methods... for
example, if I want to feed a date into a newspaper's archive in order
to retrieve an article, I have to pass that value to the CGI method
embedded in the page.  I've seen the cURL manpage but I haven't been
able to do it yet.

If anyone has a pointer to some more extensive examples of how to formulate
the arguments to cURL and the beta wget, I'd be very interested.

Thank you,

Amy



Re: wget and javascript links

2002-05-14 Thread Andre Majorel

On 2002-05-14 13:01 -0400, Kevin Murphy wrote:

> However, I am trying to suck a particular site which relies excessively 
> on javascript'ed links, e.g. via window.open, sometimes wrapped in 
> function calls.
> 
> I realize that in general this is an intractable problem, but is anybody 
> aware of a partial solution?

Some people expressed interest in having a Javascript
interpreter included in Wget but AFAIK no one actually did it.

Someone pointed out that Javascript code is often simple enough
that one could write a script to parse it and extract the links.
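
A minimal sketch of such a script, assuming the links appear as quoted
string literals passed directly to window.open (the sample HTML and the
example.com base URL are made up; links built inside wrapper functions are
not handled):

```python
import re
from urllib.parse import urljoin

# Matches window.open('...') or window.open("..."), capturing the first argument.
WINDOW_OPEN = re.compile(r"""window\.open\(\s*(['"])(.*?)\1""")


def extract_js_links(html, base_url):
    """Return absolute URLs found as string literals in window.open calls."""
    return [urljoin(base_url, m.group(2)) for m in WINDOW_OPEN.finditer(html)]


# Invented sample page: two direct window.open calls and one wrapped call.
SAMPLE = """
<a href="javascript:window.open('story1.html','win')">Story 1</a>
<a href="#" onclick='window.open("/archive/story2.html")'>Story 2</a>
<a href="#" onclick="openStory(3)">Story 3</a>
"""

links = extract_js_links(SAMPLE, "http://www.example.com/news/index.html")
for link in links:
    print(link)
```

The third link is missed because its URL is built inside a function, which
is the hard case. The URLs that are found could be written to a file, one
per line, and handed to wget with -i.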

-- 
André Majorel <[EMAIL PROTECTED]>
http://www.teaser.fr/~amajorel/



Proposal for two changes/patches

2002-05-14 Thread Evgeniy Gabrilovich

Dear WGet maintainers,

I'd like to propose two patches that I've recently
implemented on my local copy of WGet.
Here is a brief description of the changes and their rationale.
I hope they might benefit other people too, so if you agree,
please let me know and I'll submit a "lege artis" patch
with source diffs and extensive descriptions.

1) Limiting the number of files recursively downloaded 
   from a given URL (in the "-r" mode).

   Currently, it's only possible to limit file types and 
   the overall download quota, but there is no control over 
   the number of files. The rationale here is to allow downloading
   only a limited number of files that sufficiently represent
   the site for various purposes (I myself conduct 
   research in text analysis/information retrieval, so it's useful
   to have a representative set of a site's pages).

   To this end I suggest adding a command-line/config-file parameter
   that controls the behaviour of the function 'retrieve_tree',
   making it return once the desired number of files has been downloaded.

2) Optionally deleting downloaded files that are not of type "text/html".

   Even though there is a mechanism for limiting downloaded file types,
   it's hard to configure Wget to download only files of type "text/html".
   Setting the Accept list to "htm,html" won't help, for exactly the same
   reason that option "-E" exists: not all "text/html" files have one of
   these extensions (e.g., "asp" files or files dynamically created by
   CGI scripts, to name but a few).
   Again, for various site analyses it is convenient to only download files
   of type "text/html". So if it's impossible to selectively download only
   such files, we can download all files, and then delete all those of 
   other types.
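
To make the two ideas concrete, here is a toy sketch in Python. It is not
Wget's code: the name retrieve_tree is borrowed only for readability, the
max_files parameter is hypothetical, and the "site" is an invented
in-memory table instead of real HTTP requests.

```python
from collections import deque

# Toy stand-in for a web site: URL -> (Content-Type, links found in the page).
# All URLs and types here are invented for the example.
SITE = {
    "/": ("text/html", ["/a.asp", "/logo.gif", "/b.html"]),
    "/a.asp": ("text/html", ["/c.cgi", "/paper.pdf"]),
    "/logo.gif": ("image/gif", []),
    "/b.html": ("text/html", ["/"]),
    "/c.cgi": ("text/html", []),
    "/paper.pdf": ("application/pdf", []),
}


def retrieve_tree(start, max_files=None):
    """Breadth-first crawl that returns once max_files documents are fetched."""
    queue = deque([start])
    seen = {start}
    downloaded = []  # (url, content_type) pairs in download order
    while queue:
        if max_files is not None and len(downloaded) >= max_files:
            break  # change 1: stop at the file-count limit
        url = queue.popleft()
        content_type, links = SITE[url]
        downloaded.append((url, content_type))
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return downloaded


def keep_html_only(downloaded):
    """Change 2: keep only documents whose Content-Type is text/html."""
    return [url for url, ctype in downloaded if ctype == "text/html"]


limited = retrieve_tree("/", max_files=4)
html_only = keep_html_only(retrieve_tree("/"))
print(limited)
print(html_only)
```

Note that "/a.asp" and "/c.cgi" survive the filter because the decision is
made on the reported type, not on the file extension.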

Regards,

Evgeniy.

--
Evgeniy Gabrilovich
Ph.D. student in Computer Science
Department of Computer Science, Technion - Israel Institute of Technology
Technion City, Haifa 32000, Israel
E-mail: [EMAIL PROTECTED] WWW: http://www.cs.technion.ac.il/~gabr
Phone: (office) +972-4-8294948




wget and javascript links

2002-05-14 Thread Kevin Murphy

Wgetters,

I love wget and use it all the time.

However, I am trying to suck a particular site which relies excessively 
on javascript'ed links, e.g. via window.open, sometimes wrapped in 
function calls.

I realize that in general this is an intractable problem, but is anybody 
aware of a partial solution?

Is there something similar to wget that would allow me to write my own 
hooks to get at the javascript-wrapped links?

Thanks,
Kevin




problem with "&" query using wget

2002-05-14 Thread Kitty SL Kwan

Dear webmaster of wget,

I've tried GNU Wget 1.5.3 and 1.6, and the problem persists in both.  Is there
a known problem with these versions?  A query containing the "&" operator gets
decoded and then can't be parsed by the search engine.  That is,
"Anniversary+%26+celebration" gets decoded to "Anniversary+&+celebration".
Someone has tried version 1.8.1 and found no problem.  Is this related to the
versions I used?  Please advise if there could be some other problem.  Thanks.


# wget "http://search.info.gov.hk/cgi-bin/se.cgi?mode=16&la=1&gr_1=+ma=100&ft_1=alltype&so=1&nu=10&ca=0&ta=all&fu=&fd=&gr_test=+&qu=Anniversary+%26+celebration"
--14:07:34--  http://search.info.gov.hk:80/cgi-bin/se.cgi?mode=16&la=1&gr_1=+ma=100&ft_1=alltype&so=1&nu=10&ca=0&ta=all&fu=&fd=&gr_test=+&qu=Anniversary+&+celebration
   => `se.cgi?mode=16&la=1&gr_1=+ma=100&ft_1=alltype&so=1&nu=10&ca=0&ta=all&fu=&fd=&gr_test=+&qu=Anniversary+&+celebration'
Connecting to search.info.gov.hk:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

0K -> .. .

14:07:35 (97.46 KB/s) - `se.cgi?mode=16&la=1&gr_1=+ma=100&ft_1=alltype&so=1&nu=10&ca=0&ta=all&fu=&fd=&gr_test=+&qu=Anniversary+&+celebration' saved [19959]


user in: 14/05/2002 14:07:51
Formated Query: anniversary
Generate Info: anniversary
Done: anniversary
user out: 14/05/2002 14:07:51
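
To show why the decoded "&" matters, here is a small sketch using Python's
standard urllib.parse. It only illustrates the encoding; it does not
reproduce Wget's internals. The search.example host and the shortened
parameter list are placeholders, and the real search engine may parse
queries differently from parse_qs.

```python
from urllib.parse import parse_qs, quote_plus, unquote, urlsplit

query = "Anniversary & celebration"

# Form-style encoding: spaces become "+", and "&" becomes "%26".
encoded = quote_plus(query)

# Decoding the %XX escapes turns "%26" back into a bare "&".
decoded = unquote(encoded)

# Placeholder host; only the query string matters here.
base = "http://search.example/cgi-bin/se.cgi?nu=10&qu="
url_as_typed = base + encoded
url_as_sent = base + decoded

# A bare "&" is read as a parameter separator, so "qu" is cut short.
params_as_typed = parse_qs(urlsplit(url_as_typed).query)
params_as_sent = parse_qs(urlsplit(url_as_sent).query)

print(encoded)
print(decoded)
print(params_as_typed["qu"])
print(params_as_sent["qu"])
```

With the escaped form the "qu" parameter carries the whole phrase; with the
decoded form it ends at "Anniversary", which matches the "anniversary"
query in the server log above.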


Thanks a lot!

Regards,
Kitty,
GIC Support Team,
ITSD.