Re: pointer to form generation / wget & cURL?
On Tue, 14 May 2002, Amy Rupp wrote:

> I'm having trouble constructing the arguments (or a file as an input) to
> wget and cURL that contain values for GET and POST methods... for example,
> if I want to feed a date into a newspaper's archive in order to retrieve an
> article, I have to pass that value to the CGI method embedded in the page.
> I've seen the cURL manpage but I haven't been able to do it yet.
>
> If anyone has a pointer to some more extensive examples of how to formulate
> the arguments to cURL and the beta wget, I'd be very interested.

Since you're asking this on the wget mailing list, and not the curl one, I expect that you want to see something about wget doing this. I am not aware of any docs or explanations on how to use the (upcoming) wget POST feature.

For general information about how to automate HTTP scripting and things to think about, there is lots of advice here (curl-related docs):

  http://curl.haxx.se/docs/httpscripting.shtml

To extract information from an HTML page that contains a form, I once wrote a little perl script that does the job:

  http://curl.haxx.se/programs/formfind.txt

If that is not enough, I suggest that you show us the HTML with the form you want to fill in, and the command lines that you've tried.

Good luck!

--
Daniel Stenberg -- curl groks URLs -- http://curl.haxx.se/
pointer to form generation / wget & cURL?
I'm having trouble constructing the arguments (or a file as an input) to wget and cURL that contain values for GET and POST methods... for example, if I want to feed a date into a newspaper's archive in order to retrieve an article, I have to pass that value to the CGI method embedded in the page. I've seen the cURL manpage but I haven't been able to do it yet.

If anyone has a pointer to some more extensive examples of how to formulate the arguments to cURL and the beta wget, I'd be very interested.

Thank you,
Amy
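As a rough illustration of the question in this thread, the sketch below shows how form values are typically URL-encoded for the GET and POST methods. The action URL and the field names (`date`, `section`) are made-up placeholders, not taken from any real newspaper archive; the real ones would have to be read out of the form's HTML.

```python
from urllib.parse import urlencode

# Hypothetical form: the action URL and field names are placeholders,
# not taken from any real site.
ACTION_URL = "http://archive.example.com/cgi-bin/search"
fields = {"date": "2002-05-14", "section": "front page"}


def build_get_url(action, form_fields):
    """GET: the encoded fields are appended to the action URL after '?'."""
    return action + "?" + urlencode(form_fields)


def build_post_body(form_fields):
    """POST: the same encoding is sent as the request body
    (application/x-www-form-urlencoded)."""
    return urlencode(form_fields)


get_url = build_get_url(ACTION_URL, fields)
post_body = build_post_body(fields)

print(get_url)
print(post_body)
```

The GET URL can be handed to wget or curl as a single quoted argument (the quotes keep the shell from interpreting `&`), and the POST body is the kind of string that curl's `-d` option expects.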
Re: wget and javascript links
On 2002-05-14 13:01 -0400, Kevin Murphy wrote:

> However, I am trying to suck a particular site which relies excessively
> on javascript'ed links, e.g. via window.open, sometimes wrapped in
> function calls.
>
> I realize that in general this is an intractable problem, but is anybody
> aware of a partial solution?

Some people expressed interest in having a Javascript interpreter included in Wget, but AFAIK no one actually did it.

Someone pointed out that Javascript code is often simple enough that one could write a script to parse it and extract the links.

--
André Majorel <[EMAIL PROTECTED]>
http://www.teaser.fr/~amajorel/
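The "script to parse it and extract the links" idea could look something like the sketch below. It is only a partial solution under a strong assumption: the URL is passed to `window.open` as a plain string literal. Links built by concatenation or hidden behind wrapper functions will be missed. The sample HTML and base URL are invented for illustration.

```python
import re
from urllib.parse import urljoin

# Captures the first argument of window.open(...) when it is a quoted
# string literal. Group 1 is the quote character, group 2 the URL.
WINDOW_OPEN = re.compile(r"""window\.open\(\s*(['"])(.*?)\1""")


def extract_js_links(html, base_url):
    """Return absolute URLs found in window.open() calls within html."""
    return [urljoin(base_url, m.group(2)) for m in WINDOW_OPEN.finditer(html)]


# Invented sample page; real pages will vary.
sample = """<a href="javascript:window.open('news/today.html','w')">Today</a>
<a href="#" onclick='window.open("/archive/index.html")'>Archive</a>"""

links = extract_js_links(sample, "http://www.example.com/site/")
for link in links:
    print(link)
```

The printed URLs could be written to a file and passed to wget with its `-i` option, so that wget still does the actual downloading.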
Proposal for two changes/patches
Dear WGet maintainers,

I'd like to propose two patches that I've recently implemented on my local copy of WGet. Here is a brief description of the changes and their rationale. I hope they might benefit other people too, so if you agree, please let me know and I'll submit a "lege artis" patch with source diffs and extensive descriptions.

1) Limiting the number of files recursively downloaded from a given URL (in the "-r" mode).

Currently, it's only possible to limit file types and the overall download quota, but there is no control over the number of files. The rationale here is to allow downloading only a limited number of files that can sufficiently represent the site for various purposes (I myself conduct research in text analysis/information retrieval, so it's useful to have a representative set of a site's pages). To this end I suggest adding a command-line/config-file parameter, which will then control the behaviour of function 'retrieve_tree' by making it return once the desired number of files has been downloaded.

2) Optionally deleting downloaded files that are not of type "text/html".

Even though there is a mechanism for limiting downloaded file types, it's hard to configure Wget to download only files of type "text/html". Setting the Accept list to "htm,html" won't help, for exactly the same reason that option "-E" exists: not all "text/html" files have this extension (e.g., "asp" files or files dynamically created by CGI scripts, to name but a few). Again, for various site analyses it is convenient to download only files of type "text/html". So if it's impossible to selectively download only such files, we can download all files and then delete all those of other types.

Regards,
Evgeniy.

--
Evgeniy Gabrilovich
Ph.D. student in Computer Science
Department of Computer Science, Technion - Israel Institute of Technology
Technion City, Haifa 32000, Israel
E-mail: [EMAIL PROTECTED]
WWW: http://www.cs.technion.ac.il/~gabr
Phone: (office) +972-4-8294948
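To make the two proposed behaviours concrete, here is a toy sketch of a breadth-first crawl that stops after a fixed number of downloads and keeps only "text/html" results. It is not Wget's 'retrieve_tree' and does not reflect the actual patch; the in-memory `SITE` table, the `crawl` function and its `max_files` parameter are all hypothetical names used for illustration.

```python
from collections import deque

# Toy in-memory "site": URL -> (content type, outgoing links).
# Purely illustrative; a real crawler would fetch over HTTP.
SITE = {
    "/": ("text/html", ["/a.asp", "/logo.gif", "/b.html"]),
    "/a.asp": ("text/html", ["/c.html", "/paper.pdf"]),
    "/b.html": ("text/html", ["/"]),
    "/c.html": ("text/html", []),
    "/logo.gif": ("image/gif", []),
    "/paper.pdf": ("application/pdf", []),
}


def crawl(start, fetch, max_files, keep_type="text/html"):
    """Breadth-first crawl that stops after max_files downloads.

    Returns (number of downloads, URLs kept). A URL is kept only if its
    content type equals keep_type; the type is known only after the
    download, which mirrors the "download, then delete" idea.
    """
    seen = {start}
    queue = deque([start])
    downloaded = 0
    kept = []
    while queue and downloaded < max_files:
        url = queue.popleft()
        content_type, links = fetch(url)
        downloaded += 1
        if content_type == keep_type:
            kept.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return downloaded, kept


downloaded, kept = crawl("/", SITE.__getitem__, max_files=4)
print(downloaded, kept)
```

Note that "/a.asp" is kept even though its extension is not "htm" or "html", which is the case the second proposal is meant to cover.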
wget and javascript links
Wgetters,

I love wget and use it all the time. However, I am trying to suck a particular site which relies excessively on javascript'ed links, e.g. via window.open, sometimes wrapped in function calls.

I realize that in general this is an intractable problem, but is anybody aware of a partial solution? Is there something similar to wget that would allow me to write my own hooks to get at the javascript-wrapped links?

Thanks,
Kevin
wget@sunsite.dk
problem with "&" query using wget
Dear webmaster of wget,

I've tried using GNU Wget 1.5.3 and 1.6, and the problem persists in both. Is there a problem with these versions? A query containing the "&" operator gets decoded and then can't be parsed by the search engine. That is, "Anniversary+%26+celebration" gets decoded to "Anniversary+&+celebration". Someone has tried version 1.8.1 and found no problem. Is it related to the version I used? Please advise if there could be another problem, thanks.

# wget "http://search.info.gov.hk/cgi-bin/se.cgi?mode=16&la=1&gr_1=+ma=100&ft_1=alltype&so=1&nu=10&ca=0&ta=all&fu=&fd=&gr_test=+&qu=Anniversary+%26+celebration"
--14:07:34-- http://search.info.gov.hk:80/cgi-bin/se.cgi?mode=16&la=1&gr_1=+ma=100&ft_1=alltype&so=1&nu=10&ca=0&ta=all&fu=&fd=&gr_test=+&qu=Anniversary+&+celebration
           => `se.cgi?mode=16&la=1&gr_1=+ma=100&ft_1=alltype&so=1&nu=10&ca=0&ta=all&fu=&fd=&gr_test=+&qu=Anniversary+&+celebration'
Connecting to search.info.gov.hk:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    0K -> .. .

14:07:35 (97.46 KB/s) - `se.cgi?mode=16&la=1&gr_1=+ma=100&ft_1=alltype&so=1&nu=10&ca=0&ta=all&fu=&fd=&gr_test=+&qu=Anniversary+&+celebration' saved [19959]

user in: 14/05/2002 14:07:51
Formated Query: anniversary
Generate Info: anniversary
Done: anniversary
user out: 14/05/2002 14:07:51

Thanks a lot!

Regards,
Kitty,
GIC Support Team, ITSD.
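A small sketch of why the decoding matters, using Python's standard `urllib.parse` rather than wget itself: once "%26" has been turned back into a literal "&", a server-side parser treats it as a parameter separator and the search term is cut short. The shortened query strings below are simplified stand-ins for the full URL in the report.

```python
from urllib.parse import parse_qs, quote_plus

term = "Anniversary & celebration"

# Correct encoding: '&' inside a value becomes %26, spaces become '+'.
encoded = quote_plus(term)

# Query as it should reach the server, and as it looks after the
# premature decoding described in the report.
good_query = "nu=10&qu=" + encoded
bad_query = "nu=10&qu=Anniversary+&+celebration"

good = parse_qs(good_query)
bad = parse_qs(bad_query)

print(encoded)
print(good["qu"])
print(bad["qu"])
```

With the decoded form, the value of `qu` ends at the stray "&", which is consistent with the search log above showing only "anniversary" as the query.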