Re: UNIX - Installing Crypt::SSLeay
On Wed, Feb 13, 2008 at 12:46:15PM -0500, David Moreno wrote: I think that, from the paths he pasted, it's a Sun4 Solaris. In that case, the answer is probably to install a C compiler. Last time I used Solaris, it didn't come with a C compiler, but Sun offered CDs with optional additional packages that include gcc. (They may since have modernized the distribution of this software.) Install that and either you will have a cc, or you will have a gcc and need to tell CPAN that gcc is your C compiler. -- Reinier
Re: About LWP in general??
On Fri, Aug 18, 2006 at 10:48:30AM +0100, chris choi wrote: Hi I'm new to Perl, but considering writing web robots/Spiders in Perl, but I'm not too sure if the LWP is out-dated or something cause I haven't heard anything about LWP recently, so I was wondering if you guys know if there is a new thing to write WEB robots with on PERL?? No, the LWP is not at all outdated, and very much the standard Web client library for Perl, as far as I'm aware. There is little development activity going on, but in this case, as far as I'm aware, this is not a sign of abandonment, but of maturity: the library is finished and does what it was designed to do. Gisle Aas, its maintainer, monitors this list for feature requests and bug reports, and occasionally creates a new LWP version in response. Yes, there are newer libraries specifically for writing Web robots in Perl; some of them are in CPAN, and are frequently mentioned here; WWW::Mechanize is the name I see most often. These libraries all use LWP, as far as I'm aware, and you are definitely advised to use them instead of building your own directly on top of LWP. However, I do not use any of them at the moment, so I cannot give you more information. thanks Chris -- Reinier Post TU Eindhoven
Re: [Crypt::SSLeay] compile problems on Solaris
On Wed, Nov 23, 2005 at 11:31:06AM +0100, Barden, Tino wrote: Hello, I have tried to compile Crypt-SSLeay-0.51 on a Solaris 9 machine and got the following errors: UZKT3 # perl Makefile.PL Found OpenSSL (version OpenSSL 0.9.8) installed at /usr/local/ssl Which OpenSSL build path do you want to link against? [/usr/local/ssl] [...] LD_RUN_PATH=/usr/local/ssl/lib gcc -G -L/usr/local/lib SSLeay.o -o blib/arch/auto/Crypt/SSLeay/SSLeay.so -L/usr/local/ssl/lib -lssl -lcrypto -lgcc The problem with compiling on Solaris is usually that a -Rdir has to be inserted for every -Ldir. So I'd guess that inserting a -R/usr/local/lib on the gcc command line would fix the problem in your case. -- Reinier
Re: HTML::Parser bug
On Sun, Mar 20, 2005 at 01:51:25PM -0800, Bill Moseley wrote: On Sun, Mar 20, 2005 at 06:02:26PM +0300, [EMAIL PROTECTED] wrote: Hello libwww, using it to parse html-forms etc... noticed, that it recognizes strange comment like !-- as starting of the comment, not like the whole empty comment, as IE. Doesn't seem like that's a valid comment. http://www.w3.org/TR/WD-html40-970917/intro/sgmltut.html#h-3.1.4 Well, the HTML::Parser perldoc says: HTML::Parser is not a generic SGML parser. We have tried to make it able to deal with the HTML that is actually out there, and it normally parses as closely as possible to the way the popular web browsers do it instead of strictly following one of the many HTML specifications from W3C. Where there is disagreement, there is often an option that you can enable to get the official behaviour. But do all versions of IE parse this the same way? What do other popular user agents do? -- Reinier
Re: WWW::Mechanize caching
On Fri, Feb 25, 2005 at 10:42:43AM +1000, Robert Barta wrote: On Thu, Feb 24, 2005 at 11:07:00PM +0100, Reinier Post wrote: On Mon, Feb 21, 2005 at 08:27:38AM +1000, Robert Barta wrote: Hi all, I hope I did not miss an obvious solution to the following: I want a *caching* version of WWW::Mechanize. Why don't you just use a caching proxy server? Squid? First, we need a bit more control on the caching policy. Reconfiguring a squid remotely is a bit brittle :-) But, more importantly, we cannot assume that a proxy/cache is at every user site where the agent is running. I was assuming you'd put it on the client side. But perhaps Squid won't run there. -- Reinier
Re: WWW::Mechanize caching
On Mon, Feb 21, 2005 at 08:27:38AM +1000, Robert Barta wrote: Hi all, I hope I did not miss an obvious solution to the following: I want a *caching* version of WWW::Mechanize. Why don't you just use a caching proxy server? Squid? -- Reinier Post
Re: HTML::TreeBuilder/HTML::Parser - problem parsing tables
On Mon, Apr 05, 2004 at 04:29:50PM +0200, Neven Luetic wrote: I wrote a small application to collect samples of pages from sites to do some usability checking offline. So it's necessary that the archived pages match the original exactly, when displayed. As some tests on the pages are going to be automated using tags or attributes as search criteria and as it was necessary to rewrite any links to pictures inside the pages, I decided to use HTML::TreeBuilder for this. However, I encountered a critical difference of pages read using HTML::TreeBuilder->parse() for parsing and HTML::TreeBuilder->as_HTML for writing to the original: in several German newspaper sites, that are using big tables for their layout, some tables are closed too early by the parser. The effect is that from that point onward the table cells are displayed row by row (this is true for every browser I tried - mozilla, firefox, opera, ie6), while the original page looks ok. I tried setting HTML::TreeBuilder->implicit_tags(0) (this will be my default setting anyway), but it didn't change the behavior. So I suppose the problem is not with some routine *adding* tags that are proposed to be missing, but with the parser itself, misinterpreting the tree. Does anybody have an idea about what the problem might be and how I could solve this? I can only reply as a former HTML::TreeBuilder user. I patched it a little to fix some of its behaviour. HTML can be broken as SGML or XML. HTML::Parser and HTML::TreeBuilder were designed as heuristic parsers: they try to make sense of broken HTML. They even try to make sense of it in the same way that other applications (major browsers) do. By design, they transform anything that vaguely looks like HTML into valid HTML that has the same effect. But this cannot be guaranteed in general, of course: different applications have different heuristics for dealing with broken HTML. 
Perhaps HTML::Parser's heuristics for dealing with broken tables are different from those of the browser you're testing with. In that case it would be advisable to extend or modify the HTML::Parser heuristics so it can conform to what your browser does. Another possibility is to put in some custom-built preprocessing that makes HTML::Parser do the right thing in your case. Ultimately the fault is with the original HTML pages, which should be fixed to at least be syntactically well-formed. If the pages you're working on are well-formed HTML, you may be troubled by a more severe problem: HTML::Parser and HTML::TreeBuilder are expected to leave non-broken HTML exactly the way it is, but they don't always do so. There are problems with handling framesets; perhaps there are other problems. If you find any, they should really be fixed. I'm pretty stuck, as nearly a quarter of all (newspaper and magazine) sites tested have this problem, so that it renders the script virtually useless. Can you post a *minimal* HTML fragment that exhibits the problem? Greetings Neven -- Reinier Post TU Eindhoven
Re: Can't locate object method host via package URI::_foreign
On Tue, Sep 02, 2003 at 12:29:16PM +0400, Siddhartha Jain(IT) wrote: Sorry, the input being given to the $uri->host method was erroneous. Again, sorry for the false alert!! Comment: URIs exist that are valid, but do not have a host part, and you will have this problem then, so it is a good idea to use eval { $uri->host } if you haven't checked in advance that $uri contains one. WWW::Robot has this problem for instance. -- Reinier
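The eval {} guard can be illustrated without LWP at all. The two tiny classes below are made-up stand-ins for URI objects with and without a host part; the point is only that eval {} turns "Can't locate object method" into a catchable failure:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Stand-ins for URI objects: one class has a host method, one doesn't.
package WithHost;    sub new { bless {}, shift } sub host { 'example.com' }
package WithoutHost; sub new { bless {}, shift }

package main;

# Calling ->host unguarded on WithoutHost would die with
# "Can't locate object method ..."; eval {} turns that into undef.
sub safe_host {
    my ($uri) = @_;
    return eval { $uri->host };
}

print safe_host(WithHost->new), "\n";                                 # example.com
print defined safe_host(WithoutHost->new) ? "defined" : "no host";    # no host
print "\n";
```

The same pattern applies verbatim to real URI objects: `my $host = eval { $uri->host };` and then test `defined $host`.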
Re: TreeBuilder cgi memory problems
On Fri, Aug 08, 2003 at 12:43:16AM +0100, John J Lee wrote: On Thu, 7 Aug 2003, [EMAIL PROTECTED] wrote: Having a potential TreeBuilder memory problem when using it to parse through a large HTML table (> 2K rows) where the memory allocation grows to about 20M on my server and never goes down even after finishing with the HTML and TreeBuilder structures. The Perl script runs as a CGI and Apache gives up after awhile with the following line in the error logs - Out of Memory !! [...] 20 Mb does seem a lot, but why would one expect the process memory usage to fall after parsing is complete? On most systems, memory used by a process and freed isn't returned to the system until the process exits. Sorry, no actual help... Look at the DESCRIPTION section of the HTML::TreeBuilder perldoc, item 4: 4. and finally, when you're done with the tree, call $tree->delete() to erase the contents of the tree from memory. This kind of thing usually isn't necessary with most Perl objects, but it's necessary for TreeBuilder objects. See HTML::Element for a more verbose explanation of why this is the case. It may explain the problem. - Reinier
Re: Help needed
On Mon, Jul 14, 2003 at 11:10:26PM +0200, Carsten Kruse wrote: Hi Teddy, if you know about the structure of the html page you should try the functions of the HTML::TokeParser package. [very nice example omitted here for brevity] I have used HTML::TreeBuilder in some of my scripts. If the HTML is well-structured, an alternative is to use XML packages; XML::LibXML can read and write HTML syntax and allows you to manipulate the structure with DOM operations. -- Reinier
Re: HTML parsing
On Tue, Jul 08, 2003 at 03:34:12PM +0100, Richard Lamb wrote: Hi folks, I'm Richard Lamb, and I'm a Perl virgin. Just getting to know the language. I'm in the midst of an MSc in Computing in Manchester (UK), working out a means of stripping HTML tags (via the DOM interface, which I'm trying to get to grips with) and reformatting text, so as to improve a Web site's accessibility (particularly for the visually impaired). Are there any PPMs you'd recommend I check out? I have only tried HTML::TreeBuilder (not DOM, but the same principle; uses heuristic HTML parsing and patching that does some unwanted things) and XML::LibXML (which has the advantage of using the libxml2 library that other languages also bind to; supports a lot of DOM plus some extensions). There are many more XML libraries, some with DOM in their names, but they always fail to install on my Solaris system. Here is my first XML::LibXML script, an HTML reformatter:

#!/usr/bin/env perl
use XML::LibXML;
my $parser = new XML::LibXML;
$parser->validation(1);
$parser->expand_entities(0);
$parser->keep_blanks(0);
$parser->pedantic_parser(1);
$parser->expand_xinclude(0);
foreach my $srcdoc (@ARGV ? @ARGV : ('-')) {
    my $doc = $parser->parse_html_file($srcdoc);
    print $doc->toStringHTML();
}
# end of script

Cheers, Richard. Enjoy, -- Reinier
Re: Help! how is this called?
On Thu, Nov 28, 2002 at 12:16:19PM -0700, Keary Suska wrote: on 11/27/02 7:54, [EMAIL PROTECTED] purportedly said: RE: Help! how is this called? Thank you but this won't help me I guess. I could find that info only from within the script, right? Well, I want to create a program like that Teleport Pro from Windows that spiders a web site and download all the pages from the site. To download the pages is very easy, but the biggest problem is to create the local file names, and to replace all the links from the downloaded pages to make them work locally. Until now, the only problem I found, is that I can't reliably find the file name from the path in all the cases. I have written a couple of programs that do this. You don't really need to know a file name, but you do need to weed out duplicates, e.g. [...]/foo/ is often identical to [...]/foo/index.html. Well, yes and no. The example URL provided: http://www.site.com/script.cfm/dir1/dir2/http://www.site.com/file.html is technically a malformed URI. According to RFC 2396, it isn't. A ':' is allowed anywhere in the path, and a '//' is allowed to appear multiple times as well, as far as I can see. (The '/' characters separate segments, and segments can be empty.) Some browsers (at least IE 6 and links) misparse such URLs, but they have no excuse, as far as I can see. It should be: http://www.site.com/script.cfm/dir1/dir2/http:%2F%2Fwww.site.com%2Ffile.html or minimally: http://www.site.com/script.cfm/dir1/dir2/http:%2F%2Fwww.site.com/file.html That would mean you'd have to rewrite the URLs that point to them from other documents so they won't break. You will always find that sites do stupid things, and will have to find ways around them. However, in the case of extra PATH_INFO or query strings, it doesn't hurt to treat them as they are, and you will be successful most of the time. Other than issues with the URI above, you should have minimal problems. 
Right now I have the problem that Apache 2 won't feed URLs to script.php (in my case it's a PHP script) if they have an extra path. But this is just one of my regular quarrels with the Apache configuration file mess, I expect it can be done somehow. -- Reinier Post TU Eindhoven
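The claim that segments may be empty is easy to check mechanically. The sketch below splits an abs_path into its segments the way the RFC 2396 grammar does; the example path is the one from the message above, and the only trick is split's negative limit, which keeps empty fields:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Split an abs_path into its segments per RFC 2396:
#   abs_path = "/" path_segments ; path_segments = segment *( "/" segment )
# A segment may be empty, so "//" yields an empty segment, not nothing.
sub path_segments {
    my ($abs_path) = @_;
    $abs_path =~ s{^/}{};              # drop the leading "/"
    return split m{/}, $abs_path, -1;  # limit -1: keep empty fields
}

my @segs = path_segments('/script.cfm/dir1/dir2/http://host/file.html');
print scalar(@segs), " segments\n";               # 7 segments
print join(' ', map { "[$_]" } @segs), "\n";      # note the empty [] segment
```

Note that 'http:' and the empty string both come out as ordinary segments, which is exactly why the URL is well-formed even though some browsers choke on it.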
Re: can't load host method in URI package
On Wed, Aug 28, 2002 at 01:08:41PM -0400, Thurn, Martin (Intranet) wrote: LWP::RobotUA=HASH(0x83d7994) GET 1 ...Can't locate object method host via package URI::_generic (perhaps you forgot to load URI::_generic?) at That's the message you get when your URL does not have something like 'http:/' in front of it. WHY DON'T YOU PRINT OUT THE VALUE OF THE VARIABLES IN YOUR PROGRAM? THAT'S THE FIRST STEP IN DEBUGGING. GO BACK TO 8TH GRADE PROGRAMMING CLASS. YOU WOULD SEE IMMEDIATELY THAT WHAT YOU'RE TREATING AS A URL IS NOT A URL. LWP already saw it, but it kept its knowledge to itself. A better error message would help here ... -- Reinier
Re: can't load host method in URI package
On Fri, Aug 09, 2002 at 03:01:54PM +0500, Ken Munro wrote: Hi. I am trying to write a simple robot that reads urls from a text file. The source is listed below. I am getting an error that says: LWP::RobotUA=HASH(0x83d7994) GET 1 ...Can't locate object method host via package URI::_generic (perhaps you forgot to load URI::_generic?) at /usr/lib/perl5/site_perl/5.6.1/WWW/RobotRules.pm line 187, IN line 2. I have searched Google far and wide, and have found other people with this problem, but no solution. This is from memory and refers to old code, but ... I think I saw that problem when I was using WWW::Robot over a year ago. I remember fixing a problem with LWP::RobotUA crashing on unusual URLs. I also removed some hardcoded limits on the kinds of URLs WWW::Robot traverses. If you're interested in trying the modified modules, they are available at http://www.win.tue.nl/~rp/perl/lib/LWP/RobotUA.pm http://www.win.tue.nl/~rp/perl/lib/WWW/Robot.pm -- Reinier
Re: Fw: Can't navigate to URL after login
On Tue, Aug 06, 2002 at 10:26:01AM -0500, Kenny G. Dubuisson, Jr. wrote: Tried that (referer = ...) with no luck. I did find that I can navigate several pages in using sequential $browser->request calls but the page that finally fails has the hyperlinks to the next page calling javascript functions. Maybe that has something to do with it. Just a guess at this point. Thanks, Kenny Most definitely. LWP does not include Javascript support. -- Reinier
Re: libwww only as root
On Wed, Jul 17, 2002 at 12:44:39PM -0600, Keary Suska wrote: on 7/17/02 12:13 PM, [EMAIL PROTECTED] purportedly said: If you're having a problem with running your scripts from cron, the answer is usually in your PATH environment variable or working directory - cron tends to run with different paths, and your script probably can't find libraries or other things it needs. Except that Perl does not rely on PATH and related variables to determine module or loadable (.so) locations. $PERLLIB The cron problem could be permissions or a CWD problem which would affect finding custom modules, if that is even an issue, or several other reasons. If there was some information about what is going wrong, perhaps a more sensible solution could be presented. Run a cron job with the command env > /tmp/env. Then set your environment to exactly that. (If you're using bash or some other sh-derivative, just . /tmp/env should do the trick.) Then debug the errors you get. -- Reinier
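The recipe above, spelled out as a sketch (the file name /tmp/env is just an example; here the capture step is simulated directly instead of waiting for cron):

```shell
#!/bin/sh
# In the crontab you would add a one-shot entry like:
#   * * * * * env > /tmp/env
# Here we simulate that capture step directly:
env > /tmp/env

# PATH is the usual culprit; compare this against your login shell's PATH.
grep '^PATH=' /tmp/env

# In an sh-derivative you can then reproduce cron's environment with:
#   . /tmp/env
# (only safe if no values contain spaces or newlines) and debug from there.
```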
Re: Fetching big files
On Wed, May 29, 2002 at 06:20:58PM +0300, evgeny tsurkin wrote: Hi! The problem I have: I am fetching big files from the web that are created on the fly. Before actually fetching them I would like to know what size it is going to be. I am not sure that is possible, but if it is - please be very clear, I am new in using lwp. Thanks. Use your HEAD. (It comes with LWP!) The 'head' method only fetches the document headers; one of them gives the document size. -- Reinier
Re: Authentication
On Fri, Mar 22, 2002 at 05:05:49PM -0600, Damian Kohlfeld wrote: I have a situation where I have a webpage that has a list of links to other web sites. They do the following: Login to my website. My website assigns them a cookie using libwww perl. They see the list of links and click one. The web page pointed to can check the cookie and see if it is valid, thus, granting them access. The whole point is that I want to authenticate people so that they can visit the links on my page, but, I want to keep them from visiting the links directly. Is this possible? Yes, why not? If they already have a cookie, let them in; if not, redirect them to the login page. -- Reinier
Re: hi
On Tue, Mar 05, 2002 at 06:54:41AM -0800, Randal L. Schwartz wrote: "Reinier" == Reinier Post [EMAIL PROTECTED] writes: Reinier> On Mon, Mar 04, 2002 at 04:33:37PM +0530, kavitha malar wrote: I want to search a text in a website how to do that through perl. Reinier> perl -MLWP::Simple -e 'getprint "http://www.google.com/search?q=$word+site:$site"' Reinier> I'm serious. (This is what I use to find my own pages.) Except now, Google has gotten fairly upset about automated page fetches. There's a thread on use.perl.org about it. Thanks for the pointer. http://www.google.com/terms_of_service.html is pretty vague about it. As someone in the thread remarked, we are talking about a single query here, for personal use, without even any reformatting of the results. And last time I checked, Google *specifically* blocks the default agent type that LWP uses, so you'll get no response. You have to change the agent type to something with Mozilla in it. :) Mmm, I should have checked that. I actually feed the Google query URL to lynx or links. Gisle - would it be unfair to have a special useragent string when LWP detects that it is visiting Google? :) Nice idea :) But hidden magic in code is always bad. -- Reinier
Re: hi
On Mon, Mar 04, 2002 at 02:57:30PM +0530, kavitha malar wrote: perl -MLWP::Simple -e 'getprint http://wwwyahoocom;' 400 Bad Request URL:http://wwwyahoocom anybody knows why this error is happening It isn't here Try setting $http_proxy or something --jude -- Reinier
Re: installing libwww on solaris
On Thu, Feb 14, 2002 at 02:47:19PM +0200, Afgin Shlomit wrote: I try to install libwww on solaris and first the 'make test' doesn't pass okay - I get: robot/ua... Perl lib version (v5.6.1) doesn't match executable version (5.00503) at /usr/local/lib/perl5/5.6.1/sun4-solaris/Config.pm line 18. This is not an LWP-specific problem. It means you're mixing references to the old and new Perls. It is possible to use your own set of Perl modules with an existing Perl installation - I have done this on Solaris. It is also possible to do the reverse: install your own new Perl and use libraries of the previous Perl installation with it; I have done this on Solaris, too. But in the long run, the cleanest approach is to install a completely separate version of Perl and Perl modules that do not refer to any preexisting Perl installation. There are many places where these references can be set (Perl Configure, CPAN config, $PERLLIB variable, etc.) so you have to be careful. Documentation is in perldoc ExtUtils::MakeMaker and other places. I like to use the CPAN shell, configure it to use its own location for everything Perl, then reinstall the CPAN module, restart it, and reinstall Perl itself with it. But there are many different methods. Then when you want to move it to /usr/local, redo the installation from scratch. My installation notes are here: http://wwwis.win.tue.nl/~rp/perl-install/ -- Reinier
Re: Double slash in a URI: legal or not?
On Sun, Jan 06, 2002 at 08:41:54AM -0800, Randal L. Schwartz wrote: "Hans" == Hans De Graaff [EMAIL PROTECTED] writes: Hans> RFC 2396 seems to indicate that in path segments only a single slash is legal, I'm not sure where you get that. My reading of the BNF: abs_path = "/" path_segments path_segments = segment *( "/" segment ) segment = *pchar *( ";" param ) implies that a segment can be null, so "abc//def" is "abc", "", "def" in terms of path steps, and is thus *not* equivalent to "abc/def". Sure, you can't have two slashes next to each other and have it mean *nothing*, but it does in fact form a legal URI and a server can possibly ignore it or do something different with it or report that this particular resource is not found. Not only that, adjacent /s aren't even removed in relative URL processing, according to RFC 1808; the path transformation rules, see e.g. http://deesse.univ-lemans.fr:8003/Connected/RFC/1808/18.html do not match // within a path ('a complete path' must not be empty). This is contrary to Unix file path semantics, which treat any sequence of /s as equivalent to one. -- Reinier
Re: Minor bug in request()
Why? I know this has been argued extensively elsewhere; see e.g. http://pppwww.ph.gla.ac.uk/~flavell/www/post-redirect.html Possibility 1 mentioned there is common enough to add support for it. -- Reinier This link goes nowhere -- is the site down? The correct link is http://ppewww.ph.gla.ac.uk/~flavell/www/post-redirect.html I tried to verify it, but couldn't at the moment of posting. Lesson: don't post in such cases. -- Reinier
Re: ODBC to MS SQL 7/2000
On Thu, Sep 20, 2001 at 05:50:39PM -0400, Hawk wrote: Hi, I have been assigned the task of writing perl scripts from a Linux box to connect to a MS SQL 7/2000 server. Are there routines and modules already built for this? Yes. % perl -eshell -MCPAN cpan shell -- CPAN exploration and modules installation (v1.59_54) ReadLine support enabled cpan> i /SQL/ CPAN: Storable loaded ok Going to read /home/rp/.cpan/Metadata Database was generated on Sat, 22 Sep 2001 00:01:30 GMT [...] Module MSSQL::DBlib (S/SO/SOMMAR/mssql-1.008.zip) [...] cpan> i /ODBC/ Distribution J/JM/JMAHAN/iodbc_ext_0_1.tar.gz Distribution J/JU/JURL/DBD-ODBC-0.28.tar.gz Module DBD::ODBC (J/JU/JURL/DBD-ODBC-0.28.tar.gz) Module RDBAL::Layer::ODBC (B/BR/BRIAN/RDBAL-1.2.tar.gz) Module Win32::ODBC (Contact Author Dave Roth [EMAIL PROTECTED]) Module iodbc (J/JM/JMAHAN/iodbc_ext_0_1.tar.gz) 6 items found cpan> I haven't used any of it, but it's there. -- Reinier
Re: problems installing the modules
On Thu, Aug 23, 2001 at 01:46:47PM +0300, Yair Lapin wrote: Hi, I'm trying to install the libwww modules in a sparc server with solaris 2.8 and most of them I can't compile. I get the following error message: cc -c -xO3 -xdepend -DVERSION=\"3.25\" -DXS_VERSION=\"3.25\" -KPIC -I/usr/perl5/5.00503/sun4-solaris/CORE -DMARKED_SECTION Parser.c cc: unrecognized option `-KPIC' cc: language depend not recognized -xdepend -KPIC is an option to Sun cc. Are you sure your cc is Sun's? If not, you have to regenerate the Makefile for the C compiler you're using (probably gcc). -- Reinier
Re: Question
On Sun, Jul 08, 2001 at 04:03:46PM -0700, Jason Whitlow wrote: I am trying to get one of my apps to display only 5 records at a time. With Perl attaching to a mysql database. Does anyone have any good Ideas of how to do this. Yes, Perl can do this (check the DBI documentation), and the SQL SELECT statement can do this, too (check the mySQL documentation, www.mysql.com). Your question is off-topic for this mailing list, which is about LWP, the WWW library for Perl. -- Reinier
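For what it's worth, the usual mechanics of "5 records at a time" look like the sketch below. The table and column names are invented, and the DBI calls are shown only in comments so the offset arithmetic can stand on its own:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# MySQL can do the windowing itself:
#   SELECT ... FROM records ORDER BY id LIMIT ?, ?   -- offset, row count
# The offset for page N (1-based) at a given page size:
sub page_offset {
    my ($page, $per_page) = @_;
    return ($page - 1) * $per_page;
}

# With DBI this would become something like (untested sketch):
#   my $sth = $dbh->prepare('SELECT * FROM records ORDER BY id LIMIT ?, ?');
#   $sth->execute(page_offset($page, 5), 5);

print page_offset(1, 5), " ", page_offset(3, 5), "\n";   # 0 10
```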
Re: LWP::RobotUA-recurse?
On Fri, Jun 29, 2001 at 10:44:20PM +0200, Simon Dang wrote: Hi, I am a newbie with LWP. Does LWP::RobotUA run recursively by default? If not, is there a method that I can call within UA to set this to run recursively? I have searched the docs within LWP::RobotUA, but there is nothing mentioned about recursive searches. Try WWW::Robot, which is an interface on LWP::RobotUA to do just that. -- Reinier
Re: redirects and javascript
On Mon, Jul 02, 2001 at 08:51:42AM -0400, fred whitridge wrote: I have inelegantly solved my problem by loading the page with the javascript reference into Excel and then snagging the executed result. There has to be a better way to do this, altho' this one works. LWP doesn't support Javascript, but you can + do an ad-hoc 'parse' of the Javascript code in question, if you know what they look like, using regexps + check the mailing list archives (topic has been discussed before) -- Reinier
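An ad-hoc 'parse' of that kind might look like the sketch below, which pulls a target URL out of the common location-assignment idioms. The patterns are guesses at what such pages contain; real Javascript will vary and a regexp can never cover it all:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Extract the target URL from Javascript such as:
#   window.location = "http://...";   location.href = '...';
sub js_redirect_url {
    my ($html) = @_;
    if ($html =~ /(?:window\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']/) {
        return $1;
    }
    return undef;
}

my $page = q{<script>window.location.href = "http://example.com/next";</script>};
print js_redirect_url($page), "\n";   # http://example.com/next
```

The extracted URL can then be fed back into an ordinary LWP request, which is essentially what the Excel workaround was doing the hard way.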
Re: HTML::Parser - Extracting out the text from body
On Mon, Jul 02, 2001 at 11:17:00AM -0700, Bill Moseley wrote: Hello, I need to extract text out of html docs to do search word highlighting in context. (You know, like google's output.) So, is there a fastest method to do this -- better than just using HTML::Parser, setting a flag when I catch body and then storing the text? If 'fastest' means 'most convenient', try perl -MLWP::Simple -MHTML::TreeBuilder -e 'print HTML::TreeBuilder->new->parse(LWP::Simple::get("http://www/"))->as_text' -- Reinier
how to disable automatic redirect (was: Newbie Question)
On Tue, May 15, 2001 at 11:48:54AM -0400, Jean Zoch wrote: Hello all, I am developing a utility that needs to grab the HTML code from web pages. To do this I am using: my $url = 'http://www.theURLiWant.com'; use LWP::Simple; $content = get($url); This works great, but I also need the *actual* URL of the content that is returned. Often, the content does not come from $url, but from a redirect. I need this so that I can add a <BASE HREF="$url"> to the code so that the web page will work even though it is being displayed on my server. You can disable automatic following of redirects. The perverse (author's wording) and quite popular way of doing this is described at http:[EMAIL PROTECTED]/msg0.html See also perldoc LWP::UserAgent I have tried using LWP::UserAgent, and getting the headers returned from the web page, but all that gives is something like: HTTP::Headers=HASH(0x10205064). Any suggestions? That's the printable representation of a Perl object holding the headers. You want to extract the object's fields, presumably by calling HTTP::Headers methods. See perldoc HTTP::Headers BTW please use the subject of your message to indicate the subject of your message. Thanks. -- Reinier
Re: Automated FORM posting
On Wed, May 16, 2001 at 12:52:04PM +0100, D.D.Casperson wrote: Hi I am new to perl, so I would appreciate a verbose response to this, any references would be great. I am playing around with HTML FORM's and it had been suggested to me that the libwww might be the answer to my problem. I want to generate a client that automatically fills out a form and posts the details to the server. For a given form, or for a large range of forms unknown in advance? It's possible to do the latter, to some extent. How would I go about writing a perl script that accomplished the same as the HTML below? You mean, the same as a user filling out a form with a browser and clicking submit? For example if I filled out my message as hello, this is a test, and my name as Dominic, and then clicked the Go button. Is it possible to write a perl script so that the server couldn't tell the difference between a POST from that script and a person filling out the web page. Yes. The POST utility distributed with LWP supports this, you can read its code to see how it's done. The LWP mailing list archives have many postings on this issue. You can write a parser for forms (using HTML::Parser or modules that depend on it) that parses a form to find the form fields and then submit the form using values you pick somehow. LWP doesn't support Javascript, so the forms have to be without Javascript. <FORM NAME=testForm METHOD=POST target=_top ACTION=http://someserver.com/SendForm.htm> <TEXTAREA NAME=Message wrap=no rows=5 cols=40></TEXTAREA> <INPUT type=text NAME=Name size=20> <INPUT TYPE=button VALUE=Go onClick=submit()> </FORM> -- Reinier Post
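What the browser actually sends for such a form is an application/x-www-form-urlencoded body. In real code LWP builds this for you (HTTP::Request::Common's POST, for instance), but the encoding itself is plain Perl; the sketch below builds the body for the example form by hand, using the sample values from the message:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Percent-encode one value the way a browser does when submitting a form:
# spaces become '+', everything outside the safe set becomes %XX.
sub form_escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9_.~-])/sprintf '%%%02X', ord $1/ge;
    $s =~ s/%20/\+/g;
    return $s;
}

# Build the POST body from field name => value pairs (sorted for
# reproducibility; real browsers use document order).
sub form_body {
    my (%fields) = @_;
    return join '&',
        map { form_escape($_) . '=' . form_escape($fields{$_}) }
        sort keys %fields;
}

print form_body(
    Message => 'hello, this is a test',
    Name    => 'Dominic',
), "\n";   # Message=hello%2C+this+is+a+test&Name=Dominic
```

A script that POSTs this body to the form's ACTION URL with a Content-Type of application/x-www-form-urlencoded is indistinguishable, to the server, from a user clicking Go.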
Re: LWP::RobotUA problem
On Tue, Apr 24, 2001 at 09:47:21AM -0700, Gisle Aas wrote:

234c234,235
<     my $netloc = $request->url->host_port;
---
>     my $ru = $request->url;
>     my $netloc = $ru->can('host_port') ? $ru->host_port : $ru->host;

Not all URIs have a 'host' method either. I think simply making it: $netloc = eval { $ru->host_port }; should do. If eval{}ing arbitrary URIs is safe ... what happens on the 'URI' http://$usersuppliedvalue/ ? I'd have to check this particular case ... does LWP promise in general to avoid exploits of this nature? But then we have the $SIG{__DIE__} stupidity which makes it: $netloc = eval { local $SIG{__DIE__}; $ru->host_port }; That's nice enough, if eval{} really doesn't lead to exploitable URIs. -- Reinier
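The $SIG{__DIE__} remark is about a core-Perl wrinkle: a global __DIE__ handler fires even for exceptions that an eval {} is going to catch, so defensive library code localizes it away. A self-contained illustration:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# An application may install a global die handler like this:
my $handler_fired = 0;
$SIG{__DIE__} = sub { $handler_fired++ };

# Plain eval: the handler still runs before eval catches the die.
eval { die "boom\n" };
print "after plain eval: handler fired $handler_fired time(s)\n";

# The defensive idiom from the patch: suppress the handler locally,
# so the eval really is invisible to the application's hook.
eval { local $SIG{__DIE__}; die "boom\n" };
print "after guarded eval: handler fired $handler_fired time(s)\n";
```

Without the local, a library's purely internal eval can trigger an application's logging or cleanup hook, which is exactly what the patch is avoiding.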
Re: considering HTML::Element's $tree-extract_links
On Sat, Feb 24, 2001 at 05:11:02PM -0700, Sean M. Burke wrote: Some clever person wrote me earlier this month and suggested adding a feature to HTML::Element's extract_links method; and I want to run it past people who actually use the current method's behavior. Count me in. What the person who wrote to me suggested was this: make each item in the returned array contain not two subitems (attribute_value, $element), but THREE: (attribute_value, $element, attribute_name). I think this is a wonderful idea. Yes, definitely! In fact, I'd be happy to get just the element. For anyone who uses extract_links, I'm asking: would any of your code break if I added a third value to each sublist returned? Mine won't. -- Reinier
Re: / and DirectoryIndex
On Wed, Feb 21, 2001 at 04:42:20PM +0700, John Indra wrote: Hi all... How do I tell my user-agent (an LWP::UserAgent object) to NOT download both / and index.html or whatever remote sites DirectoryIndex set to? Example, my user-agent sees 2 link: - http:://www.domain.com/ This :: notation is contagious :-) - http:://www.domain.com/index.html IF in this situation both link to the same document, my user-agent will be a fool if it tries to download both file. How do I make a "smarter" user-agent that will know that those 2 links are the same and only perform one GET method, either to http:://www.domain.com/ OR http:://www.domain.com/index.html? The server won't tell you whether or not they're the same document. You have the same problem with server aliases or symlinks: the whole tree http://www.domain.com/a/butreally/b/* may be identical to http://www.domain.com/b/* Depending on what you find on the server it may be possible to hypothesize some heuristics, for instance, '*/index.html always has the same content as */', but exceptions are always possible. The only way to be really sure is to check the document content, or at least the header. -- Reinier
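One such heuristic, as a sketch. The assumption that */index.html serves the same content as */ is exactly that, an assumption to be verified per site (e.g. by comparing document content), not a rule:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Map each URL to a canonical key under the (site-specific!) hypothesis
# that a trailing 'index.html' names the same document as the bare '/'.
sub canonical {
    my ($url) = @_;
    $url =~ s{/index\.html?$}{/};
    return $url;
}

# Fetch each document only once per canonical key.
my %seen;
for my $url ('http://www.domain.com/', 'http://www.domain.com/index.html') {
    next if $seen{ canonical($url) }++;
    print "would GET $url\n";
}
```

Only the first of the two links would be fetched; a more careful robot would fetch both once, compare the bodies, and only then merge the keys.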
Re: Off topic question
On Mon, Jan 22, 2001 at 08:57:20AM -0800, [EMAIL PROTECTED] wrote: I know this is off topic, but can someone perhaps point me to a resource online that shows how you can load a perl module into your local cgi-bin and use it locally. I'm running into a case of a host admin that refuses to install some modules for some of our software. It would be a lot easier if I could provide instructions for people that want to install our software if the module is missing and the admin is uncooperative. Well, the basic idea is, set $PERLLIB to the installation location, both at installation time and at use time. At use time you can also use use lib "libdir" -- Reinier
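The 'use lib' route can be seen working without installing anything; the directory below is an arbitrary example path, standing in for wherever the CGI's private modules were unpacked:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# 'use lib' prepends directories to @INC at compile time, which is what
# $PERLLIB does from the environment.  The path here is just an example.
use lib '/home/someuser/perl5/lib';

# A subsequent 'use Some::Module;' would now also search that directory
# before the system-wide library paths.
print "first \@INC entry: $INC[0]\n";
```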
Re: Install, Again
On Tue, Jan 09, 2001 at 09:00:58AM -0500, Alliance Support wrote: # perl -e "use LWP::Proxy" Can't locate LWP/Proxy.pm in @INC (@INC contains: /usr/local/lib/perl5/5.00502/sun4-solaris /usr/local/lib/perl5/5.00502 /usr/local/lib/perl5/site_perl/5.005/sun4-solaris /usr/local/lib/perl5/site_perl/5.005 .) at -e line 1. BEGIN failed--compilation aborted at -e line 1 I really don't understand what the test is doing other than looking for a file LWP/Proxy.pm. There is a directory named LWP but no Proxy.pm file, just lots of others. So install the LWP::Proxy module. The standard way of doing this is by typing

# perl -eshell -MCPAN
cpan> install LWP::Proxy
[...]
cpan> quit
#

If you can't touch the set of libraries installed as root, it is possible to install your own set, or even your own version of Perl, from the same interface. This is not very well documented, though; I just lost a few hours because I didn't remember all the details from last time. Basically, you need to set $PERLLIB to where you want your own libraries installed, then read the ExtUtils::MakeMaker and CPAN manpages for the proper values of the CPAN configuration variables, then copy CPAN/Config.pm to $HOME/.cpan/CPAN/MyConfig.pm and edit it to contain the correct values, and 'perl -eshell -MCPAN' will work. -- Reinier
Re: problems with LWP::UserAgent
On Wed, Dec 06, 2000 at 04:38:48PM -0800, Gisle Aas wrote: <meta http-equiv="Refresh" content="0; URL=/2000/11/02/"> Contrary to what you seem to believe, this is not an HTTP redirect. It isn't handled by the redirect_ok setting. I don't think LWP offers support for automatic refreshes. LWP will let HTML::HeadParser look at the HTML it receives, so these meta elements actually end up as HTTP headers. We might try to deal with: Refresh: 0; ... as if it were a normal 3xx redirect. This would be nice, if documented, e.g. for redirect_ok. If the number is something other than 0 then the page should simply be returned as now. It would be nice to also have the option to have it refreshed anyway. It would even be possible to refresh after the specified number of seconds, with sleep(). refresh_ok? refresh_immediately_if_faster_than(10)? --Gisle -- Reinier Post TU Eindhoven
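Since HTML::HeadParser promotes those meta elements to response headers, a refresh can be followed by hand in the meantime. A sketch, not a definitive implementation; the URL is a placeholder, and the regex only covers the common "N; URL=..." form:

```perl
#!/usr/bin/perl
# Follow a meta refresh manually: read it back out of the Refresh
# header that HTML::HeadParser produced, resolve the (possibly
# relative) target, and issue a second request.
use strict;
use LWP::UserAgent;
use HTTP::Request;
use URI;

my $ua  = LWP::UserAgent->new;
my $res = $ua->request(HTTP::Request->new(GET => 'http://www.example.com/'));

if (defined(my $refresh = $res->header('Refresh'))) {
    # Typical value: "0; URL=/2000/11/02/"
    if ($refresh =~ /^\s*(\d+)\s*;\s*URL\s*=\s*(\S+)/i) {
        my ($delay, $target) = ($1, $2);
        my $abs = URI->new_abs($target, $res->base);  # resolve against the page's base
        sleep $delay if $delay;                       # honour a non-zero delay
        $res = $ua->request(HTTP::Request->new(GET => $abs));
    }
}
```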
Re: problems with LWP::UserAgent
On Thu, Dec 07, 2000 at 12:54:00AM +0200, [EMAIL PROTECTED] wrote: Hi, can you help me solve this problem? Where is the mistake in the following script? Pages A.html and B.html exist. How can I get the source of the B.html page if the A.html page redirects to B.html as in the following HTML example? <!doctype html public "-//w3c//dtd html 4.01 transitional//en"> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=windows-1251"> <meta http-equiv="Refresh" content="0; URL=/2000/11/02/"> Contrary to what you seem to believe, this is not an HTTP redirect. It isn't handled by the redirect_ok setting. I don't think LWP offers support for automatic refreshes. -- Reinier Post
Re: URI::Heuristic
On Fri, Nov 24, 2000 at 07:32:03PM +0200, Doru Petrescu wrote: I tried to email "[EMAIL PROTECTED]" but I got a "user not found" SMTP error :( Hope this is the right email address ... -- Original message -- Subject: URI::Heuristic Hi, I was playing with the URI::Heuristic module, and I have a suggestion ... when guessing the host part of a URI, before trying things like www.ACME.MY_COUNTRY www.ACME.com www.ACME.org ... isn't it normal to first try to resolve that ACME string? Maybe it is a host in my OWN LOCAL DOMAIN ... I haven't looked at the code, but I can guess the reason: it's desirable to limit URI::* to doing pure string manipulation, without any dependence on DNS lookups or actual document lookups over HTTP, etc. This limits the amount of guessing it can do, but it won't rely on the availability of a network connection. Depending on the circumstances of use, DNS lookups may be slow or completely unavailable. Perhaps you can implement a URI::Heuristic::Gethostname to do what you want? lynx on the other hand does another stupid thing ... it tries: 1. rdsnet.ro ... fails 2. www.rdsnet.ro.com ... fails 3. www.rdsnet.ro.org/net/mil ... etc ... all of them fail ... 4. host not found ... (but OFC, www.rdsnet.ro exists and is up and alive; too bad no one asked for its name ...) what do you say ... ? Send a bug report to lynx-dev :) -- Reinier Post TU Eindhoven
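Such a wrapper could be as simple as this sketch: try a plain DNS lookup on the bare name first, and fall back to URI::Heuristic's pure string guessing. This deliberately does what URI::Heuristic avoids, i.e. it needs a working resolver; the host name 'acme' is a placeholder.

```perl
#!/usr/bin/perl
# Prefer a host in the local domain over URI::Heuristic's guesses.
use strict;
use URI::Heuristic qw(uf_uristr);

my $name = 'acme';    # bare name to expand into a URL

my $guess;
if (gethostbyname($name)) {
    # The bare name resolves locally; use it directly.
    $guess = "http://$name/";
} else {
    # No local host by that name; fall back to string heuristics.
    $guess = uf_uristr($name);
}
print "$guess\n";
```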
Re: libwww-perl install
On Wed, Nov 15, 2000 at 09:30:12AM +0100, Bence Fejervari wrote: Hi! Yesterday I tried to install the libwww-perl package from the .tar.gz file, but when I ran make test, it gave me 16 errors out of 22. I attached all the output information. LWP depends on some other libraries that you need to install first. My advice is to use % perl -eshell -MCPAN and let the CPAN shell do this for you automatically: cpan> install Bundle::LWP -- Reinier
Re: One Doubt !!!
On Mon, Nov 06, 2000 at 11:43:47PM +0530, Vasu Balla wrote: Can anybody clarify my doubt ... Is it possible to save to disk all images requested by a client, i.e., any browser, so that they can be manipulated and then sent to the browser ... Are there any modules using which we can do this job ... Definitely; I use LWP and Image::Magick to dynamically manipulate images at the time they are requested by the browser. You'll need a detailed plan for how your software is going to work, though, and that is something you'd better design yourself. -- Reinier Post
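The basic combination looks like this. A sketch only, with minimal error handling; the image URL, the resize geometry, and the CGI-style output are illustrative assumptions, not a recipe for the poster's specific setup.

```perl
#!/usr/bin/perl
# Fetch an image with LWP, manipulate it with Image::Magick, and send
# the result on, e.g. from a CGI script.
use strict;
use LWP::UserAgent;
use HTTP::Request;
use Image::Magick;

my $ua  = LWP::UserAgent->new;
my $res = $ua->request(HTTP::Request->new(GET => 'http://www.example.com/logo.png'));
die "fetch failed: ", $res->status_line unless $res->is_success;

my $img = Image::Magick->new;
$img->BlobToImage($res->content);        # load the raw image bytes
$img->Resize(geometry => '100x100');     # example manipulation
my ($blob) = $img->ImageToBlob;          # back to raw bytes

print "Content-Type: image/png\n\n";     # CGI-style response
binmode STDOUT;
print $blob;
```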
Re: Perl script
On Mon, Nov 06, 2000 at 01:58:53PM -0500, Dahlman, Don P. wrote: Not sure if this pertains to this mailing list or not, but I will try. It doesn't. LWP is a software library for client-side use of the Web in Perl. However, in the last six months the script was spammed by mass amounts of off-domain calls. The script evidently was able to be called with the proper parameters attached to the script call. You can probably shut out certain clients (by IP number or other characteristics) in such a way that that specific caller is denied access and not too many others are denied as well. See your web server software's documentation for details. -- Reinier Post
Re: Logging on to a Website using libwww
On Thu, Oct 26, 2000 at 10:39:57AM +0200, Dirk Treusch wrote: Dear list members, I would like to log in to a web site from Perl and return the URL and content which the server returns after the login. With the code below I have been successful in filling in simple forms on some websites. However, I have not been able to log in to http://www.finanztreff.de Your Perl code looks fine, but when I look at the source of this page, I see: <form NAME="LOGIN" action="/ftr/steuer/user_steuer.htm" method=post onSubmit="return LoginwithCookie();"> I haven't studied the details, but you probably need to add cookie handling on the LWP end. This is supported. -- Reinier Post
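Enabling cookie handling is one line with HTTP::Cookies. A sketch under obvious assumptions: the login URL and the form field names are placeholders, so check the actual form for the real ones.

```perl
#!/usr/bin/perl
# Give the user-agent a cookie jar so cookies set at login are sent
# back on subsequent requests.
use strict;
use LWP::UserAgent;
use HTTP::Cookies;
use HTTP::Request::Common qw(POST);

my $ua = LWP::UserAgent->new;
$ua->cookie_jar(HTTP::Cookies->new);     # keep cookies across requests

my $res = $ua->request(POST 'http://www.example.com/login',
                       [ user => 'me', password => 'secret' ]);
# Requests made through $ua from here on carry any cookies the server set.
print $res->status_line, "\n";
```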
Re: filtering uploaded files
On Mon, Aug 28, 2000 at 10:58:52PM +0200, Gisle Aas wrote: Ian Duplisse [EMAIL PROTECTED] writes: I am uploading files via HTTP::Request::Common::POST, but would like to modify the data that is actually uploaded to the webserver "on the fly", such as with a search and replace. How can that be done, short of making a copy of my original file that has the desired changes? Something like this should work:

#!/usr/bin/perl
use HTTP::Request::Common qw(POST);

my $file = `cat stuff.txt`;   # slurp a file
$file =~ s/foo/bar/g;         # modify it

my $req = POST('http://foo.com/',
    Content_Type => 'form-data',
    Content => [
        foo  => $bar,
        file => [ undef, "stuff.txt",
                  Content_type => "text/plain",
                  Content      => $file,
                ],
    ],
);
print $req->as_string;

use LWP::UserAgent;
$ua = LWP::UserAgent->new;
my $res = $ua->request($req);
__END__

This is incompatible with the file upload used by my Netscape 4.7 on Win98 browser, as understood by HTTP::File::upload. I.e. the above script will result in empty upload results with HTTP::File::upload at the receiving end, because the file content is transmitted in a different way. Does LWP also support that other method (multipart/form-data, with every form attribute being a separate part)? Regards, Gisle -- Reinier Post [EMAIL PROTECTED]
Re: Extending HTML-Parser
On Tue, Oct 10, 2000 at 12:35:48PM -0500, [EMAIL PROTECTED] wrote: Gisle Aas suggested I send the following patch to this libwww mailing list to see if any of you have any comments. The feature we've been talking about adds functionality to HTML::Parser to allow it to parse ASP- and/or JSP-style tags. Specifically, we can now configure patterns to specify regions of the input tags which should be handed off to special handlers, e.g. the following:

The conditional is
<% if ($conditional) { %>
  <blink>true</blink>
<% } else { %>
  <blink>false</blink>
<% } %>

I use HTML::Mason to do this; would it be possible to unite forces? Having one parser for this mechanism could benefit both development (you) and users (me). -- Reinier Post
Re: Redirects with javascript
On Thu, Sep 28, 2000 at 06:07:45PM -0300, Anderson Marcelo wrote: Please, how do I make "result.html" contain the result of the redirect from "index.html"? The page (index.html) contains this: <script>window.location="/test.shtml"</script> LWP doesn't include a JavaScript engine; you'll need one to get this to work. -- Reinier
Re: HTTP redirects
On Thu, Sep 07, 2000 at 04:07:31PM +0000, Jarrett Carver wrote: Is there a way to tell if your request has been redirected? I.e. is_redirect? perldoc HTTP::Response Look at the previous() method. HTTP return codes are defined in http://www.w3.org/Protocols/rfc2068/rfc2068 -- Reinier
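Concretely, when LWP has followed redirects, each intermediate response is kept and can be walked back through previous(). A sketch; the URL is a placeholder.

```perl
#!/usr/bin/perl
# Detect whether a request was redirected by walking the chain of
# previous responses that LWP keeps on the final HTTP::Response.
use strict;
use LWP::UserAgent;
use HTTP::Request;

my $ua  = LWP::UserAgent->new;
my $res = $ua->request(HTTP::Request->new(GET => 'http://www.example.com/old'));

if ($res->previous) {
    # At least one redirect happened; list each hop and its status code.
    for (my $r = $res->previous; $r; $r = $r->previous) {
        printf "%s -> %s\n", $r->request->uri, $r->code;
    }
} else {
    print "no redirect\n";
}
```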
Re: how to convert
i am using perl -e s/\\r/\\$/g filename; it's not working. Please advise. You're not asking for this, but please be aware of what you're doing:
- your question is totally unrelated to LWP
- programming is not a trick, but a profession; complain to your boss if you're doing this for work
- learn Perl! there are some good books and sites for beginners
- you could have used the Web to solve this, e.g. it took me 10 seconds to find http://userpage.chemie.fu-berlin.de/~winkelma/Misc/Perl/Scripts/Dos-Unix/
- try perl -0pe 's/\r\n/\n/g' filename > converted
-- Reinier Post
WWW::Robot crashing upon encountering file:// URLs
Hello list, WWW::Robot spiders crash on file:// URLs, due to the following problem in LWP::RobotUA:

% perl -MLWP::RobotUA -e 'my $ag = new LWP::RobotUA("bugexposer/0.1","reinpost\@win.tue.nl"); $ag->delay(0); printf "%s\n", $ag->request(new HTTP::Request(GET, "file://localhost/home/rp/.cshrc/"))->content'
Can't locate object method "host_port" via package "URI::file" at /usr/local/lib/perl5/site_perl/5.005/URI/WithBase.pm line 48.

I've just taken my first steps with this software (still trying to figure out why WWW::Robot will only produce HTML URLs despite what the documentation suggests), but this is a clear problem that seems to call for a patch. A stopgap patch is attached. -- Reinier Post [EMAIL PROTECTED]

--- RobotUA.pm.orig	Sun Aug  6 18:08:56 2000
+++ RobotUA.pm	Sun Aug  6 15:56:47 2000
@@ -231,7 +231,8 @@
 			HTTP::Status::RC_FORBIDDEN, 'Forbidden by robots.txt';
     }
 
-    my $netloc = $request->url->host_port;
+    my $ru = $request->url;
+    my $netloc = $ru->can('host_port') ? $ru->host_port : $ru->host;
     my $wait = $self->host_wait($netloc);
 
     if ($wait) {
Re: last_modified problem, help needed
but, if it's a ".php" file, $res->last_modified returns nothing; what's wrong? A .php URL typically points to a PHP script that generates its output on the fly; therefore, a Last-Modified header would be quite useless. -- Reinier Post
Re: Newbie to the list
On Mon, Jul 31, 2000 at 11:57:00PM -0700, [EMAIL PROTECTED] wrote: [...] I have written one spider using the sockets library, but it's not as robust as I would like. So now I'm exploring the LWP and HTTP modules as a way of bringing our spider up to date. I happen to be looking at the WWW::Robot CPAN module right now, and my questions are similar. Can you say how your spider compares to WWW::Robot? Is it more advanced? -- Reinier Post
Re: HTML::Entities module
On Thu, Jun 22, 2000 at 09:52:18AM +0000, marc-andre sauve wrote: Hi, Looking for the HTML::Entities perl module

% perl -MCPAN -e shell
cpan> install HTML::Entities
cpan> quit
% perldoc HTML::Entities

-- Reinier
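Once installed, the module's two main functions work like this (a small self-contained sketch):

```perl
#!/usr/bin/perl
# Encode characters that are special in HTML as entities, and back.
use strict;
use HTML::Entities qw(encode_entities decode_entities);

my $raw     = 'fish & chips <cheap>';
my $encoded = encode_entities($raw);     # fish &amp; chips &lt;cheap&gt;
print "$encoded\n";
print decode_entities($encoded), "\n";   # back to the original string
```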
Re: Problem deleting nodes with HTML::Element
You are deleting nodes from the tree while traverse-ing it. From running this code on this sample, I still have the Fifth and Seventh ps in there. The documentation specifically warns against this. Last month I posted to this list a patch to make it possible (all it took was a one-line change in traverse), but the maintainer didn't accept it, and also completely reimplemented traverse; I haven't checked whether it would be as easy to change in the new implementation. So mark or collect the nodes for deletion, and delete them in a second pass. -- Reinier Post
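The two-pass approach looks like this. A sketch: "page.html" is a placeholder, and the deletion criterion (every p element) is just an example of "mark, then delete".

```perl
#!/usr/bin/perl
# Delete elements safely: collect them first, then delete in a second
# pass, instead of deleting while traversing the tree.
use strict;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse_file('page.html');

# Pass 1: collect candidates without touching the tree.
my @doomed = $tree->find_by_tag_name('p');

# Pass 2: now it is safe to delete them.
$_->delete for @doomed;

print $tree->as_HTML, "\n";
$tree->delete;    # free the tree itself
```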
Re: Help!
Question: Is there any way I can set the absolute url of the response to http://165.21.42.93/7000OneNumber so that the redirection is successful. Yes, use the abs() method, described in perldoc URI -- Reinier
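With the addresses from the question, the call looks like this:

```perl
#!/usr/bin/perl
# Resolve a relative URL against a base with URI's abs() method.
use strict;
use URI;

my $rel = URI->new('/7000OneNumber');
my $abs = $rel->abs('http://165.21.42.93/');
print "$abs\n";    # http://165.21.42.93/7000OneNumber
```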
Re: FRAME support in HTML::TreeBuilder
How easy would it be to change HTML::TreeBuilder to preserve the structure of framed pages? Not terribly, but I'll give it a try. In the meantime, yes, try moving things around to repair the tree. Presumably it's just a matter of finding the body, finding the frameset under that, moving it up to be body's sister, and then demoting body to... be inside the noframe element inside the frameset (or making one if none is there?). OK, thanks. I'm trying that and it seems to work, but I haven't tested it very thoroughly. -- Reinier
Re: patch for HTML::Parser 3.06 to fix declaration commenthandling
On Wed, Mar 08, 2000 at 10:21:17PM -0500, la mouton wrote: this is what I experienced also. Comments like "<! row1 -->" get treated as comments by browsers, and HTML::Parser should behave the same way. In other words, HTML::Parser should parse not HTML, but what some browsers think HTML is. -- Reinier
Re: MULTI FORM submission
On Mon, Jan 24, 2000 at 01:11:46PM +1100, Shao Zhang wrote: Hi, I have sent to [EMAIL PROTECTED] I thought this list is for discussions of using perl modules to interact with the web. Am I wrong? Yes and no. This list is for discussions of a specific Perl module: LWP. -- Reinier Post