Re: [CODE4LIB] calling another webpage within CGI script

2009-11-24 Thread David Pattern
Hi Ken

Are you behind a web proxy server or firewall?  If so, you'll probably need to 
specify a proxy server in the script.

If the proxy is defined in the environment variables on the server, then you 
can use...

  my $ua = LWP::UserAgent->new( timeout => 60 );
  $ua->env_proxy();

...otherwise, you might need to hardcode it into the script...

  my $ua = LWP::UserAgent->new( timeout => 60 );
  $ua->proxy( ['http'], 'http://squid.wittenberg.edu:3128' );

(replace squid.wittenberg.edu:3128 with whatever the proxy server name and 
port number actually are)
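
For reference, here's a minimal end-to-end sketch putting both options together (the proxy host/port is still just a placeholder, and the NPR URL is borrowed from Ken's test script):

#!/usr/bin/perl
# Minimal sketch combining the two proxy options above.  The proxy host and
# port below are placeholders -- substitute whatever your site actually uses.
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new( timeout => 60 );

# Option 1: pick up http_proxy etc. from the environment, if defined.
$ua->env_proxy();

# Option 2: hardcode the proxy instead (uncomment and adjust).
# $ua->proxy( ['http'], 'http://squid.wittenberg.edu:3128' );

my $response = $ua->get('http://www.npr.org/');
print $response->is_success()
    ? "Fetched OK\n"
    : "HTTP Error : " . $response->status_line() . "\n";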

regards
Dave Pattern
University of Huddersfield


From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Ken Irwin 
[kir...@wittenberg.edu]
Sent: 23 November 2009 19:41
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] calling another webpage within CGI script

Hi Joe,

That's really helpful, thanks.
Actually seeing the error message is nice:

HTTP Error : 500 Can't connect to www.npr.org:80 (connect: Permission denied)

I've tried this with a few websites and always get the same error, which tells 
me that the problem is on my server side. Any idea what I can change so I don't 
get a permission-denied rejection? I'm not even sure what system I should be 
looking at.

I tried Vishwam's suggestion of granting 777 permissions to both the file and 
the directory and I get the same response.

Is there some Apache setting someplace that says "hey, don't you go making web 
calls while I'm in charge"?

(This is a Fedora server running Apache, btw).

I don't know what to poke at!

Ken




Re: [CODE4LIB] calling another webpage within CGI script

2009-11-24 Thread Greg McClellan

Hi,

I had a similar problem a while back which was solved by disabling 
SELinux. http://www.crypt.gen.nz/selinux/disable_selinux.html


-Greg


Re: [CODE4LIB] calling another webpage within CGI script - solved!

2009-11-24 Thread Graham Stewart

Hi,

We run many Library / web / database applications on RedHat servers with 
SELinux enabled.  Sometimes it takes a bit of investigation and  horsing 
around but I haven't yet found a situation where it had to be disabled. 
 setsebool and chcon can solve most problems and SELinux is an 
excellent enhancement to standard filesystem and ACL security.


-Graham

--
Graham Stewart
Network and Storage Services Manager, Information Technology Services
University of Toronto Library
130 St. George Street
Toronto, Ontario     graham.stew...@utoronto.ca
Canada   M5S 1A5     Phone: 416-978-6337 | Mobile: 416-550-2806 | 
Fax: 416-978-1668


Ken Irwin wrote:

Hi all,

Thanks for your extensive suggestions and comments. A few folks suggested that 
SELinux might be the issue. Tobin's suggestion to change one of the settings 
proved effective:
# setsebool -P httpd_can_network_connect 1

Thanks to everyone who helped -- I learned a lot.

Joys
Ken

-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Greg 
McClellan
Sent: Tuesday, November 24, 2009 10:04 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] calling another webpage within CGI script

Hi,

I had a similar problem a while back which was solved by disabling 
SELinux. http://www.crypt.gen.nz/selinux/disable_selinux.html


-Greg


Re: [CODE4LIB] calling another webpage within CGI script - solved!

2009-11-24 Thread Ross Singer
On Tue, Nov 24, 2009 at 11:18 AM, Graham Stewart
graham.stew...@utoronto.ca wrote:
 We run many Library / web / database applications on RedHat servers with
 SELinux enabled.  Sometimes it takes a bit of investigation and  horsing
 around but I haven't yet found a situation where it had to be disabled.
  setsebool and chcon can solve most problems and SELinux is an excellent
 enhancement to standard filesystem and ACL security.

Agreed that SELinux is useful but it is a tee-otal pain in the keister
if you're ignorantly working against it because you didn't actually
know it was there.

It's sort of the perfect embodiment of the disconnect between the
developer and the sysadmin.  And, if this sort of tension interests
you, vote for Bess Sadler's Code4lib 2010 presentation, "Vampires
vs. Werewolves: Ending the War Between Developers and Sysadmins with
Puppet", and anything else that interests you.

http://vote.code4lib.org/election/index/13

-Ross "Bringin' it on home" Singer.


Re: [CODE4LIB] calling another webpage within CGI script - solved!

2009-11-24 Thread Graham Stewart

An interesting topic ... heading out to cast vote now.

In our environment, about 6 years ago we informally identified the gap 
(grey area, war, however it is described) between server / network 
managers and developers / Librarians as an obstacle to our end goals and 
have put considerable effort into closing it.  The key efforts have been 
communication (more planning, meetings, informal sessions), 
collaboration (no-one is working in a vacuum), and a willingness to 
expand/stretch job descriptions (programmers sometimes participate in 
hardware / OS work and sysadmins will attend interface / application 
planning meetings).  Supportive management helps.


The end result is that sysadmins try as hard as possible to fully 
understand what an application is doing/requires on their 
hardware/networks, and programmers almost never run any applications 
that sysadmins don't know about.


So, SELinux has never been a problem because we know what a server needs 
to do before it ends up in a developer's hands and developers know not 
to pound their heads against the desk for a day before talking to 
sysadmins about something that doesn't work.  Well, for the most part, 
anyway ;-)


-Graham

Ross Singer wrote:

On Tue, Nov 24, 2009 at 11:18 AM, Graham Stewart
graham.stew...@utoronto.ca wrote:

We run many Library / web / database applications on RedHat servers with
SELinux enabled.  Sometimes it takes a bit of investigation and  horsing
around but I haven't yet found a situation where it had to be disabled.
 setsebool and chcon can solve most problems and SELinux is an excellent
enhancement to standard filesystem and ACL security.


Agreed that SELinux is useful but it is a tee-otal pain in the keister
if you're ignorantly working against it because you didn't actually
know it was there.

It's sort of the perfect embodiment of the disconnect between the
developer and the sysadmin.  And, if this sort of tension interests
you, vote for Bess Sadler's Code4lib 2010 presentation, "Vampires
vs. Werewolves: Ending the War Between Developers and Sysadmins with
Puppet", and anything else that interests you.

http://vote.code4lib.org/election/index/13

-Ross "Bringin' it on home" Singer.


--
Graham Stewart
Network and Storage Services Manager, Information Technology Services
University of Toronto Library
130 St. George Street
Toronto, Ontario     graham.stew...@utoronto.ca
Canada   M5S 1A5     Phone: 416-978-6337 | Mobile: 416-550-2806 | 
Fax: 416-978-1668


[CODE4LIB] calling another webpage within CGI script

2009-11-23 Thread Ken Irwin
Hi all,

I'm moving to a new web server and struggling to get it configured properly. 
The problem of the moment: having a Perl CGI script call another web page in 
the background and make decisions based on its content. On the old server I 
used an antique Perl script called "hcat" (from the Pelican 
book, http://oreilly.com/openbook/webclient/ch04.html); I've also tried curl 
and LWP::Simple.

In all three cases, I get the same behavior: it works just fine on the command 
line, but when called by the web server through a CGI script, the LWP (or other 
socket connection) gets no results. It sounds like a permissions thing, but I 
don't know what kind of permissions setting to tinker with. In the test script 
below, my command line outputs:

Content-type: text/plain
Getting URL: http://www.npr.org
885 lines

Whereas the web output just says "Getting URL: http://www.npr.org" - and 
doesn't even get to the "Couldn't get" error message.

Any clue how I can make use of a web page's contents from w/in a CGI script? 
(The actual application has to do with exporting data from our catalog, but I 
need to work out the basic mechanism first.)

Here's the script I'm using.

#!/bin/perl
use LWP::Simple;
print "Content-type: text/plain\n\n";
my $url = "http://www.npr.org";
print "Getting URL: $url\n";
my $content = get $url;
die "Couldn't get $url" unless defined $content;
@lines = split (/\n/, $content);
foreach (@lines) { $i++; }
print "\n\n$i lines\n\n";

Any ideas?

Thanks
Ken


Re: [CODE4LIB] calling another webpage within CGI script

2009-11-23 Thread Vishwam Annam

Ken,

The difference is that when you run it from the command line you are 
executing the file as /owner/, but when you access it through the 
browser it runs as /other/. Looking at the error message you sent, I 
believe it might not be executing the complete script. Try setting 
permissions to 707 or 777 to start with. You may have to create a 
temporary directory to test with.
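
If it helps, here is a quick throwaway diagnostic (a sketch only, not part of the fix itself) that prints which user and group the script actually runs as, so the command-line run and the CGI run can be compared:

#!/usr/bin/perl
# Throwaway diagnostic: prints which user and group the script runs as,
# so the command-line run and the CGI run can be compared side by side.
use strict;
use warnings;

print "Content-type: text/plain\n\n";
my $user = getpwuid($>);    # effective user name
print "Running as user: $user\n";
print "Real/effective UID: $< / $>\n";
print "Real/effective GID: $( / $)\n";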


Let me know if you have any questions,

Vishwam
Vishwam Annam
Wright State University Libraries
120 Paul Laurence Dunbar Library
3640 Colonel Glenn Hwy.
Dayton, OH 45435
Office: 937-775-3262
FAX 937-775-2356


Ken Irwin wrote:

Hi all,

I'm moving to a new web server and struggling to get it configured properly. The problem of the 
moment: having a Perl CGI script call another web page in the background and make decisions 
based on its content. On the old server I used an antique Perl script called "hcat" 
(from the Pelican book, http://oreilly.com/openbook/webclient/ch04.html); I've also tried 
curl and LWP::Simple.

In all three cases, I get the same behavior: it works just fine on the command 
line, but when called by the web server through a CGI script, the LWP (or other 
socket connection) gets no results. It sounds like a permissions thing, but I 
don't know what kind of permissions setting to tinker with. In the test script 
below, my command line outputs:

Content-type: text/plain
Getting URL: http://www.npr.org
885 lines

Whereas the web output just says "Getting URL: http://www.npr.org" - and doesn't even get 
to the "Couldn't get" error message.

Any clue how I can make use of a web page's contents from w/in a CGI script? 
(The actual application has to do with exporting data from our catalog, but I 
need to work out the basic mechanism first.)

Here's the script I'm using.

#!/bin/perl
use LWP::Simple;
print "Content-type: text/plain\n\n";
my $url = "http://www.npr.org";
print "Getting URL: $url\n";
my $content = get $url;
die "Couldn't get $url" unless defined $content;
@lines = split (/\n/, $content);
foreach (@lines) { $i++; }
print "\n\n$i lines\n\n";

Any ideas?

Thanks
Ken
  


Re: [CODE4LIB] calling another webpage within CGI script

2009-11-23 Thread Matt Jones
Hi Ken,

This may be obvious, but when running from the command line, stdout and
stderr are often interleaved together, but on the web server you see stdout
in the browser and stderr in the web server error log.  Your script is
probably exiting with an error either at the 'get' line (line 6) or at the
'die' line (line 7), which is what 'die' does -- terminate your script.
Have you checked your web server error log to see what the error is on your
'get' call?
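
For illustration, here's a small variant of the test script (a sketch only; the CGI::Carp usage is an addition, not something already in Ken's script) that routes the failure to both places mentioned above -- the error log via STDERR, and the browser via fatalsToBrowser:

#!/usr/bin/perl
# Sketch: CGI::Carp's fatalsToBrowser echoes die() messages to the browser,
# and warn() writes to STDERR, which lands in the web server's error log.
use strict;
use warnings;
use CGI::Carp qw(fatalsToBrowser);
use LWP::Simple;

print "Content-type: text/plain\n\n";
my $url = "http://www.npr.org";
print "Getting URL: $url\n";
my $content = get($url);
unless ( defined $content ) {
    warn "get($url) returned nothing\n";   # shows up in the error log
    die "Couldn't get $url";               # fatalsToBrowser shows this in the browser
}
my @lines = split /\n/, $content;
print scalar(@lines), " lines\n";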

Matt

On Mon, Nov 23, 2009 at 7:17 AM, Ken Irwin kir...@wittenberg.edu wrote:

 Hi all,

 I'm moving to a new web server and struggling to get it configured
 properly. The problem of the moment: having a Perl CGI script call another
 web page in the background and make decisions based on its content. On the
 old server I used an antique Perl script called "hcat" (from the Pelican
 book, http://oreilly.com/openbook/webclient/ch04.html); I've also tried
 curl and LWP::Simple.

 In all three cases, I get the same behavior: it works just fine on the
 command line, but when called by the web server through a CGI script, the
 LWP (or other socket connection) gets no results. It sounds like a
 permissions thing, but I don't know what kind of permissions setting to
 tinker with. In the test script below, my command line outputs:

 Content-type: text/plain
 Getting URL: http://www.npr.org
 885 lines

 Whereas the web output just says "Getting URL: http://www.npr.org" - and
 doesn't even get to the "Couldn't get" error message.

 Any clue how I can make use of a web page's contents from w/in a CGI
 script? (The actual application has to do with exporting data from our
 catalog, but I need to work out the basic mechanism first.)

 Here's the script I'm using.

 #!/bin/perl
 use LWP::Simple;
 print "Content-type: text/plain\n\n";
 my $url = "http://www.npr.org";
 print "Getting URL: $url\n";
 my $content = get $url;
 die "Couldn't get $url" unless defined $content;
 @lines = split (/\n/, $content);
 foreach (@lines) { $i++; }
 print "\n\n$i lines\n\n";

 Any ideas?

 Thanks
 Ken



Re: [CODE4LIB] calling another webpage within CGI script

2009-11-23 Thread Roy Tennant
Ken,
I tested your script on my server and it also worked for me on the command
line and failed via my web server. All I did was add /usr to your path to
perl and it worked:

#!/usr/bin/perl

Roy



On 11/23/09 8:17 AM, Ken Irwin kir...@wittenberg.edu wrote:

 Hi all,
 
 I'm moving to a new web server and struggling to get it configured properly.
 The problem of the moment: having a Perl CGI script call another web page in
 the background and make decisions based on its content. On the old server I
 used an antique Perl script called "hcat" (from the Pelican
 book, http://oreilly.com/openbook/webclient/ch04.html); I've also tried curl
 and LWP::Simple.
 
 In all three cases, I get the same behavior: it works just fine on the command
 line, but when called by the web server through a CGI script, the LWP (or
 other socket connection) gets no results. It sounds like a permissions thing,
 but I don't know what kind of permissions setting to tinker with. In the test
 script below, my command line outputs:
 
 Content-type: text/plain
 Getting URL: http://www.npr.org
 885 lines
 
 Whereas the web output just says "Getting URL: http://www.npr.org" - and
 doesn't even get to the "Couldn't get" error message.
 
 Any clue how I can make use of a web page's contents from w/in a CGI script?
 (The actual application has to do with exporting data from our catalog, but I
 need to work out the basic mechanism first.)
 
 Here's the script I'm using.
 
 #!/bin/perl
 use LWP::Simple;
 print "Content-type: text/plain\n\n";
 my $url = "http://www.npr.org";
 print "Getting URL: $url\n";
 my $content = get $url;
 die "Couldn't get $url" unless defined $content;
 @lines = split (/\n/, $content);
 foreach (@lines) { $i++; }
 print "\n\n$i lines\n\n";
 
 Any ideas?
 
 Thanks
 Ken
 


Re: [CODE4LIB] calling another webpage within CGI script

2009-11-23 Thread Joe Hourcle

On Mon, 23 Nov 2009, Ken Irwin wrote:


Hi all,

I'm moving to a new web server and struggling to get it configured properly. The problem of the 
moment: having a Perl CGI script call another web page in the background and make decisions 
based on its content. On the old server I used an antique Perl script called "hcat" 
(from the Pelican book, http://oreilly.com/openbook/webclient/ch04.html); I've also tried 
curl and LWP::Simple.

In all three cases, I get the same behavior: it works just fine on the command 
line, but when called by the web server through a CGI script, the LWP (or other 
socket connection) gets no results. It sounds like a permissions thing, but I 
don't know what kind of permissions setting to tinker with. In the test script 
below, my command line outputs:

Content-type: text/plain
Getting URL: http://www.npr.org
885 lines

Whereas the web output just says "Getting URL: http://www.npr.org" - and doesn't even get 
to the "Couldn't get" error message.

Any clue how I can make use of a web page's contents from w/in a CGI script? 
(The actual application has to do with exporting data from our catalog, but I 
need to work out the basic mechanism first.)

Here's the script I'm using.

#!/bin/perl
use LWP::Simple;
print "Content-type: text/plain\n\n";
my $url = "http://www.npr.org";
print "Getting URL: $url\n";
my $content = get $url;
die "Couldn't get $url" unless defined $content;
@lines = split (/\n/, $content);
foreach (@lines) { $i++; }
print "\n\n$i lines\n\n";

Any ideas?


I'd suggest testing the results of the call, rather than just looking for 
content, as an empty response could be a result of the server you're 
connecting to.  (unlikely in this case, but it happens once in a while, 
particularly if you turn off redirection, or support caching). 
Unfortunately, you might have to use LWP::UserAgent, rather than 
LWP::Simple:


#!/bin/perl --

use strict; use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new( timeout => 60 );

my $response = $ua->get('http://www.npr.org/');
if ( $response->is_success() ) {
    my $content = $response->decoded_content();
    ...
} else {
    print "HTTP Error : ", $response->status_line(), "\n";
}

__END__

(and changing the shebang line for my location of perl, your version 
worked via both CGI and command line)



oh ... and you don't need the foreach loop:

my $i = @lines;

-Joe
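
Putting those suggestions together, here is a sketch of how the UserAgent version might count the lines in scalar context (variable names are illustrative):

#!/usr/bin/perl
# Sketch combining the suggestions above: LWP::UserAgent for a real status
# line, and a scalar-context assignment instead of a foreach loop to count
# the lines.
use strict;
use warnings;
use LWP::UserAgent;

print "Content-type: text/plain\n\n";
my $ua       = LWP::UserAgent->new( timeout => 60 );
my $response = $ua->get('http://www.npr.org/');
if ( $response->is_success() ) {
    my @lines = split /\n/, $response->decoded_content();
    my $i     = @lines;    # scalar context: number of elements
    print "$i lines\n";
} else {
    print "HTTP Error : ", $response->status_line(), "\n";
}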