Re: MaxRequestsPerChild; which request am I?

2003-04-04 Thread Bill Moseley
On Fri, 4 Apr 2003, Brian Reichert wrote:

   In messing with Apache 1.x, is there a way, via mod-perl, of a
   request knowing how many requests have been served by the current
   child?
  
  
  $request++;
  
  That's what I do in some handler, and then I log it along with the PID.
 
 Eh?  I'm confused.  What is '$request' in that example?  If you
 mean it's the request object, then that doesn't do what I expect.

No, it's a simple counter.  It's just a variable in some module that
counts requests.
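
Something like this, say installed as a PerlLogHandler (a minimal sketch,
not tested; the package name and log line are illustrative):

package My::Counter;
use strict;
use Apache::Constants qw(OK DECLINED);

my $count = 0;   # package-scoped: persists for the life of this child

sub handler {
    my $r = shift;
    return DECLINED unless $r->is_main;   # don't count subrequests
    $count++;
    $r->log_error( "pid $$ serving request #$count" );
    return OK;
}
1;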






-- 
Bill Moseley [EMAIL PROTECTED]



Re: MaxRequestsPerChild; which request am I?

2003-04-03 Thread Bill Moseley
On Fri, 4 Apr 2003, Brian Reichert wrote:

 Dunno if someone has a good answer, or a suggestion of a better
 forum for this:
 
 Apache has a configuration directive: MaxRequestsPerChild
 
   http://httpd.apache.org/docs/mod/core.html#maxrequestsperchild
 
 In messing with Apache 1.x, is there a way, via mod-perl, of a
 request knowing how many requests have been served by the current
 child?


$request++;

That's what I do in some handler, and then I log it along with the PID.




-- 
Bill Moseley [EMAIL PROTECTED]



Re: Basic Auth logout

2003-03-07 Thread Bill Moseley
On Fri, 7 Mar 2003, Francesc Guasch wrote:

 this has been asked before, and I've found in the archives
 there is no way I could have a logout page for the Basic
 Auth in apache.
 
 Is there nothing I can do?  This is required only for the
 development team, so we need to let mozilla or IE forget
 about the username and password.

It all depends on the browser and version.  I have been able to log out
some versions of IE by having a link to another protected resource with the
same auth name but a different username and password (in the link).

You are better off just maintaining a session on the server.

-- 
Bill Moseley [EMAIL PROTECTED]



Re: Authorization question

2003-02-27 Thread Bill Moseley
On Thu, 27 Feb 2003, Perrin Harkins wrote:

 Jean-Michel Hiver wrote:
  Yes, but you're then making the authorization layer inseparable from
  your application layer, and hence you lose the benefit of using
  separate handlers.
 
 It's pretty hard to truly separate these things.  Nobody wants to use 
 basic auth, which means there is a need for forms and handlers.  Then 
 you have to keep that information in either cookies or URLs, and there 
is usually a need to talk to an external database with a 
 site-specific schema.  The result is that plug and play auth schemes 
 only work (unmodified) for the simplest sites.

Anyone using PubCookie?

http://www.washington.edu/pubcookie/

-- 
Bill Moseley [EMAIL PROTECTED]



Is Sys::Signal still needed?

2003-02-01 Thread Bill Moseley
Searching the archives I don't see much discussion of Sys::Signal.  Is it
still needed to restore sig handlers?

Thanks,


-- 
Bill Moseley [EMAIL PROTECTED]




Re: web link broken when access cgi-bin

2002-12-22 Thread Bill Moseley
On Sunday 22 December 2002 03:49, Ged Haywood wrote:
 Hi there,
 
 On Sat, 21 Dec 2002, eric lin wrote:
 
  The image file:///home/enduser/mytest.jpg cannot be displayed, because 
  it contains errors
 
 I think I understand your question but I am not sure of it.
 
 It seems that you have sent a request to Apache, received a response,

And sent messages about using Windows to a Linux list, and CGI questions to 
the mod_perl list, and seems to ignore the many requests to read some basic CGI 
tutorials.   I'd guess troll if he wasn't so clueless. ;)



Re: web link broken when access cgi-bin

2002-12-22 Thread Bill Moseley
On Sun, 22 Dec 2002, Richard Clarke wrote:

  And sent messages about using Windows to a Linux list, and CGI questions to
  the mod_perl list, and seems to ignore the many requests to read some basic CGI
  tutorials.   I'd guess troll if he wasn't so clueless. ;)
 
 Since when did mod_perl become Linux only?

oops, I meant to write:

And sent messages about using Windows to a Linux list


-- 
Bill Moseley [EMAIL PROTECTED]




Re: Fw: OT - Santa uses PERL

2002-12-20 Thread Bill Moseley
At 11:17 AM 12/20/02 +0200, Issac Goldstand wrote: 
http://www.perl.com/pub/a/2002/12/18/hohoho.html


That sounds a lot like Perrin's story.  Didn't he save Christmas one year?



-- 
Bill Moseley
mailto:[EMAIL PROTECTED] 

Can't get nested files to work in Perl section

2002-12-19 Thread Bill Moseley
mod_perl 1.27 / httpd 1.3.27

In the <Perl> section httpd.conf below, test.cgi is returned as the default
type, text/plain, whereas test2.cgi is run as a CGI script.

Do I have this setup incorrectly?

In a standard httpd.conf file it's allowable to have Files sections nested
within a Directory section, of course.

 <Perl>
 #!perl
 $User = 'nobody';
 $Group = 'users';
 $ServerRoot = '/home/moseley/test';
 $TypesConfig = '/dev/null';
 $Listen = '*:8000';

 $VirtualHost{'*:8000'} = {
    ServerName   => 'foo',
    DocumentRoot => '/home/moseley/test',
    ErrorLog     => 'logs/error_log.8000',
    TransferLog  => 'logs/error_log.8000',

    Files => {
        'test2.cgi' => {
            Options    => '+ExecCGI',
            SetHandler => 'cgi-script',
        },
    },

    Directory => {
        '/home/moseley/test' => {
            Allow => 'from all',
            Files => {
                'test.cgi' => {
                    Options    => '+ExecCGI',
                    SetHandler => 'cgi-script',
                },
            },
        },
    },
 };
 </Perl>

 __END__


-- 
Bill Moseley
mailto:[EMAIL PROTECTED]



[OT] Ideas for limiting form submissions

2002-12-18 Thread Bill Moseley
I've got a mod_perl feedback form that sends mail to a specific address.
Spammers have their bots hitting the form now.  The tricks I know of are:

- generate a random image of numbers and make the user type in the numbers
on the form.  Painful for the user and spammers probably have OCR!

- require an email and send a confirmation email (like a list
subscription) and whitelist some email addresses.  But we want to allow
anonymous submissions.

- limit submissions by IP number to one every X minutes.  AOL users may
get blocked.

- md5 the submission and block duplicates (should do this anyway; see the
sketch below).  BTW -- what would you recommend for caching the md5 strings,
Cache::Cache or DBM?  I suppose a Cache::Cache file cache would be the easiest.

Any other ideas on the easy to implement side?
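
For the md5 idea, a minimal sketch of the duplicate check (the module
choice and expiry time are just what I'd guess at; untested):

use Digest::MD5 qw(md5_hex);
use Cache::FileCache;

my $cache = Cache::FileCache->new({
    namespace          => 'feedback_dedup',
    default_expires_in => 24 * 60 * 60,   # remember digests for a day
});

sub is_duplicate {
    my ($body) = @_;           # the submitted text
    my $digest = md5_hex( $body );
    return 1 if $cache->get( $digest );
    $cache->set( $digest, 1 );
    return 0;
}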



-- 
Bill Moseley [EMAIL PROTECTED]





Re: [OT] Ideas for limiting form submissions

2002-12-18 Thread Bill Moseley
At 02:51 PM 12/18/02 -0500, Daniel Koch wrote:
Check out Gimpy, which I believe is what Yahoo uses:

http://www.captcha.net/captchas/gimpy/

I'm thinking of something along those lines.  The problem is this is on
Solaris 2.6 w/o root, and I'll bet it would take some time to get The Gimp
and GTK and whatever libs installed.

So, I'm thinking about creating a directory of say 20 images of words.  On
the initial request the form creates a random key, and makes that a symlink
to one of the images selected at random.  That will be the img src link.

Then md5 the symlink with a secret word to create a hidden field.

The submitter will have to type in the word displayed in the image.

On submit md5 all the symlinks with the secret word until a match is found
-- match the submitted word text with the real image name, then unlink the
symlink and accept the request.

Cron can remove old symlinks.
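
A sketch of the generation side (paths, the secret, and the image names
are illustrative; most error checking omitted):

use Digest::MD5 qw(md5_hex);

my $img_dir = '/path/to/captcha';      # holds the ~20 word images
my $secret  = 'some secret word';

sub new_challenge {
    opendir my $dh, $img_dir or die $!;
    my @images = grep { /\.gif$/ } readdir $dh;
    closedir $dh;

    my $image = $images[ rand @images ];
    my $key   = md5_hex( rand() . time() . $$ );   # random symlink name

    symlink "$img_dir/$image", "$img_dir/$key.gif" or die $!;

    my $hidden = md5_hex( $key . $secret );   # goes in the hidden field
    return ( "$key.gif", $hidden );           # img src + hidden value
}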

If the spammers put in the work to figure out the word by check-summing the
images, I can use ImageMagick to modify the images -- that could be a nice
mod_perl handler.

See any glaring holes? 

-- 
Bill Moseley
mailto:[EMAIL PROTECTED]



RE: Cookie-free authentication

2002-12-13 Thread Bill Moseley
On Sat, 14 Dec 2002, Ron Savage wrote:

 Under Apache V 1/Perl 5.6.0 I could not get the Apache::AuthCookieURL 
 option working which munged URLs without requiring cookies.

I thought the problem was that Apache::AuthCookie was redirecting to your
login script on logout instead of displaying your logout page.


-- 
Bill Moseley [EMAIL PROTECTED]




Re: Yahoo is moving to PHP ??

2002-10-30 Thread Bill Moseley
At 02:50 PM 10/30/02 -0500, Perrin Harkins wrote:
Mithun Bhattacharya wrote:

No it is not being removed but this could have been a very big thing
for mod_perl. Can someone find out more details as to why PHP was
preferred over mod_perl it cant be just on a whim.


Think about what they are using it for.  Yahoo is the most extreme 
example of a performance-driven situation.

I also wonder if it's cheaper/easier to hire and train PHP programmers than
Perl programmers.


-- 
Bill Moseley
mailto:moseley;hank.org



RE: [OTish] Version Control?

2002-10-30 Thread Bill Moseley
At 04:47 PM 10/30/02 -0500, Jesse Erlbaum wrote:
Web development projects can map very nicely into CVS.  We have a very
mature layout for all web projects.  In a nutshell, it boils down to this:

   project/
 + apache/
 + bin/

That requires binary compatibility, though.  I have a similar setup, but
the perl and Apache are built separately on the target machine since my
machines are linux and the production machine is Solaris.

I only work on single servers, so things are a bit easier.  I always cvs co
to a new directory on the production machine and start up a second set of
servers on high ports.  That lets me (and the client) test on the final
platform before going live.  Then it's an apache stop && mv live old && mv
new live && apache start kind of thing, which is a fast way to update.

I'd love to have the Perl modules in cvs, though.  Especially mod_perl
modules.  It makes me nervous upgrading mod_perl on the live machine's perl
library.  Should make more use of PREFIX, I suppose.

Speaking of cvs, here's a thread branch:

I have some client admin features that they update via web forms -- some
small amount of content, templates, and text-based config settings.  I
currently log a history of changes, but it doesn't have all the features of
cvs.

Is anyone using cvs to manage updates made with web-based forms?



-- 
Bill Moseley
mailto:moseley;hank.org



Re: [OTish] Version Control?

2002-10-30 Thread Bill Moseley
At 03:21 PM 10/30/02 -0800, [EMAIL PROTECTED] wrote:
We check all of our perl modules into CVS and it's a 
_MAJOR_ life saver. Keeps everyone on the same path so to 
speak.

I think I confused two different things: perl module source vs. installed
modules.  Do you check in the source or the installed modules?

I keep the source of my perl modules under cvs, but not the perl library
i.e. the files generated from make install, which might include binary
components.

I use a PREFIX for my own modules, but I tend to install CPAN modules in
the main perl library.  My own modules get installed in the application
directory tree so that there's still a top level directory for the entire
application/site.

It does worry me that I'll update a CPAN module (or Apache::*) in the main
Perl library and break something some day.  (Although on things like
updating mod_perl I have copied /usr/local/lib/perl5 before make install.)


-- 
Bill Moseley
mailto:moseley;hank.org



mod_perl-based registration programs?

2002-06-13 Thread Bill Moseley

Before I start rewriting...

Anyone know of a mod_perl based program for registering people for events?

The existing system allows people to sign up and cancel for classes and
workshops that are offered at various locations and also for on-line
classes.  We have a collection of training workshops that are each offered
a number of times a year, and are taught by a pool of instructors.
Typically a few classes a week.

It mails reminders a few days before the classes, sends class lists to the
assigned instructor before their class, and normal database stuff for
displaying, searching and reporting. Currently, billing is by invoice, but
we would like an on-line payment option.

Anyone know of something similar?

Thanks,

Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Logging under CGI

2002-06-10 Thread Bill Moseley

At 10:30 PM 06/10/02 -0400, Sam Tregar wrote:
On Tue, 11 Jun 2002, Sergey Rusakov wrote:

 open(ERRORLOG, '>>/var/log/my_log');
 print ERRORLOG "some text\n";
 close ERRORLOG;

 This bit of code runs in every apache child.
 I worry about concurrent access to this log file under heavy apache load.
 Are there any problems with my approach?

You are correct to worry.  You should use flock() to prevent your log file
from becoming corrupted.  See perldoc -f flock for more details.

Maybe it's a matter of volume, or the size of the string written to the log.
But I don't flock, and I keep the log file open between requests and only
reopen if stat() shows that the file was renamed.  So far I've been lucky.
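
Roughly like this (an untested sketch; the path is illustrative, and
flock() is still the safer route):

my $log_path = '/var/log/my_log';
my ($dev, $ino);

sub log_line {
    my ($msg) = @_;
    my @cur = stat $log_path;
    # Reopen if we never opened, the file is gone, or the file at that
    # path is no longer the one we hold (renamed/rotated).
    if ( !defined $ino or !@cur or $cur[0] != $dev or $cur[1] != $ino ) {
        open LOG, ">>$log_path" or die "can't open $log_path: $!";
        select( ( select(LOG), $| = 1 )[0] );   # unbuffer the handle
        ($dev, $ino) = ( stat LOG )[0, 1];
    }
    print LOG $msg;
}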


-- 
Bill Moseley
mailto:[EMAIL PROTECTED]



RE: [OT] MVC soup (was: separating C from V in MVC)

2002-06-08 Thread Bill Moseley

At 12:13 PM 06/08/02 +0100, Jeff wrote:
The responsibility of the Controller is to take all the supplied user
input, translate it into the correct format, and pass it to the Model,
and watch what happens. The Model will decide if the instruction can be
realised, or if the system should explode.

I'd like to ask a bit more specific question about this.  Really two
questions.  One about abstracting input, and, a bit mundane, building links
from data set in the model.

I've gone full circle on handling user input.  I used to try to abstract
CGI input data into some type of request object that was then passed onto
the models.  But then the code to create the request object ended up
needing to know too much about the model.

For example, say for a database query the controller can see that there's a
query parameter and thus knows to pass the request to the code that knows
how to query the database.  That code passes back a results object which
then the controller can look at to decide if it should display the results,
a no results page and/or the query form again.

Now, what happens is that features are added to the query code.  Let's say
we get a brilliant idea that search results should be shown a page at a
time (or did Amazon patent that?).  So now we want to pass in the query,
starting result, and the page size.

What I didn't like about this is I then had to adjust the so-called
controller code that decoded the user input for my request object to
include these new features.  But really that data was of only interest to
the model.  So a change in the model forced a change in the controller.

So now I've just been passing in an object that has a param() method
(lately I've been using a CGI object instead of an Apache::Request) so
the model can have full access to all the user input.  It bugs me a bit
because it feels like the model now has intimate access to the user input.

And for things like cron I just emulate the CGI environment.

So my question is: Is that a reasonable approach?
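
In code, the pattern looks roughly like this (class names are made up,
and the stand-in bodies just show the shape):

package My::Model::Search;
use strict;

# The model pulls its own knobs from any object with a param() method,
# so adding paging later doesn't touch the controller.
sub run {
    my ($class, $q) = @_;
    my %args = (
        query     => $q->param('query'),
        start     => $q->param('start')     || 0,
        page_size => $q->param('page_size') || 20,
    );
    return \%args;   # stand-in for the real search + results object
}

package My::Controller;
use strict;
use CGI;

sub handler {
    my $r = shift;
    my $q = CGI->new;   # or Apache::Request->new($r)
    my $results = My::Model::Search->run( $q );
    # ... pick a view based on $results ...
    return 0;   # OK
}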

My second, reasonably unrelated question is this: I often need to make
links back to a page, such as a page next link.  I like to build
links in the view, keeping the HTML out of the model if possible.  But for
something like a page next link that might contain a bunch of parameters,
it would seem best to build the href in the model, which knows about all
those parameters.

Anyone have a good way of dealing with this?

Thanks,

P.S. and thanks for the discussion so far.  It's been very interesting.


-- 
Bill Moseley
mailto:[EMAIL PROTECTED]



[OT] MVC soup (was: separating C from V in MVC)

2002-06-06 Thread Bill Moseley

I, like many, find these discussion really interesting.  I always wish
there was some write up for the mod_perl site when all was said and done.
But I guess one of the reasons it's so interesting is that there's more
than one correct point of view.

My MVC efforts often fall apart in the C and M separation.  My M parts end
up knowing too much about each other -- typically because of error
conditions, e.g. data that's passed to an M that does not validate.  And I
don't want to validate too much data in the C, as the C ends up doing M's work.

Anyone have links to examples of MVC Perl code (mostly controller code)
that does a good job of M and C separation, and good ways to propagate
errors back to the C?  


-- 
Bill Moseley
mailto:[EMAIL PROTECTED]



Throttling, once again

2002-04-18 Thread Bill Moseley

Hi,

Wasn't there just a thread on throttling a few weeks ago?

I had a machine hit hard yesterday with a spider that ignored robots.txt.  

Load average was over 90 on a dual CPU Enterprise 3500 running Solaris 2.6.
 It's a mod_perl server, but has a few CGI scripts that it handles, and the
spider was hitting one of the CGI scripts over and over.  They were valid
requests, but coming in faster than they were going out.

Under normal usage the CGI scripts are only accessed a few times a day, so
it's not much of a problem to have them served by mod_perl.  And under normal
peak loads RAM is not a problem.  

The machine also has bandwidth limitation (packet shaper is used to share
the bandwidth).  That combined with the spider didn't help things.  Luckily
there's 4GB so even at a load average of 90 it wasn't really swapping much.
 (Well not when I caught it, anyway).  This spider was using the same IP
for all requests.

Anyway, I remember Randal's Stonehenge::Throttle discussed not too long
ago.  That seems to address this kind of problem.  Is there anything else
to look into?  Since the front-end is mod_perl, it means I can use a mod_perl
throttling solution, too, which is cool.

I realize there's some fundamental hardware issues to solve, but if I can
just keep the spiders from flooding the machine then the machine is getting
by ok.
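
For reference, the simplest per-IP throttle sketch I can imagine looks
something like this (counts are per-child only, so the limit is
approximate, and the constants are made up -- Randal's module does this
properly, of course):

package My::Throttle;
use strict;
use Apache::Constants qw(OK FORBIDDEN);

use constant WINDOW => 60;    # seconds
use constant MAX    => 30;    # requests per window, per child

my %hits;    # ip => [ window start, count ]; grows until child recycles

sub handler {
    my $r   = shift;
    my $ip  = $r->connection->remote_ip;
    my $now = time;
    my $rec = $hits{$ip} ||= [ $now, 0 ];
    @$rec = ( $now, 0 ) if $now - $rec->[0] > WINDOW;
    return FORBIDDEN if ++$rec->[1] > MAX;
    return OK;
}
1;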

Also, does anyone have suggestions for testing once throttling is in place?
 I don't want to start cutting off the good customers, but I do want to get
an idea how it acts under load.  ab to the rescue, I suppose.

Thanks much,


-- 
Bill Moseley
mailto:[EMAIL PROTECTED]



RE: mod_perl Cook Book

2002-04-06 Thread Bill Moseley

At 09:59 AM 04/06/02 +0100, Phil Dobbin wrote:

It's definitely the book to buy _before_ the Eagle book.

No, buy both at the same time.  I think the Eagle gives a really good
foundation, and it's very enjoyable reading (regardless of what my wife
says!).

I still think the Eagle book is one of the best books on my bookshelf.  I
have a couple of Apache-specific books and I learned a lot more about
Apache from the Eagle than those.  The cook book has been a great addition.


-- 
Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Creating a proxy using mod_perl

2002-03-15 Thread Bill Moseley

At 05:11 PM 3/15/2002 +0300, Igor Sysoev wrote:
On Fri, 15 Mar 2002, Marius Kjeldahl wrote:

 I guess these all suffer from the fact that the parameters have to be 
 specified in httpd.conf, which makes it impossible to pass a url to 
 fetch from in a parameter, right?

So mod_rewrite with mod_proxy or mod_accel:

RewriteRule   ^/proxy_url=http://(.+)$   http://$1   [L,P]

Note that 'proxy?url=' is changed to 'proxy_url='.

Any concern about being an open proxy there?  I'd want to only proxy the
sites I'm working with.  

I'd rather cache the images locally, just in case you are working with a
slow site or if they do something silly like check referer on requests.



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: [ANNOUNCE] The New mod_perl logo - results now in...

2002-03-15 Thread Bill Moseley

At 04:33 PM 03/15/02 -0500, Georgy Vladimirov wrote:
I actually like the logo without the underscore. I don't think an
underscore is very collaborative with art. The _ has always been
irritating me a little.

I know that there is history and nostalgia involved here but dropping
an underscore at least in the logo is a nice evolution IMHO.

I also agree with this, and it's one of the reasons (I think) I voted for
that design.  It's a graphic design, so I don't see that it needs to follow
the Apache module naming convention exactly.  Nor perl identifier names,
either.  Many of the designs offered didn't use the underscore, and the
design that won didn't use one.  It's a design -- it doesn't have
to be accurate to the name.

Besides, if it changes does it mean that the winning design received no
votes? ;)


-- 
Bill Moseley
mailto:[EMAIL PROTECTED]



[WOT] Google Programming Contest.

2002-02-07 Thread Bill Moseley

Sorry for the Way Off Topic, and sorry if I missed this on the list already:

http://www.google.com/programming-contest/

They say C++ or Java.  What, no Perl?


-- 
Bill Moseley
mailto:[EMAIL PROTECTED]



Re: New mod_perl Logo

2002-01-29 Thread Bill Moseley

At 07:29 PM 01/29/02 -0500, Chris Thompson wrote:
Well, I'd like to just throw one idea into the mix. It's something that's
bugged me for a long time, no better time than the present.

mod_perl is a lousy name.

I don't know about lousy, but I do agree.   I brought this up on the
docs-dev list:

  http://search.apache.org/archives/docs-dev/0236.html

During the week I posted that I had run into PHP programmers at a computer
show, more PHP programmers at a pub (2 in the afternoon -- more out of work
programmers), and ended up talking with a couple of Java programmers one
day.  The amazing thing was they all had a completely weird idea about what
mod_perl is or what it does.  And all thought it was slow, old, dead,
not-scalable technology.  And that was from programmers, not managers.  We
all know there is a lot of misinformation out there.

Marketing is not everything, but it's a lot!  What we know of mod_perl is
more than just perl+Apache, really.  It's a development platform, or
development suite.  It can be anything our marketing department says it is. ;)

In these tough economic times, repackaging might be helpful.  Who knows?

And for some of us we know that mod_perl is also something that makes up a
chunk of our livelihood.  So, the promotion of mod_perl is quite important,
unless we want to start spending more afternoons with those PHP programmers
down at the corner pub.

So how would a group like the mod_perl community promote itself in new
ways?  Well, other professionals often have professional organizations or
associations to represent and promote their members.  I wonder if there are
there enough mod_perl programmers to support something like that.  Even if
there were, what could be done?  Run a few print ads in magazines that
system admins read?  Hire an ad firm for help in developing our brand?
mod_perl coffee mugs? (Tired of that old cup of Java?)  Free mod_perl
clinics?  Hard to imagine any of that actually happening, really.

So what's a group of programmers to do?

The new web site should help, to some degree, but I'm not sure it will
change any manager's mind on the technology they pick to run their
applications.

Of course, most people here have access to big pipes.  So, there's always
bulk mail ads.  I got mail just today saying that it's an effective way to
advertise.  In fact I got about ten of those today!


-- 
Bill Moseley
mailto:[EMAIL PROTECTED]



Re: META tags added as HTTP headers

2002-01-18 Thread Bill Moseley


At 01:20 AM 01/19/02 +0100, Markus Wichitill wrote:
which part of an Apache/mod_perl setup is responsible for extracting META
tags from generated HTML and adding them as HTTP headers (even with
PerlSendHeaders Off)?

That's lwp doing that, not Apache or mod_perl.

 HEAD http://www.apache.org
200 OK
Cache-Control: max-age=86400
Connection: close
Date: Sat, 19 Jan 2002 00:27:10 GMT
Accept-Ranges: bytes
Server: Apache/2.0.28 (Unix)
Content-Length: 7810
Content-Type: text/html
Expires: Sun, 20 Jan 2002 00:27:10 GMT
Client-Date: Sat, 19 Jan 2002 00:27:17 GMT
Client-Request-Num: 1
Client-Warning: LWP HTTP/1.1 support is experimental


-- 
Bill Moseley
mailto:[EMAIL PROTECTED]



Re: META tags added as HTTP headers

2002-01-18 Thread Bill Moseley

At 04:46 PM 01/18/02 -0800, ___cliff rayman___ wrote:
hmmm - you are still using lwp.

Right.  But lwp-request sends a GET request where HEAD sends, well, a HEAD
request.  So, even though LWP's default is to parse the head section,
there's no content to parse in a HEAD request, and thus the meta headers
don't show up.


-- 
Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Alarms?

2002-01-10 Thread Bill Moseley

At 06:56 PM 01/10/02 +0300, [EMAIL PROTECTED] wrote:
Hello!

I'm getting lots of errors in log:

[Thu Jan 10 18:54:33 2002] [notice] child pid 8532 exit signal Alarm clock
(14)

I hope I remember this correctly:

What's happening is you are setting a SIGALRM handler in perl, but perl is not 
correctly restoring Apache's handler when yours goes out of scope.

So when a non-mod_perl request later times out, there's no handler in place and the process is killed.

Check:
http://thingy.kcilink.com/modperlguide/debug/Debugging_Signal_Handlers_SIG_.html

http://thingy.kcilink.com/modperlguide/debug/Handling_Server_Timeout_Cases_an.html
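
IIRC the workaround looks something like this (a sketch -- untested, and
do_something_slow() is a placeholder):

use Sys::Signal ();

eval {
    # Sys::Signal restores the original C handler when $h goes out of
    # scope, which a plain local $SIG{ALRM} fails to do.
    my $h = Sys::Signal->set( ALRM => sub { die "timeout\n" } );
    alarm 10;
    do_something_slow();
    alarm 0;
};
alarm 0;   # belt and braces: clear any pending alarm
die $@ if $@ && $@ ne "timeout\n";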
-- 
Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Template-Toolkit performance tuning

2001-12-30 Thread Bill Moseley

At 05:17 PM 12/30/01 -0600, Ryan Thompson wrote:
   use Template;

   my %var;

   $var{foo} = 'bar';   # About 30 scalars like this
   .
   .

   my $tt = new Template({ INTERPOLATE => 1 });

Cache your template object between requests.
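
For example (a sketch; the package-scoped variable is the whole trick):

package My::View;
use strict;
use Template;

my $tt;   # survives between requests in this child

sub process {
    my ($template, $vars) = @_;
    $tt ||= Template->new({ INTERPOLATE => 1 });   # built once per child
    my $out;
    $tt->process( $template, $vars, \$out ) or die $tt->error;
    return $out;
}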


-- 
Bill Moseley
mailto:[EMAIL PROTECTED]



Searchable archives (was: [modperl site design challenge] and the winner is...)

2001-12-26 Thread Bill Moseley

At 02:13 PM 12/24/01 +0800, Stas Bekman wrote:
FWIW, we are having what seems to be a very productive discussion at 
docs-dev mailing list. Unfortunately no mail archiver seem to pick this 
list up, so only the mbox files are available:
http://perl.apache.org/mail/docs-dev/

Is anyone up to make the searchable archives available? We have a bunch 
of lists that aren't browsable/searchable :(
http://perl.apache.org/#maillists

Hi Stas,

Any reason to not use hypermail?  Do you have mbox files for all the lists
in question?

I could setup searchable archives like this example, if you like.

   http://search.apache.org/docs-dev/  (this URL is temporary!)



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Can I use mod_perl to pass authentication details to apache from an HTML form?

2001-12-24 Thread Bill Moseley

At 08:49 AM 12/24/2001 -, Chris Thompson wrote:
I would like to set up a password-protected area within my website where my
web design clients can preview changes to their sites before they go live.
Although I know how to password protect directories using .htaccess files, I
would prefer to bypass the standard grey Authorization pop-up screen and
instead let users enter their username / password details through an HTML
form (which I think would look more professional).

Take a look at Apache::AuthCookie.

If possible, the system for authenticating / authorizing the user would also
redirect them to the appropriate directory for their site.

You can do that, or you can just have them go directly to their area, and
have the authentication system intercept the request.  This is what
Apache::AuthCookie does.  You might also look at Apache::AuthCookieURL if
there's a chance that your users might not have cookies enabled.



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: [modperl site design challenge] and the winner is...

2001-12-19 Thread Bill Moseley


I'm throwing in my two cents a bit late, so it's a bit depreciated now (one
cent?).  But something to think about for the site.

I've worked with php a little lately -- not programming, but making minor
changes to a site.  I've used the php site http://www.php.net/ a few times,
and I've found it reasonably functional, but also quite easy for someone
new to php.  Maybe it seems that way because I know nothing about php and
it's geared toward my level.  But that's good.  How often do the mod_perl
pros need to read the mod_perl home page?

I'm sure all these elements will be added to the new mod_perl site in some
way, but I just wanted to note what I liked about the php site.  And I'm
not comparing mod_perl to php!

What the php site shows in a real obvious way is:

1) what is php (for someone that is brand new) with a link to some basic
examples.  It demystifies php in a hurry.  Makes someone think Oh, I can
do that.

2) currently, it's showing Netcraft's usage stats, so I see that people are
using it in growing numbers -- it's not a dead-end for a new person to try
out.

3) it shows upcoming events.  That shows that there's a real support group
of real people to work with.  Links to discussion lists archives would be
good there.

All that makes it really easy for someone new to feel comfortable.

It would be nice to see license info as well, as someone new might want to
be clear on that right away, too.

You can also quickly see a list of supported modules.  This shows that it's
easy to extend, but also allows someone to see that it can do the thing
*they* might be interested in.  Sure, perl has CPAN, but I think it would
be good to show a list of commonly used modules for mod_perl, and what they
do, in a simple list.  If someone is just learning about mod_perl (or php)
the list doesn't need to be that big, as their needs will be reasonably basic.

Existing mod_perl (or php?) programmers might not like all that basic,
first-time user stuff right on the home page, and would rather have a more
functional site.  I don't know about anyone else, but I've got the links
I need bookmarked, and if not I go to perl.apache.org and ^F right to where
I want to go.

BTW -- At first I liked David's idea of using the ASF look.  That ties
mod_perl to apache well.  But, if the site is intended to bring in new
users, it might be good to be a bit more flashy.

<crazy idea>
Maybe as a community (of programmers not designers) we could hire a
professional designer to help develop our brand.  Cool web site.  Some
print ads in the trades.  What's a small amount in dues to the Association
of Mod_perl Programmers compared to increase of mod_perl work overall?
</crazy idea>


Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Comparison of different caching schemes

2001-12-18 Thread Bill Moseley
Ok, I'm a bit slow...

At 03:05 PM 12/12/01 +1100, Rob Mueller (fastmail) wrote: 
Just thought people might be interested...

Seems like they were!  Thanks again.

I didn't see anyone comment about this, but I was a bit surprised by MySQL's good performance.  I suppose caching is key.  I wonder if things would change with 50 or 100 thousand rows in the table.

I always assumed something like Cache::FileCache would have less overhead than an RDBMS.  It's impressive.


Now to the results, here they are.
Package C0 - In process hash
Sets per sec = 147116
Gets per sec = 81597
Mixes per sec = 124120
Package C1 - Storable freeze/thaw
Sets per sec = 2665
Gets per sec = 6653
Mixes per sec = 3880
Package C2 - Cache::Mmap
Sets per sec = 809
Gets per sec = 3235
Mixes per sec = 1261
Package C3 - Cache::FileCache
Sets per sec = 393
Gets per sec = 831
Mixes per sec = 401
Package C4 - DBI with freeze/thaw
Sets per sec = 651
Gets per sec = 1648
Mixes per sec = 816
Package C5 - DBI (use updates with dup) with freeze/thaw
Sets per sec = 657
Gets per sec = 1994
Mixes per sec = 944
Package C6 - MLDBM::Sync::SDBM_File
Sets per sec = 334
Gets per sec = 1279
Mixes per sec = 524
Package C7 - Cache::SharedMemoryCache
Sets per sec = 42
Gets per sec = 29
Mixes per sec = 32





Bill Moseley
mailto:[EMAIL PROTECTED] 

Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler

2001-12-06 Thread Bill Moseley

At 08:19 AM 12/06/01 -0800, Paul Lindner wrote:

Ok, hit me over the head.  Why wouldn't you want to use a caching proxy?

BTW -- I think where the docs are cached should be configurable.  I don't
like the idea of the document root being writable by the web process.



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler

2001-12-06 Thread Bill Moseley

At 10:33 AM 12/06/01 -0800, Paul Lindner wrote:
On Thu, Dec 06, 2001 at 10:04:26AM -0800, Bill Moseley wrote:
 At 08:19 AM 12/06/01 -0800, Paul Lindner wrote:
 
 Ok, hit me over the head.  Why wouldn't you want to use a caching proxy?

Apache::CacheContent gives you more control over the caching process
and keeps the expiration headers from leaking to the browser.

Ok, I see.

Or maybe you want to dynamically control the TTL?

Would you still use it with a front-end lightweight server?  Even with
caching, a mod_perl server is still used to send the cached file (possibly
over a 56K modem), right?



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Hi

2001-12-04 Thread Bill Moseley

At 05:13 PM 12/04/01 -0500, Robert Landrum wrote:
If this guy is going to be sending us shit all night, I suggest we 
deactivate his account.

Now that would be fun!  Oh, you mean by unsubscribing him.  I was thinking
of something more sporting.  What's the collective bandwidth of the people
on this list?

Just kidding.




Bill Moseley
mailto:[EMAIL PROTECTED]



Re: [OT] log analyzing programs

2001-12-02 Thread Bill Moseley

At 10:09 AM 12/2/2001 +, Matt Sergeant wrote:

   PID USERNAME THR PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 17223 operator   1  44    2  747M  745M cpu14  19.2H 45.24% wusage

Ouch. Try analog.

  PID USERNAME THR PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
17223 operator   1   0    2  747M  745M cpu14  27.1H 47.57% wusage

Well at least after another 8 hours of CPU it's not leaking ;)



Bill Moseley
mailto:[EMAIL PROTECTED]



[OT] log analyzing programs

2001-12-01 Thread Bill Moseley

Any suggestions for favorite ones?  wusage seems to require a lot of
resources -- maybe that's not unusual?  It runs once a week.  Here's
about six days' worth of requests.  Doesn't seem like that many.

%wc -l access_log
 1185619 access_log

  PID USERNAME THR PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
17223 operator   1  44    2  747M  745M cpu14  19.2H 45.24% wusage



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: [OT] Re: search.cpan.org

2001-11-27 Thread Bill Moseley

At 12:55 PM 11/27/01 -0800, Nick Tonkin wrote:

Because it does a full text search of all the contents of the DB.

Perhaps, but it's just overloaded.

I'm sure he's working on it, but does anyone want to offer Graham free hosting?
A few mirrors would be nice, too.

(Plus, all my CPAN.pm setups are now failing to work, too)



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: [OT] Re: search.cpan.org

2001-11-27 Thread Bill Moseley

At 09:02 PM 11/27/01 +, Mark Maunder wrote:
I'm using it on our site and searching fulltext
indexes on three fields (including a large text field) in under 3 seconds
on over
70,000 records on a p550 with 490 megs of ram.


Hi Mark,

<plug>

Some day if you are bored, try indexing with swish-e (the development
version).
http://swish-e.org

The big problem with it right now is it doesn't do incremental indexing.
One of the developers is trying to get that working within a few weeks.
But for most small sets of files it's not an issue since indexing is so fast.

My favorite feature is it can run an external program, such as a perl mbox
or html parser, a perl spider, or a DBI program, or whatever, to get the
source to index.  Use it with Cache::Cache and mod_perl and it's nice and fast
from page to page of results.

Here's indexing only 24,000 files:

 ./swish-e -c u -i /usr/doc
Indexing Data Source: File-System
Indexing /usr/doc
270279 unique words indexed.
4 properties sorted.  
23840 files indexed.  177638538 total bytes.
Elapsed time: 00:03:50 CPU time: 00:03:16
Indexing done!

Here's searching:

 ./swish-e -w install -m 1
# SWISH format: 2.1-dev-24
# Search words: install
# Number of hits: 2202
# Search time: 0.006 seconds
# Run time: 0.011 seconds

A phrase:

 ./swish-e -w 'public license' -m 1
# SWISH format: 2.1-dev-24
# Search words: public license
# Number of hits: 348
# Search time: 0.007 seconds
# Run time: 0.012 seconds
998 /usr/doc/packages/ijb/gpl.html gpl.html 26002


A wild card and boolean search:

 ./swish-e -w 'sa* or java' -m 1
# SWISH format: 2.1-dev-24
# Search words: sa* or java
# Number of hits: 7476
# Search time: 0.082 seconds
# Run time: 0.087 seconds

Or a good number of results:

 ./swish-e -w 'is or und or run' -m 1
# SWISH format: 2.1-dev-24
# Search words: is or und or run
# Number of hits: 14477
# Search time: 0.084 seconds
# Run time: 0.089 seconds

Or everything:

 ./swish-e -w 'not dksksks' -m 1
# SWISH format: 2.1-dev-24
# Search words: not dksksks
# Number of hits: 23840
# Search time: 0.069 seconds
# Run time: 0.074 seconds


This is pushing the limit for little old swish, but here's indexing a few
more very small xml files (~150 bytes each):

3830016 files indexed.  582898349 total bytes.
Elapsed time: 00:48:22 CPU time: 00:44:01

</plug>

Bill Moseley
mailto:[EMAIL PROTECTED]



Re: [modperl-site design challenge]

2001-11-26 Thread Bill Moseley

At 11:14 AM 11/26/01 -0500, John Saylor wrote:
 * While the design might not be to cool from the designers point of view, I
 like it because it is simple, doesn't use HTML-tables, is small and fast
 (/very/ little HTML-overhead) and accessible to disabled people.

But that *is* cool. I think it's very well designed. To me, usability is
the main design goal.  Keep up the good work!

Does it need to render well in old browsers?  (e.g. netscape 4.08)

There's a lot of old browsers out there, but maybe anyone looking at
mod_perl would be a bit more up to date...



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Apache::Registry HEAD request also return document body

2001-11-23 Thread Bill Moseley

At 11:43 AM 11/23/2001 +, Jean-Michel Hiver wrote:

PROBLEM HERE
A head request should * NOT * return the body of the document

You should check $r->header_only in your handler. 

http://thingy.kcilink.com/modperlguide/correct_headers/3_1_HEAD.html



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Apache::AuthCookie login faliure reason

2001-11-23 Thread Bill Moseley

At 04:09 PM 11/23/2001 +1100, simran wrote: 

Hi All, 
  
I am having some trouble getting Apache::AuthCookie (version 3, which I
believe is the latest version) to do what I want:
  
What i want is: 
  
* To be able to give the user a reason if login fails
  - eg reason: * No such username
* Your password was incorrect
  
Has anyone else come across the same requirement/issue, and how have you
solved it? 


Apache::AuthCookieURL does that.  IIRC, it sets a cookie with the failure
reason that's returned from the authen_cred call.








Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Apache::Registry HEAD request also return document body

2001-11-23 Thread Bill Moseley

At 02:53 PM 11/23/01 +, Jean-Michel Hiver wrote:
  My only concern is that I thought that Apache::Registry was designed
  to act as a CGI emulator, allowing not so badly written CGIs to have
  mod_perl benefits without having to change them.

Right, sorry I completely missed the Registry part!

Try HEAD on this script.

#!/usr/local/bin/perl -w
use CGI;

my $q = CGI->new;

print $q->header, $q->start_html,
  join( "<BR>\n", map { "$_ : $ENV{$_}" } keys %ENV ),
  $q->end_html;



  If I have to use the $r object (and therefore the Apache module), then
  it means that the scripts won't be able to run as standalone CGIs...

Am I right?

Right, maybe that's a good thing ;)  (I actually mix mod_perl code into
applications that will run under both.)



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Apache::Registry HEAD request also return document body

2001-11-23 Thread Bill Moseley

At 03:21 PM 11/23/01 +, Jean-Michel Hiver wrote:
Duh... That's a lot of info for a head request :-)

Yes, and that's what I get for using HEAD to test!  Yesterday's holiday
doesn't help today's thinking.

How about patching Apache::Registry?

Oh, Stas, of course, just posted a better solution.  Maybe I'll have better
luck repairing my car today.



Bill Moseley
mailto:[EMAIL PROTECTED]



[OT] Re: Seeking Legal help

2001-11-22 Thread Bill Moseley

At 03:21 PM 11/21/01 -0800, Medi Montaseri wrote:
I did some work (about $25000 worth) for a customer and I'm having
problem collecting. 

This has been beaten to death on the list, but... (and I'm not a lawyer,
but I drink beer with one),

If you think they are going Chapter 11, then you may want to try to bargain
down to some amount to get something, so you are not on their list of
creditors.  

When they do file, if that's the case, they have to notify the court of
their creditors and then the court is suppose to notify you.  You must then
file a proof of claim, and get in line with everyone else.  If you think
they might fail to list you as a creditor when they file, contact the court
every few weeks and check if they have already filed, and file your proof
of claim.  Then at least you might get a penny on the dollar...

$25K is a bad number, in that it's too big for small claims court, and it's
too little to get much help from lawyers in a law suit, I'd guess.  Ask
them if they want to pay partially in hardware and you might get a good
idea of their direction ;).

Good luck,



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Cookie authentication

2001-11-15 Thread Bill Moseley
At 02:02 PM 11/15/01 -0600, John Michael wrote: 

This may seem off subject but, if you bear with me, I don't think it is.  I am interested in using the cookie based system referred to in the Programming the Apache API book but often wonder this:
Can you count on everyone to use cookies?


Sometime back I wrote a module based on Apache::AuthCookie called Apache::AuthCookieURL that uses cookies, or falls back to munged URLs if cookies were not enabled.  It's on CPAN.

I wrote it for a site where people come in from public libraries.  The requirement was that it had to do sessions even if cookies were disabled (as it was common for the public libraries to have cookies disabled). 

It's been a while since I looked at it.  I had added a way to disable the authen requirement for areas of the site (or everywhere), so it could be used just for dealing with sessions.

Do be careful about session hijacking.





Bill Moseley
mailto:[EMAIL PROTECTED] 

Re: Cookie authentication

2001-11-15 Thread Bill Moseley

At 05:20 PM 11/15/01 -0600, John Michael wrote:
Thanks.
I did not know that you could verify that someone has cookies turned on.
Can you point me to where I can find out how to do this?  Is there a
variable that you can check?

You set a cookie and do a redirect (if you need the cookie right away).  If
it comes back with the cookie, then cookies are enabled.
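
As a handler, that might look something like this (a sketch, untested;
the cookie name and query flag are arbitrary):

use Apache::Constants qw(OK REDIRECT);

sub handler {
    my $r = shift;
    my %args = $r->args;

    if ( $args{cookie_test} ) {
        # We set the cookie on the previous response -- did it come back?
        my $ok = ( $r->header_in('Cookie') || '' ) =~ /probe=1/;
        $r->log_error( 'cookies disabled' ) unless $ok;
        # ... continue, knowing whether cookies work ...
        return OK;
    }

    # First hit: set a cookie and bounce back to ourselves.
    $r->err_header_out( 'Set-Cookie' => 'probe=1; path=/' );
    $r->header_out( Location => $r->uri . '?cookie_test=1' );
    return REDIRECT;
}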



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: how to install the XML::LibXSLT along with libxslt?

2001-11-14 Thread Bill Moseley
At 08:03 PM 11/14/01 -0800, SubbaReddy M wrote: 

Maybe a question for the libxml2 list instead of mod_perl?

So, while installing libxslt-1.0.6 
i am getting error atlast, that is  " checking for libxml libraries >= 2.4.7... ./configure: xml2-config: command not found "

Did you make install libxml2?

> which xml2-config
/usr/local/bin/xml2-config




Bill Moseley
mailto:[EMAIL PROTECTED] 

[OT] Data store options

2001-11-08 Thread Bill Moseley

Hi,

<verbose>
I'm looking for a little discussion on selecting a data storage method, and
I'm posting here because Cache::Cache often is discussed here (along with
Apache::Session).  And people here are smart, of course ;).

Basically, I'm trying to understand when to use Cache::Cache, vs. Berkeley
DB, and locking issues.  (Perrin, I've been curious why at etoys you used
Berkeley DB over other caching options, such as Cache::Cache).  I think
RDBMS is not required as I'm only reading/writing and not doing any kind of
selects on the data -- also I could end up doing thousands of selects for a
request.  So far, performance has been good with the file system store.

My specifics are that I have a need to permanently store tens of thousands
of smallish (5K) items.  I'm currently using a simple file system store,
one file per record, all in the same directory.  Clearly, I need to move
into a directory tree for better performance as the number of files increases.

The data is accessed in a few ways:

1) Read/write a single record
2) Read anywhere from a few to thousands of records in a request. This
   is the typical mod_perl-based request.  I know the record IDs that I
   need to read from another source.  I basically need a way to get some
   subset of records fast, by record ID.
3) Traverse the data store and read every record.

I don't need features to automatically expire the records.  They are
permanent.

When reading (item 2) I have to create a perl data structure from the data,
which doesn't change.  So, I want to store this in my record, using
Storable.pm.  That can work with any data store, of course.

It's not a complicated design.  My choices are something like:

1) use Storable and write the files out myself.
2) use Cache::FileCache and have the work done (but can I traverse?)
3) use Berkeley DB (I understand the issues discussed in The Guide)

So, what kind of questions and answers would help me weigh the options?

With regard to locking, IIRC, Cache::Cache doesn't lock; rather, writes go
to a temp file, then there's an atomic rename.  Last in wins.  If updates
to a record are not based on previous content (such as a counter file), is
there any reason this is not a perfectly good method -- as opposed to flock?
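
The same trick is easy enough by hand with Storable (a sketch; paths are
illustrative):

use Storable qw(nstore retrieve);

sub write_record {
    my ($dir, $id, $data) = @_;
    my $tmp = "$dir/.$id.$$.tmp";   # unique per process
    nstore( $data, $tmp );
    rename $tmp, "$dir/$id"         # atomic on the same filesystem
        or die "rename failed: $!";
}

sub read_record {
    my ($dir, $id) = @_;
    return retrieve( "$dir/$id" );  # readers always see a whole file
}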

Again, I'm really looking more for discussion than an answer to my specific
needs.  What issues would you consider when selecting a data store method, and why?

</verbose>

Thanks very much,





Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Cache::* and MD5 collisions [was: [OT] Data store options]

2001-11-08 Thread Bill Moseley

At 10:54 AM 11/08/01 -0800, Andrew Ho wrote:
For example, say your keys are e-mail addresses and you just want to use
an MD5 hash to spread your data files over directories so that no one
directory has too many files in it. Say your original key is
[EMAIL PROTECTED] (hex encoded MD5 hash of this is RfbmPiuRLyPGGt3oHBagt).
Instead of just storing the key in the file
R/Rf/Rfb/Rfbm/RfbmPiuRLyPGGt3oHBagt.dat, store the key in the file
[EMAIL PROTECTED] Presto... collisions are impossible.

That has the nice side effect that I can run through the directory tree and
get the key for every file.  I do need a way to read every key in the
store.  Order is not important.



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: [OT] search engine module?

2001-10-16 Thread Bill Moseley

At 02:04 PM 10/16/2001 +0100, Ged Haywood wrote:
  Plus lots of other stuff like Glimpse and Swish which interface to
C-based
  engines.
 
 I've had good luck with http://swish-e.org/2.2/

Please make sure that it's possible to do a plain ordinary literal
text string search.  Nothing fancy, no case-folding, no automatic
removal of puctuation, nothing like that.  Just a literal string.

Last night I tried to find perl -V on all the search engines
mentioned on the mod_perl home page and they all failed in various
interesting ways.

I assume it's how the search engine is configured.  Swish, for example, lets
you define what chars make up a word.  I'm not sure what you mean by literal
string.  For performance reasons you can't just grep words (or parts of
words), so you have to extract the words from the text during indexing.
You might define that a dash is ok at the start of a word but not at the
end, and to ignore trailing dots, so you could find -V and -V. (at the end
of a sentence).

Some search engines let you define a set of buzzwords that should be
indexed as-is, but that's more helpful for technical writing instead of
indexing code.

Finally, in swish, if you put something like perl -V in quotes to use a
phrase search, it will most likely find what you are looking for, even if
the dash is not indexed.



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Mod_perl component based architecture

2001-10-16 Thread Bill Moseley

I've been looking at OpenInteract, too.  I've got a project where about 100
people need to edit records in a database via a web-based interface.  And
I'd like history tracking of changes (something like CVS provides, where
it's easy to see diffs and to back out changes).  And I need access control
for the 100 people, along with tracking per user of how many changes they
make, email notification of changes, administrative and super-user type of
user levels, and bla, bla bla, and so on.  Normal stuff.

I'm just bored with html forms.  Seems like I do this kind of project too
often -- read a record, post, validate, update...  Even with good
templating and code reuse between projects I still feel like I spend a lot
of time re-inventing the (my) wheel.  Will an application framework bring
me bliss?  I'm sure this is common type of project for many people.  What
solutions have you found to make this easy and portable from project to
project?



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: [request] modperl mailing lists searchable archives wanted

2001-10-09 Thread Bill Moseley

Hi Stas,

I just updated the search site for Apache.org with a newer version of
swish.  The context highlighting is a bit silly, but that can be fixed.
I'm only caching the first 15K of text from each page for context
highlighting.

http://search.apache.org

It seems reasonably fast (it's not running under mod_perl currently, but
could -- if mod_perl was in that server ;).

It takes about eight or nine minutes to reindex ~35,000 docs on *.apache.org,
so the mod_perl list (and others) shouldn't be too much trouble, I'd think,
with smaller numbers and smaller content.

It doesn't do incremental indexing at this point, which is a drawback, but
indexing is so fast it normally doesn't matter (and there's an easy
work-around for something like a mailing list to pick up new messages as
they come in during the day).

Swish-e can also call a perl program which feeds docs to swish.  That makes
it easy to parse the email into fields for something like:

  http://swish-e.org/Discussion/search/swish.cgi

which looks a lot like the Apache search site...

But, what would be needed is a good threaded mail archiver, of which there are
many to pick from, I'd expect.

Some 
archives are browsable, but their search engines simply suck. e.g. 
marc.theaimsgroup.com I think is the only one that archives 
[EMAIL PROTECTED], but if you try to search for a perl string like 
APR::Table::FETCH it won't find anything. If you search for
get_dir_config it will split it into 'get', 'dir', 'config' and give you 
a zillion matches when you know that there are just a few.

On swish you could say ':' and '_' are part of words and those would index
as full words.  Or, just simply search for the phrase get_dir_config and it
would search for the phrase get dir config, which would probably find what
you want.

Maybe : and _ are ok in words, but you have to think carefully about
others.  It's more flexible to split the words and use phrases in many cases.



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: FW: FW: AuthCookie Woes!

2001-09-04 Thread Bill Moseley

At 07:59 AM 9/4/2001 -0400, Chris Lavin wrote:
I'm sorry, I thought that everyone would be familiar with Apache::AuthCookie

Perhaps if you posted a tiny httpd.conf and the url of where it's running.

And I'd tend to use telnet for debugging and write log messages in
Apache::AuthCookie where it's setting the header, and so on.  



Bill Moseley
mailto:[EMAIL PROTECTED]



Random requests in log file

2001-08-07 Thread Bill Moseley

Hi,

We always see the normal probes for known insecure CGI scripts, and spiders
keep our logs full.  But lately there have been a huge number of requests
for resources that are not on our server (even not counting Code Red II).
It looks like someone is spidering another server, yet sending requests to
our machine -- the requests don't really look like probes for insecure
scripts, rather just for files that are not and never have been on this
server (or any related virtual hosts).

Does everyone else see these?  What's the deal?  Are they really probes or
some spider run amok?

Right now someone is looking for things like:

/r/dr
/r/g3
/r/sb
/r/sw
/r/s/2
/r/a/booth
/r/s/pp
/NowPlaying
/mymovies/list
/terms
/ootw/1999/oarch99_index.html

I currently have a killfile of IP addresses and a PerlInitHandler that
blocks requests,  but it would be nice to automate that process.  Are there
any current modules that do this?

Another thing I find odd: this server has three virtual hosts.  In the
second and third VH's logs I find requests for files found on the first,
default, VH.  I've logged the Host: header and indeed it was there.  Odd.



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Random requests in log file

2001-08-07 Thread Bill Moseley

At 10:24 AM 08/07/01 -0700, Nick Tonkin wrote:
  /r/dr
  /r/g3
  /r/sb
 www.yahoo.com/r/dr
 www.yahoo.com/r/sw


Yes, and I have seen plenty of cases where broken web servers or web sites
or web browsers screw up HREFs, by prepending an incorrect root uri to a
relative link.

That would be my guess, broken URLs somewhere out in space.

But why the continued hits for the wrong pages?  It's like someone spidered
an entire site, and then has gone back and is now testing all those HREFs
against our server.

Currently mod_perl is generating a 404 page.  When I block I return
FORBIDDEN, but that doesn't seem to stop the requests either.  They don't
seem to get the message...  

And isn't it correct that if they request again before CLOSE_WAIT is up
I'll need to spawn more servers?

If they are not sending requests in parallel I wonder if it would be easier
on my resources to really slow down responses as long as I don't tie up too
many of my processes.  If they ignore FORBIDDEN maybe they will see the
timeouts.

Time to look at the Throttle modules, I suppose.



Bill Moseley
mailto:[EMAIL PROTECTED]



Backing out a mod_perl install

2001-08-06 Thread Bill Moseley

I'm upgrading mod_perl on a Solaris 2.6 production machine.  Although a
little downtime on this machine won't be a big issue, I'm wondering about
backup plans.

I've got mod_perl ready for make install (I'm currently using a PERL5LIB
environment variable to test mod_perl on a high port from the blib).

So I was just going to bring down the server, make install, and then
startup the new server.  But, I'd like to be able to back out, just in
case.  I was thinking about tar'ing up the Apache name space, and Apache.pm
to backout the Perl modules so I could run the old httpd, if needed.

Is that a reasonable thing to do, and if so, is there anything else you
would suggest?


Thanks,


Bill Moseley
mailto:[EMAIL PROTECTED]



RE: Backing out a mod_perl install

2001-08-06 Thread Bill Moseley

At 03:21 PM 08/06/01 -0400, Geoffrey Young wrote:
 to back out the Perl modules so I could run the old httpd, if needed.

you can try the tar_Apache and offsite_tar arguments to make and see if they
wrap up everything you need...

Ok, tar_Apache should include all that I need, thanks.  I don't see
the need for offsite_tar in my case, since I already have mod_perl built
and ready for install on the target machine.

No need to run make install on the httpd side, right?  I can just copy the
httpd binary (I'll be using the same ServerRoot as the existing 1.3.12
server), so I'm assuming all my icons and mime.types files from 1.3.12 will
be just fine.


Bill Moseley
mailto:[EMAIL PROTECTED]



PERL5LIB perl section

2001-08-06 Thread Bill Moseley

In a previous post today I mentioned how I was running mod_perl from the
build directory by setting a PERL5LIB.

I seem to need to add:

<Perl>
</Perl>

at the top of httpd.conf.  Otherwise I get:

Apache.pm version 1.27 required!
/usr/local/lib/perl5/site_perl/5.005/sun4-solaris/Apache.pm is version 1.26

I use <Perl> sections farther down in httpd.conf, but I seem to need one at
the very top.  If a PerlTaintCheck On comes before the <Perl></Perl> then I
get that error.

Why is that?




Bill Moseley
mailto:[EMAIL PROTECTED]



modules/status make test fail

2001-08-03 Thread Bill Moseley

I can come back with more, if needed, but just in case someone else has
seen this:

Fresh 1.26 and 1.3.20, Sun Solaris 2.6, Perl 5.005_03

I just did this on Linux and it worked just fine :(

This doesn't bother me too much:

modules/request.Use of uninitialized value at modules/request.t line 147.
Use of uninitialized value at modules/request.t line 147.
Use of uninitialized value at modules/request.t line 149.
Use of uninitialized value at modules/request.t line 147.
Use of uninitialized value at modules/request.t line 147.
Use of uninitialized value at modules/request.t line 149.
Use of uninitialized value at modules/request.t line 147.
Use of uninitialized value at modules/request.t line 147.
Use of uninitialized value at modules/request.t line 149.
Use of uninitialized value at modules/request.t line 147.
Use of uninitialized value at modules/request.t line 147.
Use of uninitialized value at modules/request.t line 149.
skipping test on this platform
modules/src.ok
modules/ssi.ok
modules/stage...skipping test on this platform

But:

modules/status..Internal Server Error
dubious
Test returned status 9 (wstat 2304, 0x900)
DIED. FAILED tests 8-10
Failed 3/10 tests, 70.00% okay

In error_log:

[Fri Aug  3 16:27:16 2001] [error] 
Can't locate object method "inh_tree" via package "Devel::Symdump" 
at /data/_g/lii/apache/1.26/mod_perl-1.26/blib/lib/Apache/Status.pm line
222, <fh1b> chunk 1.

Do I need an updated Devel::Symdump?



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: OT: Re: ApacheCon Dublin Cancelled?

2001-07-16 Thread Bill Moseley

At 10:46 AM 07/16/01 -0600, Nathan Torkington wrote:
Are there any requests other than price for next year?  What would you
like to see?  What could you do without?

Well, this is more along the price issue that you don't want to hear about,
but I much prefer a single fee for everything instead of separate tutorial
and conference fees.  I understand the scheduling hell, but I like the
flexibility to decide what to attend during the conference.  What I attend
in the morning may influence what I attend in the afternoon.

And these days more and more people may find themselves like me, paying
their own way.  I'm very disappointed that I had to cancel after adding
everything up.



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Lightweight CGI.pm for PerlHandlers

2001-05-19 Thread Bill Moseley

At 11:28 PM 05/18/01 -0400, Neil Conway wrote:
Is there a simple (fast, light) module that generates HTML
in a similar style to CGI.pm, for use with mod_perl?

Well, not in a style similar to CGI.pm, but how about a here doc -- if
it's that simple.

At the moment, I'd rather not move to a system like
HTML::Mason or Template Toolkit -- the HTML I'm producing
is very simple (in fact, I've just been using $r->print
up to now, and it's not _too_ bad).

I bounce between HTML::Template, Template-Toolkit, and here docs.  Frankly,
mostly I use HTML::Template for even really simple things (I use a here doc
for the template).
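
By "a here doc for the template" I mean something like this (a sketch --
the template text and param names are made up):

use HTML::Template;

# The template lives right in the code as a here doc.
my $template = HTML::Template->new( scalarref => \<<'EOT' );
<html><body>
<h1><TMPL_VAR NAME=title></h1>
<p>Hello, <TMPL_VAR NAME=name>!</p>
</body></html>
EOT

$template->param( title => 'Test page', name => 'world' );
print $template->output;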

This kind of thing is personal style, but I think the templates make the
HTML stuff easier (and hidden away).

One of my New Years Resolutions was to do everything in Template-Toolkit so
I'd learn it better.  (But I'm also not drinking less beer, windsurfing
more, working less, or having more, eh, sleep, either.)


Bill Moseley
mailto:[EMAIL PROTECTED]



mod_parrot

2001-04-01 Thread Bill Moseley

I assume everyone saw this... ;)

http://www.oreilly.com/parrot/

Bill Moseley
mailto:[EMAIL PROTECTED]



Re: cgi_to_mod_perl manpage suggestion

2001-03-14 Thread Bill Moseley

At 03:34 PM 03/14/01 +0200, Issac Goldstand wrote:
 On Tue, 13 Mar 2001, Andrew Ho wrote:
  Um, you're getting me confused now, but PerlSendHeader On means that
  mod_perl WILL send headers.

  I still think that the above line is confusing:  It is because mod_perl is
not sending headers by itself, but rather your script must provide the
headers (to be returned by mod_perl).  However, when you just say "mod_perl
will send headers" it is misleading; it seems to indicate that mod_perl
will send "Content-Type: text/html\r\n\r\n" all by itself, and that
conversely, to disable that PerlSendHeaders should be Off.

Read it again.  You are confusing "some" headers with "all" headers --
there's more than just Content-Type:.

To me it doesn't sound at all like it will send content-type:

PerlSendHeader On

   Now the response line and common headers will be sent
   as they are by mod_cgi.
(response line and common headers != content type)  

And, just as with mod_cgi,
   PerlSendHeader will not send a terminating newline,
   your script must send that itself, e.g.:
   -

print "Content-type: text/html\n\n";


All documentation has room for improvement, of course.  It's confusing if
you haven't written a mod_perl content handler before (or haven't read the
perldoc Apache or the Eagle book) and don't know that you need
$r->send_http_header under a normal content handler.  And if you are like
me, you have to read the docs a few times before it all makes sense.

Also, note that Apache::Registry lets reasonably well written CGI scripts
run under both mod_cgi and Apache::Registry, which is what that man page
is describing.  It's not a CGI script if there's not a content-type: header
sent.  And the docs are not implying that you can turn on PerlSendHeader
and then go through all your CGI scripts and remove the print
"Content-type: text/html\n\n" lines.



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: [ANNOUNCE] Cache-Cache-0.03

2001-03-10 Thread Bill Moseley

At 01:03 PM 03/10/01 -0500, DeWitt Clinton wrote:
Summary:

  Perl Cache is the successor to the popular File::Cache and
  IPC::Cache perl libraries. This project unifies those modules under
  the generic Cache::Cache interface and implements Cache::FileCache,
  Cache::MemoryCache, Cache::SharedMemoryCache, and
  Cache::SizeAwareFileCache.

When you say successor to File::Cache does that mean File::Cache will not
be maintained as a separate module anymore?

Have you thought about making SharedMemoryCache flush to disk if it becomes
full but before it's time to expire the data?



Bill Moseley
mailto:[EMAIL PROTECTED]



ApacheCon Early-Bird registration

2001-03-10 Thread Bill Moseley

Hi,

Well, being on the West Coast I failed to realize that "ApacheCon
Early-Bird registration ends TO-DAY!" was sometime in the afternoon before
I got back on-line to find that announcement...

Anyway, are there any other cheap^H^H^H^H^H poor contractors that would
like to form a group and go for the group rate?

Is there a BOF schedule yet?


Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Authentication handlers

2001-03-03 Thread Bill Moseley
At 12:58 PM 03/03/01 +0530, Kiran Kumar.M wrote: 
hi,
i'm using a mod_perl authentication handler, where the user's credentials
are checked against a database, and in the database i have a flag which
tells the login status (y|n).  but after the user logs out the status is
changed to n.  my problem is that after logging out, if the user goes one
page back and submits, the browser sends the username and password again,
and the status is changed to y.  is there any means of removing the
username and password from the browser's cache?


I guess I don't understand your setup.  If you have a database entry that says they are logged out, why don't you see this when they send their request and return a "Sorry, logged out" page?

I wouldn't count on doing anything on the client side.





Bill Moseley
mailto:[EMAIL PROTECTED] 

Re: Authentication handlers

2001-03-03 Thread Bill Moseley

At 10:11 AM 03/03/01 -0500, Pierre Phaneuf wrote:
The problem here is that the first basic authentication is not any
different from the next ones, so if he marks the user as logged out,
going to an page requiring authentication will simply mark the user as
logged in.

That's what I was assuming.

Basic authentication is annoying. They forgot to put a way to revoke the
thing when they designed it. Eh, that's life...

That's the real point.  Sometimes you have to weigh the use of an always-on
feature like basic authentication vs. maybe-on cookies.  

If you really must use basic authentication then besides the AUTH_REQUIRED
trick, sometimes you can get clients to forget by sending them to a new URL
with an embedded username and password that logs into the same AuthName but
with a different username/password combination.  But, you CAN'T count on
anything working unless you know all your clients -- if even then.
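
By "embedded username and password" I mean something like this (a sketch;
the host, path, and throwaway credentials are all made up):

use Apache::Constants qw(REDIRECT);

sub logout_handler {
    my $r = shift;
    # Send the browser back into the same AuthName with bogus
    # credentials so it replaces the cached ones.  As noted above,
    # not all clients honor this.
    $r->header_out( Location =>
        'http://logout:logout@www.example.com/private/' );
    return REDIRECT;
}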

If your problem is that some clients don't use cookies, then perhaps
Apache::AuthCookieURL might help.




Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Stop button (was: Re: General Question)

2001-02-26 Thread Bill Moseley

At 02:02 PM 02/26/01 +, Steve Hay wrote:
I have a script which I wish to run under either mod_perl or CGI which does
little more than display content and I would like it to stop when the user
presses Stop, but I can't get it working.

You need to do different things under mod_perl and mod_cgi.  Refer to the
Guide for running under mod_perl -- you probably should check explicitly
for an aborted connection as the guide shows.

[This is all from my memory, so I hope I have the details correct]

Under mod_cgi Apache will receive the SIGPIPE when it tries to print to the
socket.  Since your CGI script is running as a subprocess (that has been
marked "kill_after_timeout", I believe), apache will first close the pipe
from your CGI program, send it a SIGTERM, wait three seconds, then send a
SIGKILL, and then reap.  This all happens in alloc.c, IIRC.

This is basically the same thing that happens when you have a timeout.

So, you can catch SIGTERM and then have three seconds to clean up.  You
won't see a SIGPIPE unless you try to print in that three second gap.  
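
So a mod_cgi script that wants a clean shutdown can do something like this
(a sketch):

# Apache sends SIGTERM first, then SIGKILL about three seconds
# later, so keep the cleanup short.
$SIG{TERM} = sub {
    warn "$$ caught SIGTERM, cleaning up\n";
    # release locks, close DBM files, etc.
    exit 0;
};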

Does it do the same thing under NT?




Bill Moseley
mailto:[EMAIL PROTECTED]



Stop button (was: Re: General Question)

2001-02-11 Thread Bill Moseley

At 08:43 AM 02/12/01 +0800, Stas Bekman wrote:
 What happens to the 54 earlier processes, since I submitted the request
55 times? How do Apache & mod_perl handle the processes to nowhere?

They get aborted the first moment they try to send some output (or read
input if they didn't finish yet) and after that get aborted as they
realize that the connection to the socket is dead. See:
http://perl.apache.org/guide/debug.html#Handling_the_User_pressed_Stop_

I thought one has to explicitly check for the aborted connection in >=
1.3.5 -- like you explain in the section in the Guide following the one you
cited.  Isn't $r->print a noop after an aborted connection?

Which gives me a chance to ask an off topic question about this very topic:

As this arrived in my inbox I was debugging a pair of old CGI scripts.
Both are very similar and use common modules for most everything -- one is
the public script and the other is the admin script for maintenance and
data entry.  Both are CGI.

Both scripts use a common module for logging and DBM file opening and
closing.  Really, the only difference is the form processing for the two
scripts. 

The DBM close is in an END block, and the END block also writes a
"Transaction Done" message to STDERR (STDERR is dup'ed to a log file I use
for locking the DBM files at the start of every request -- very low
traffic script).

Hitting stop on the public script has no effect -- just like I'd expect for
1.3.12 -- it keeps running even when generating a long-ish output.

But hitting stop on the admin script aborts every time.  I have $SIG{PIPE}
= sub { print STDERR "$$ caught sigpipe"; exit } in the common module and it
prints the error when I hit stop on the admin script.

Nothing is noted in the apache logs about broken pipes.

I'm scratching my head at this point.  Any ideas what to look for?


Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Stop button (was: Re: General Question)

2001-02-11 Thread Bill Moseley

I don't know why I have to learn this fresh again each time -- it appears
I'm confusing mod_perl and mod_cgi.

Let's see if I have this right.  Under mod_perl and Apache >= 1.3.5, if the
client drops the connection Apache will ignore it (well, it might print an
info message to the log file about "broken pipe").  This means a running
mod_perl script will continue to run to completion, but the $r->print()s go
nowhere.

The old Apache behavior of killing your running script can be restored
using Apache::SIG -- which is something you would not want to use if you
were doing anything besides displaying content, I'd think.

$r->connection->aborted can be used to detect the aborted connection (as
Stas shows in the Guide).  That sounds like a better way to deal with
broken connections.

Does all that sound right?

Are there still issues with doing this?

   local $SIG{PIPE} = sub { $aborted++ };
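
For concreteness, the kind of check I have in mind (a sketch of a
mod_perl 1.x content handler):

use Apache::Constants qw(OK);

sub handler {
    my $r = shift;
    $r->send_http_header('text/plain');

    for my $chunk ( 1 .. 1000 ) {
        # Stop doing work once the client has gone away.
        return OK if $r->connection->aborted;
        $r->print("chunk $chunk\n");
    }
    return OK;
}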



Then mod_cgi I'm still unclear on.

The cgi application does receive the SIGPIPE... well it did 1/2 hour ago
before I rebooted my machine.  Now I can't seem to catch it.
But, printing again after the SIGPIPE will kill the CGI script. 

Bill Moseley
mailto:[EMAIL PROTECTED]



[OT] Freeing Memory with of C extensions under Solaris

2001-02-07 Thread Bill Moseley

Hi,

Sorry for the OT, and I'm sure this is common knowledge.

I'm using some C extensions in my mod_perl application.  IIRC, memory used
by perl is never given back to the system.  Does this apply also to
malloc() and free()ed memory in my C extension?  Or is that OS dependent?
Can Apache ever shrink?

I ask because I'm using some C extensions that do a lot of memory
allocating and freeing of somewhat large structures.

I'm running under Linux and Solaris 2.6.

Another memory issue: I'm using File::Cache with a cron job that runs every
ten minutes to limit the amount of space used in /tmp/File::Cache -- it
seemed better than using an Apache child to do that clean up work.  But on
Solaris /tmp is carved out of swap.  Do you think it's risky to use /tmp
this way (since full /tmp could use up all swap)?

Thanks very much,



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: File::Cache problem

2001-02-07 Thread Bill Moseley

At 11:56 AM 02/07/01 +0400, BeerBong wrote:
And when cache size is exceeded all mod_perl processes are hanging.

I had this happen to me a few days back on a test server.  I thought I'd
made a mistake by doing a rm -rf /tmp/File::Cache while the server was
running (and while the File::Cache object was persistent).

And another question (I'm use Linux Debian Potato)...
Is there way to define params of the currently performing request

You could use a USR2 handler in your code, if that's what you are asking.
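
i.e. something along these lines at startup (a sketch of the sort of
handler that produced the trace below):

use Carp ();

# Then `kill -USR2 <pid>` makes that child log a stack trace of
# whatever it happens to be doing at that moment.
$SIG{USR2} = sub { Carp::confess("USR2") };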

This is what my spinning httpd said the other day:

[DIE] USR2 at /data/_g/lii/perl_lib/lib/5.00503/File/Spec/Unix.pm line 57
File::Spec::Unix::catdir('File::Spec', '') called at
/data/_g/lii/perl_lib/lib/5.00503/File/Spec/Functions.pm line 41
File::Spec::Functions::__ANON__('') called at
/data/_g/lii/perl_lib/lib/site_perl/5.005/File/Cache.pm line 862
File::Cache::_GET_PARENT_DIRECTORY('/') called at
/data/_g/lii/perl_lib/lib/site_perl/5.005/File/Cache.pm line 962

But I haven't seen it happen since then.





Bill Moseley
mailto:[EMAIL PROTECTED]



Upgrading mod_perl on production machine (again)

2001-01-16 Thread Bill Moseley

This is a revisit of a question last September where I asked about
upgrading mod_perl and Perl on a busy machine.

IIRC, Greg, Stas, and Perrin offered suggestions such as installing from
RPMs or tarballs, and using symlinks.  The RPM/tarball option worries me a
bit, since if I do forget a file, then I'll be down for a while, plus I
don't have another machine of the same type where I can create the tarball.
 Sym-linking works great for moving my test application into live action,
but it seems trickier to do this with the entire Perl tree.

Here's the problem: this client only has this one machine, yet I need to
setup a test copy of the application on the same machine running on a
different port for the client and myself to test.  And I'd like to know
that when the test code gets moved live, that all the exact same code is
running (modules and all).

What to do in this situation?

a) not worry about it, and just make install mod_perl and restart the server
and hope all goes well?

b) cp -rp /usr/local/lib/perl5 and use symlinks to move between the two?
When ready to move, kill httpd, change the perl symlinks for the binary,
perl lib, and httpd, and restart?

c) setup a new set of perl, httpd, and my application and when ready to go
live just change the port number? 

Or simply put - how would you do this:

With one machine I want to upgrade perl to 5.6.0, upgrade the application
code, install a new version of mod_perl, and allow for testing of the new
setup for a few weeks, yet only require a few seconds of downtime to switch
live (and back again if needed)?

Then I wonder which CPAN module I'll forget to install...



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: location not working

2001-01-10 Thread Bill Moseley

At 09:59 AM 01/10/01 +0100, [EMAIL PROTECTED] wrote:
NEVERTHELESS, I get 404 when I enter
 http://myserver//hello/world
and it is looking in the htdocs directory according to the error_log.

Can you please post the entire error_log message.




Bill Moseley
mailto:[EMAIL PROTECTED]



Caching search results

2001-01-08 Thread Bill Moseley

I've got a mod_perl application that's using swish-e.  A query from swish
may return hundreds of results, but I only display them 20 at a time.  

There's currently no session control on this application, and so when the
client asks for the next page (or to jump to page number 12, for example),
I  have to run the original query again, and then extract out just the
results for the page the client wants to see.

Seems like some basic design problems there.

Anyway, I'd like to avoid the repeated queries in mod_perl, of course.  So,
in the short term, I was thinking about caching search results (which is
just a sorted list of file names) using a simple file-system db -- that is,
(carefully) build file names out of the queries and write them to some
directory tree.  Then I'd use cron to purge LRU files every so often.  I
think this approach will work fine instead of a dbm or rdbms approach.


So I asking for some advice:

- Is there a better way to do this?

- There was some discussion about performance and how many files to put in
each directory in the past.  Are there some commonly accepted numbers for
this?

- For file names does it make sense to use an MD5 hash of the query string?
It would be nice to get an even distribution of files in each directory.

- Can someone offer any help with the locking issues?  I was hoping to
avoid shared locking during reading -- but maybe I'm worrying too much
about the time it takes to ask for a shared lock when reading.  I could
wait a second for the shared lock and if I don't get it I'll run the query
again.

But it seems like if one process creates the file and begins to write
without LOCK_EX and then gets blocked, then other processes might not see
the entire file when reading.

Would it be better to avoid the locks and instead use a temp file when
creating and then do an (atomic?) rename?
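
For reference, the write path I'm picturing (a sketch; assumes Digest::MD5
and that the cache subdirectories already exist):

use Digest::MD5 qw(md5_hex);

sub cache_write {
    my ( $cache_dir, $query, $results ) = @_;

    my $key  = md5_hex($query);
    # Spread files across subdirectories on the first hex digits.
    my $file = "$cache_dir/" . substr( $key, 0, 2 ) . "/$key";
    my $tmp  = "$file.$$.tmp";

    open TMP, ">$tmp" or die "Can't write $tmp: $!";
    print TMP $results;
    close TMP or die "Can't close $tmp: $!";

    # rename() is atomic on the same filesystem, so readers see either
    # the old file or the complete new one -- never a partial write.
    rename $tmp, $file or die "Can't rename $tmp: $!";
}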

Thanks very much,

Bill Moseley
mailto:[EMAIL PROTECTED]



Re: searchable site

2001-01-01 Thread Bill Moseley

At 07:08 PM 01/01/01 -0800, Paul J. Lucas wrote:
   SWISH++ can run as a multi-threaded daemon that listens on
   either a Unix-domain or TCP socket, hence also without forking.

Which I would guess seems like a better use of resources than placing the
SWISH-E code in each httpd child.

I was going to ask you why or what makes it "faster" and if that applies to
SWISH-E 2.x, but that's a bit too off topic.  Maybe in separate email.

BTW: http://homepage.mac.com/pauljlucas/software/swish/man/ seems broken.



Bill Moseley
mailto:[EMAIL PROTECTED]



[OT] Anyone good with IPC?

2000-12-22 Thread Bill Moseley

Sorry for the way OT post, but this list seems to have the smartest, most
experienced, most friendly perl programmers around  -- and this question on
other perl lists failed to get any bites.

Would someone be willing to offer a bit of help off list?

I'm trying to get two programs talking in an HTTP-like protocol through a
unix pipe.  I'm first trying to get it to work between two perl programs
(below), but in the end, the "client" will be a C program (and that's a
different nut to crack).

The goal is to add a "filter" feature to the C program, where you register
some external program (called a server, in this example, since it will be
answering requests) and the C program starts the server, and then feeds
requests over and over leaving the server in memory.

A simple filter might be something that converts to lower case, or converts
text dates to a timestamp.  The C program (client) sends headers and some
content, and the filter (server) returns headers and some content.  But
it's a "Keep Alive" connection, so another request can be sent without
closing the pipe.

This approach seems simple -- at least for someone writing the filter
program.  Just read and print (non-buffered).  It's probably not very
portable -- I'd expect to fail on Windows.  (Are there better methods?)

Anyway, this is the sample code I was trying, but was not getting anywhere.
Seems like IO::Select::can_read() returns true and then I can read back the
first header, but then can_read() never returns true again.

I really need to be able to read and parse the headers, then read
Content-Length: bytes since the content can be of varying length.
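
Roughly, the read side I'm after would look like this (a sketch; assumes
$rh is the read handle and that a blank line ends the headers):

# Read headers, then exactly Content-Length bytes of body.
my %header;
while ( my $line = <$rh> ) {
    chomp $line;
    last unless length $line;          # blank line ends the headers
    my ( $name, $value ) = split /:\s*/, $line, 2;
    $header{ lc $name } = $value;
}

my $len  = $header{'content-length'} || 0;
my $body = '';
while ( length $body < $len ) {
    my $got = read( $rh, $body, $len - length $body, length $body );
    die "short read: $!" unless $got;  # 0 on EOF, undef on error
}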

 cat client.pl

#!/usr/local/bin/perl -w
use strict;

use IPC::Open2;
use IO::Select;
use IO::Handle;

my ( $rh, $wh );

my $pid = open2($rh, $wh, './server.pl');
$pid || die "Failed to open";

my $read = IO::Select->new( $rh );

$rh->autoflush;
$wh->autoflush;

for (1..2) {
print "\n$0: Sending Headers:$_\n";

print $wh "Header-number: $_\n",
  "Content-type: perl/test\n",
  "Header: test\n\n";


# Now read the response
while ( 1 ) {

my $fh;

if ( ($fh) = $read->can_read(0) ) {
print "Can read!\n";

my $buffer = <$rh>;
#$fh->read( $buffer, 1024 );

last unless $buffer;

print "$0: Read $buffer";
} else {
print "Can't read sleeping...\n";
sleep 1;
}
}
print "$0: All done!\n";
}



lii@mardy:~  cat server.pl
#!/usr/local/bin/perl -w
use strict;

$|=1;

warn "In $0 pid=$$\n";

while (1) {
my @headers = ();
while ( <STDIN> ) {
chomp;
if ( $_ ) {
warn "$0: Read '$_'\n";

push @headers, $_;
} else {
for ( @headers ) {
warn "$0: Sending $_\n";
print $_,"\n";
}
print "\n";
last;
}
}
}


Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Dynamic content that is static

2000-12-22 Thread Bill Moseley

At 09:08 PM 12/22/00 -0500, Philip Mak wrote:
I realized something, though: Although the pages on my site are
dynamically generated, they are really static. Their content doesn't
change unless I change the files on the website.

This doesn't really help with your ASP files, but have you looked at ttree
in the Template Toolkit distribution?

The problem, AFAIK, is that ttree looks only at the top level
documents and not included templates.  I started to look at
Template::Provider to see if there was an easy way to write out dependency
information to a file, and then stat all those files every five minutes
from a cron job and if anything changes, touch the top level files and then
run ttree again.

I'd like this because I'm generating cobranded pages with mod_perl, and
many of the pages are really static content.


Bill Moseley
mailto:[EMAIL PROTECTED]



Re: fork inherits socket connection

2000-12-15 Thread Bill Moseley

At 04:02 PM 12/15/00 +0100, Stas Bekman wrote:
 Am I missing something?

You don't miss anything, the above code is an example of daemonization.
You don't really need to call setsid() for a *forked* process that was
started to execute something and quit. 

It's different if you call system() to spawn the process. But since it's a
child of the parent httpd it's not a leader anyway so you don't need the
extra fork I suppose. Am I correct?

In fact it's good that you've posted this doc snippet, I'll use it as it's
more complete and cleaner. Thanks.

Thank goodness!  I like this thread -- It's been hard keeping up with all
the posts just to see if PHP or Java is better than mod_perl ;)

Stas, will you please post your additions/notification about this when you
are done?  I do hope you go into a bit of detail on this, as I've posted
questions about setsid a number of times and locations and I'm still
unclear about when or when not to use it, why to use it, and how that might
relate to mod_perl and why it makes a difference between system() vs. fork.
 I've just blindly followed perlipc's recommendations.

BTW -- this isn't related to the infrequently reported problem of an Apache
child that won't die even with kill -9, is it?

Eagerly awaiting,



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: fork inherits socket connection

2000-12-15 Thread Bill Moseley

At 04:02 PM 12/15/00 +0100, Stas Bekman wrote:
  open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
  open STDOUT, '>/dev/null'
  or die "Can't write to /dev/null: $!";
  defined(my $pid = fork) or die "Can't fork: $!";
  exit if $pid;
  setsid  or die "Can't start a new session: $!";
  open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";

You don't miss anything, the above code is an example of daemonization.
You don't really need to call setsid() for a *forked* process that was
started to execute something and quit.

Oh, so that's the difference between system and fork/exec!  That's what I
get for following perlipc instead of the Guide.

I've always done it the Hard Way (tm) before.  That is, in my mod_perl
handler I would fork, then waitpid, call setsid, fork again freeing Apache
to continue (and double fork to avoid zombies), and then finally exec my
long running program.  With this method I had to call setsid or else
killing the Apache parent would kill the long running process.

Calling system() in the handler and then doing a simple fork in the long
running program is much cleaner (but you all knew that already).  I just
didn't realize that it freed me from calling setsid.  I just have to
remember not to system() something that doesn't fork or return right away.

But, I'm really posting about the original problem of the socket bound
by the forked program.  I tried looking through mod_cgi to see why mod_cgi
can fork off a long running process that won't hold the socket open, but
my poor reading of it didn't turn anything up.

Anyone familiar enough with Apache (and/or mod_cgi) to explain the
difference?  Does mod_cgi explicitly close the socket file descriptor(s)
before forking?

Thanks,


Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Script Debugging Modules?

2000-12-10 Thread Bill Moseley

At 08:04 PM 12/10/00 +0100, Stas Bekman wrote:
use constant DEBUG => 0;
...
warn "bar" if DEBUG;

you can keep the debug statements in the code without having any overhead
of doing if DEBUG, since they are stripped at compile time.

Just how smart is the compiler?

Maybe all these debugging options indicate there's too much stuff in this
module, but...

use constant DEBUG_TEMPLATE  => 1;
use constant DEBUG_SESSION   => 2;
use constant DEBUG_REQUEST   => 4;
use constant DEBUG_QUERY => 8;
use constant DEBUG_QUERY_PARSED  => 16;

my $debug = DEBUG_REQUEST|DEBUG_QUERY;

...

warn "query = '$query'\n" if $debug  DEBUG_QUERY;

Is the compiler that smart, or is there a better way such as 

use constant DEBUG_TEMPLATE  => 0;  # OFF
use constant DEBUG_SESSION   => 1;  # ON
use constant DEBUG_REQUEST   => 0;
use constant DEBUG_QUERY => 1;  # ON
use constant DEBUG_QUERY_PARSED  => 0;

warn $query if DEBUG_QUERY || DEBUG_QUERY_PARSED;
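
(One way to check what actually gets stripped is to run the code through
B::Deparse and see which statements survive -- a quick sketch:)

# check.pl -- run as:  perl -MO=Deparse check.pl
use constant DEBUG => 0;

warn "all-constant condition" if DEBUG;    # folded away at compile time

my $debug = 1;
warn "runtime variable" if $debug;         # still in the Deparse output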



Bill Moseley
mailto:[EMAIL PROTECTED]



Sys::Signal Weirdness

2000-12-08 Thread Bill Moseley

This is slightly off topic, but my guess is Sys::Signal is mostly used by
mod_perl people.  Can someone else test this on their machine?

I have this weird problem where I'm not catching $SIG{ALRM}.  The test code
below is a simple alarm handler that looks like this:

eval {
local $SIG{__DIE__};
if ( $timeout  ) {
my $h = Sys::Signal->set(
ALRM => sub { die "Timeout after $timeout seconds\n" }
);
warn "Set Signal $h\n";
alarm $timeout;
}
print "Test 1 Parent reading: $_" while FH;

alarm 0 if $timeout;
};

This isn't working -- but if I simply comment out the if ( $timeout ) block
it works. 

Here's the output on my machine.

perl test.pl

Starting test2 - WITHOUT 'if ( $timeout )'
Set Signal Sys::Signal=SCALAR(0x810d120)
in child loop 12423
Test 2 Parent reading: 1
Test 2 Parent reading: 2
Test 2 Parent reading: 3
Timeout after 4 seconds  --- good!

Starting test1 - with 'if ( $timeout )'
Set Signal Sys::Signal=SCALAR(0x810d12c)
in child loop 12424
Test 1 Parent reading: 1
Test 1 Parent reading: 2
Test 1 Parent reading: 3
Alarm clock --- huh?

Here's some cut-n-paste test code.  This is what I get on Linux under 5.6.

#!/usr/local/bin/perl -w
use strict;

test2();
test1();

$|= 1;

use Sys::Signal;

sub test1 {
warn "\nStarting test1 - with 'if ( \$timeout )'\n";


   my $child = open( FH, '-|' );
   die unless defined $child;

   loop() unless $child;  # not that this works

my $timeout = 4;

eval {
local $SIG{__DIE__};
if ( $timeout  ) {
my $h = Sys::Signal->set(
ALRM => sub { die "Timeout after $timeout seconds\n" }
);
warn "Set Signal $h\n";
alarm $timeout;
}
print "Test 1 Parent reading: $_" while FH;

alarm 0 if $timeout;
};


if ( $@ ) {
   warn $@;
   kill( 'HUP', $child );
}
}

sub test2 {
warn "\nStarting test2 - WITHOUT 'if ( \$timeout )'\n";


   my $child = open( FH, '-|' );
   die unless defined $child;

   loop() unless $child;

my $timeout = 4;

eval {
local $SIG{__DIE__};
   ### if ( $timeout  ) {
my $h = Sys::Signal->set(
ALRM => sub { die "Timeout after $timeout seconds\n" }
);
warn "Set Signal $h\n";
alarm $timeout;
   ### }
print "Test 2 Parent reading: $_" while FH;

alarm 0 if $timeout;
};


if ( $@ ) {
   warn $@;
   kill( 'HUP', $child );
}
}


sub loop {
$|=1;
my $x;
warn "in child loop $$\n";
sleep 1, ++$x, print "$x\n"  while 1;
}


Bill Moseley
mailto:[EMAIL PROTECTED]





Re: Sys::Signal Weirdness

2000-12-08 Thread Bill Moseley

At 04:42 PM 12/08/00 +0100, Stas Bekman wrote:
Easy. Look at $h -- it's a lexically scoped variable, inside the block 
if($timeout){}. Of course when the block is over the setting disappears,
when $h gets DESTROYed.

Doh!  I thought about that (which is why I was printing $h).  I shouldn't
debug before sunrise.

Sure is nice to have you back, Stas! ;)



Bill Moseley
mailto:[EMAIL PROTECTED]





Re: Apache::SSI design questions

2000-11-29 Thread Bill Moseley

At 11:47 PM 11/28/00 -0600, Ken Williams wrote:
1) Is it preferred to use POSIX::strftime() for time formatting, or
   Date::Format::strftime()?  One solution would be to dynamically load one
   or the other module according to which one is available, but I'd rather
   not do that.

Hi Ken,

Why not Apache::Util::ht_time()?




Bill Moseley
mailto:[EMAIL PROTECTED]





Re: session expiration

2000-11-20 Thread Bill Moseley

At 03:00 PM 11/20/00 -0600, Trey Connell wrote:
Is there any way to know that a user has disconnected from their session
through network failure, power off, or browser closure? 

How is that different from just going out for a cup of coffee or opening a
new browser window and looking at a different site?

I am logging
information about the user to a database when they login to a site, and
I need to clean up this data when they leave.

Define "leave" and you will have the answer.

All you can do is set an inactivity timeout, I'd suspect.  cron is your
friend in these cases.

Cheers,


Bill Moseley
mailto:[EMAIL PROTECTED]





Re: session expiration

2000-11-20 Thread Bill Moseley

At 05:20 PM 11/20/00 -0600, Trey Connell wrote:
The latter will be accomplished with cookies and the first rule will be
enforced with a "loggedin" flag in the database.  My problem lies in the user
not explicitly clicking logout when they leave the site.  If they explicitly
click logout, i can change the "loggedin" flag to false so that they can
enter
again the next time they try.

However, if they do not explicitly logout, I cannot fire the code to change
the flag in the database.

That's where cron comes in.  Just make your flag a time, and update it each
request.  cron then removes any that are older than some preset time and
*poof* they are then logged out.  They try to access again and you see they
have a cookie, yet are logged out, and you say "Sorry, your session has
expired".

So basically I want to set a cookie that will allow them to enter the site
under their userid, but I can't allow them to enter if they are currently
logged in from elsewhere.

Why?  What if they want two windows open at the same time?  Is that
allowed?  That design limitation sounds like it's going to make trouble.



Bill Moseley
mailto:[EMAIL PROTECTED]





[OT] Another *new* idea patented

2000-11-18 Thread Bill Moseley

I know this is way off topic, but I couldn't resist.  Sorry if this is old
news.

First Amazon figures out that cookies could be used for, (who would have
guessed?), maintaining state between sessions and patenting the concept.
What a new idea!

Now looking at eBay and I see that they have invented this thing called
"thumbnails" that are miniature photos that you can, get this, click with
your mouse!  Not only that, they have figured out a way to transfer images
from one computer to another via HTTP!  Another brilliant invention that
needs a patent.

http://pages.ebay.com/help/basics/g-gallery.html

 "Gallery

 Our patent pending Gallery () is a new way of browsing items
 for sale at eBay. The Gallery presents miniature pictures, called
 thumbnails, for all of the items sellers have supplied pictures for
 in JPG format."

US Patent 6,058,417 http://164.195.100.11/netahtml/srchnum.htm

Can Randal still give his "Mod_perl Enabled Thumbnail Picture Server" talk? 


BTW -- For fun go to Ebay and do a search for any auction, then look
closely at the HTTP headers IIS is spitting out when requesting any of
their auctions.  You can imagine the fun in trying to explain (over three
emails now) the problem to their customer support.



Bill Moseley
mailto:[EMAIL PROTECTED]





porting mod_perl content handler to CGI

2000-11-17 Thread Bill Moseley

Howdy,

I have an application that's pure mod_perl -- its modules use the request
object to do a few bits of work like reading parameters, query string,
specialized logging, dealing with NOT_MODIFIED, and so on.  Normal stuff
provided by the methods of Apache, Apache::Util, Apache::URI and
Apache::Connection.

Now, I'd like to use a few of my modules under CGI -- for an administration
part of the application that's bigger and not used enough to use up space
in the mod_perl server.  But it would be nice to have a common code base.

So, I'm writing a replacement module of those classes and supporting just
the few methods I need.  I'm using CGI.pm, URI, HTTP::Date and so on to
handle those few mod_perl methods I'm using in my modules.

For example, I have a function that does specialized logging that I want to
use both under mod_perl and CGI.  So, this would work under CGI

   my $r = Apache->request;
   my $remote = $r->connection->remote_ip;

where in the replacement package Apache::Connection:

   sub remote_ip { $ENV{REMOTE_ADDR} }
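
The whole emulation layer would just be a handful of tiny packages along
these lines (a sketch; the package name is made up, and only the few
methods I actually call get implemented):

package FakeApache;

use CGI ();

my $q = CGI->new;

sub request    { __PACKAGE__ }             # stands in for Apache->request
sub param      { shift; $q->param(@_) }
sub connection { 'FakeApache::Connection' }

package FakeApache::Connection;

sub remote_ip  { $ENV{REMOTE_ADDR} }

1;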


Before I spend much time, has this already been done?

Might be kind of cool if one could get new CGI programmers to write all
their CGI applications like mod_perl handlers -- could run as CGI on other
servers, but when they want speed they are ready to use mod_perl.

Anyway, does a mod_perl emulator for CGI exist?


Bill Moseley
mailto:[EMAIL PROTECTED]





Re: [RFC] Apache::Expires

2000-11-15 Thread Bill Moseley

At 10:26 AM 11/15/00 -0500, Geoffrey Young wrote:
hi all...

I was wondering if anyone has some experience with expire headers for
dynamic documents - kinda like mod_expires but for dynamic stuff.

Geoff,

Are you thinking about client/browsers or proxy caching with regard to
this?  Or does it matter?

I currently use Last-modified and Content-length headers in my dynamic
content that doesn't change much, but I've never considered using Expires --
maybe because I'm not fully up on what Expires actually provides.
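
For reference, here's what I send now, plus the Expires line under
discussion (a sketch, mod_perl 1.x):

use Apache::Util qw(ht_time);

sub send_headers {
    my ( $r, $mtime, $content ) = @_;

    $r->content_type('text/html');
    $r->header_out( 'Last-Modified'  => ht_time($mtime) );
    $r->header_out( 'Content-Length' => length $content );
    # The part I'm unsure about -- declare it fresh for a day:
    $r->header_out( 'Expires' => ht_time( time + 24 * 60 * 60 ) );
    $r->send_http_header;
}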

I have assumed that most browsers cache my documents and don't re-request
them in their current session, so am I correct that Expires would only help
for cases of browsers/clients that return to the page sometime in the
future yet before the document Expires, and after closing their browser?  I
wonder how to determine how many requests that would save.

Also, if a cached document is past its Expires time, does that force the
client to get a new document, or can it still use If-Modified-Since?
mod_expires indicates that a new document must be loaded, but RFC 2616
indicates that it can use If-Modified-Since (who knows what the clients will
do).

I should know this too, but what effect does the presence of a query string
in the URL have on this?


Bill Moseley
mailto:[EMAIL PROTECTED]



Microperl

2000-11-15 Thread Bill Moseley

This is probably more of a Friday topic:

Simon Cozens discusses "Microperl" in the current The Perl Journal.

I don't build mod_rewrite into a mod_perl Apache as I like rewriting with
mod_perl much better.  But it doesn't make much sense to go that route for
a light-weight front-end to heavy mod_perl backend servers, of course.

I don't have any experience embedding perl in things like Apache other than
typing "perl Makefile.PL && make", but Simon's article did make me wonder.

So I'm curious from you that understand this stuff better: Could a
microperl/miniperl be embedded in Apache and end up with a reasonably
light-weight perl enabled Apache?  I understand you would not have
Dynaloader support, but it might be nice for simple rewriting.

Curiously yours,

Bill Moseley
mailto:[EMAIL PROTECTED]



Re: AuthCookie solution

2000-11-15 Thread Bill Moseley

At 04:19 PM 11/15/00 -0500, Charles Day wrote:
# We added the line below to AuthCookie.pm

$r->header_out("Location" => $args{'destination'}.$args{'args'});

Why pass a new argument?  Can't you just add the query string onto the
destination field in your login.pl script?

Something like the untested:

my $uri   = $r->prev->uri;
my $query = $r->prev->args;
$uri  = "$uri?$query" if $query;

print qq[<INPUT TYPE=hidden NAME=destination VALUE="$uri">];



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Microperl

2000-11-15 Thread Bill Moseley

At 07:38 PM 11/15/00 -0600, Les Mikesell wrote:

- Original Message -
From: "Bill Moseley" [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, November 15, 2000 12:30 PM
Subject: Microperl

 I don't build mod_rewrite into a mod_perl Apache as I like rewriting with
 mod_perl much better.  But it doesn't make much sense to go that route for
 a light-weight front-end to heavy mod_perl backend servers, of course.

Just curious: what don't you like about mod_rewrite?

You ask that on the mod_perl list? ;)  It's not perl, of course.

I like those <Perl> sections a lot.

Oh, there were the weird segfaults that I had for months and months.
http://www.geocrawler.com/archives/3/182/2000/10/0/4480696/

Nothing against mod_rewrite -- I was just wondering if a small perl could
be embedded with out bloating the server too much.



Bill Moseley
mailto:[EMAIL PROTECTED]



RE: [ANNOUNCE] ApacheCon USA 2001: Call For Papers

2000-11-14 Thread Bill Moseley

At 04:08 PM 11/14/00 +0100, Stas Bekman wrote:
Remember that your talk can be reused for both ApacheCon and TPC, most of
the people don't make it to the both conferences. So while you are
thinking about your TPC submission, at the same moment you can submit it
to ApacheCon as well.

For someone on a budget and no boss to pay my way, which conference will
have more mod_perl?

And for my 2 cents, I'd be interested in hearing about mod_perl and
designing for scalability, whatever that means.  Or was that the
mod_backhand talk I missed?



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Fast DB access

2000-11-10 Thread Bill Moseley

At 09:20 PM 11/09/00 +, Tim Bunce wrote:
On Thu, Nov 09, 2000 at 08:27:29PM +, Matt Sergeant wrote:
 On Thu, 9 Nov 2000, Ask Bjoern Hansen wrote:
  If you're always looking stuff up on simple ID numbers and
  "stuff" is a very simple data structure, then I doubt any DBMS can
  beat 
  
   open D, "/data/1/12/123456" or ...
  
  from a fast local filesystem.
 
 Note that Theo Schlossnagel was saying over lunch at ApacheCon that if
 your filename has more than 8 characters on Linux (ext2fs) it skips from a
 hashed algorithm to a linear algorithm (or something to that affect). So
 go careful there. I don't have more details or a URL for any information
 on this though.

Similarly on Solaris (and perhaps most SysV derivatives) path component
names longer than 16 chars (configurable) don't go into the inode
lookup cache and so require a filesystem directory lookup.

Ok, possibly 8 chars in Linux and 16 under Solaris.  Anything else to
consider regarding the maximum number of files in a given directory?

How about issues regarding file size?  If you had larger files/records
would DBM or an RDBMS provide larger cache sizes?
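
(Back on the name-length point -- to keep every path component under those
limits with numeric IDs, something like this sketch:)

# Build /data/1/12/123456 style paths; each component stays well
# under the 8/16 character name-cache limits mentioned above.
sub id_to_path {
    my $id = shift;               # e.g. 123456
    my $d1 = substr( $id, 0, 1 ); # "1"
    my $d2 = substr( $id, 0, 2 ); # "12"
    return "/data/$d1/$d2/$id";
}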


Bill Moseley
mailto:[EMAIL PROTECTED]



Re: pre-loaded modules on Solaris

2000-11-10 Thread Bill Moseley

At 06:11 AM 11/10/00 -0500, barries wrote:
 Address   Kbytes Resident Shared Private
   --  -- ---
 total Kb   24720   22720    3288   19432   pre-loaded modules
 total Kb   14592   12976    3096    9880   not pre-loaded modules

Stupid question, probably, but when running the non-pre-loaded version,
are you sure all the same modules are being loaded?

Yes.  According to perl-status, anyway.  Some modules are loaded into the
parent, of course, because of mod_perl.  But when not pre-loading I start
the server, look at perl-status and then made some requests and looked
again to see what was loaded.  The difference is what modules I'm use'ing
in my test.

I'm wondering if there's
some set of modules that, for some reason, isn't being loaded by the
sequence of requests you're firing against all of your httpds to
get the servers "warmed up" to represent real-life state.

When looking at pmap it looks like the main difference in "private" memory
usage is in the heap.  I'm not clear why the heap would end up so much
bigger when pre-loading modules.

Unfortunately, Linux doesn't seem to have the same reporting abilities as
Solaris, but using /proc/pid/statm to show shared and private memory
under these same tests showed that pre-loading was a big win.  So it seems
like a Solaris issue. 
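
(On Linux I just read statm directly -- a sketch; the 4 kB page size is an
assumption, check your kernel:)

# /proc/<pid>/statm reports pages: size resident shared text lib data dt
sub statm {
    my $pid = shift;
    open STATM, "/proc/$pid/statm" or die "no statm for $pid: $!";
    my $line = <STATM>;
    close STATM;
    my ( $size, $resident, $shared ) = split ' ', $line;
    my $page_kb = 4;
    return ( $resident * $page_kb, $shared * $page_kb );  # (rss, shared) kB
}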




Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Dealing with spiders

2000-11-10 Thread Bill Moseley

At 03:29 PM 11/10/00 +0100, Marko van der Puil wrote:
What we could do as a community is create spiderlawenforcement.org,
a centralized database where we keep track of spiders and how they
index our sites.

It's an issue weekly, but hasn't become that much of a problem yet.  The
bad spiders could just change IPs and user agent strings, too.

Yesterday I had 12,000 requests from a spider, but the spider added a slash
to the end of every query string so over 11,000 were invalid requests --
but the Apache log showed the requests as being a 200 (only the application
knew it was a bad request).

At this point, I'd just like to figure out how to detect them
programmatically.  It seems easy to spot them as a human looking through
the logs, but less so with a program.  Some spiders fake the user agent.

It probably makes sense to run a cron job every few minutes to scan the
logs and write out a file of bad IP numbers, and use mod_perl to the list
of IPs to block every 100 requests or so.  I could look for lots of
requests from the same IP with a really high relation of bad requests to
good.  But I'm sure it wouldn't be long before an AOL proxy got blocked.

Again, the hard part is finding a good way to detect them...

And in my experience blocking doesn't always mean the requests from that
spider stop coming ;)




Bill Moseley
mailto:[EMAIL PROTECTED]



Re: pre-loaded modules on Solaris

2000-11-09 Thread Bill Moseley

Hi Mike,

I've cc'd the mod_perl list for other Solaris users to consider.

At 10:49 AM 11/01/00 -0800, Michael Blakeley wrote:
I saw a significant benefit from pre-loading modules. Let's take a 
test case where we share via startup.pl:

Without preloading:

# tellshared.pl web
count vsz rss  kB   %
8  146304   67784   78520  54

With the "use" section above:

# tellshared.pl web
count vsz rss  kB   %
8  132672   17032  115640  87

'rss' is the resident set size - that is, the amount of actual RAM in 
use. The 'vsz' tells the size of the virtual address space - the swap 
in use. The kB column shows the difference (ie, the "saved RAM" via 
page sharing) and the % shows the %-shared.

I'm not clear you can measure the shared memory space that way.  I don't
really understand the memory system much, but here's a paper that I found
interesting:

http://www.sun.com/solutions/third-party/global/SAS/pdf/vmsizing.pdf

"The virtual address size of a process often bares no resemblance to the
amount of memory a process is using because it contains all of the
unallocated memory, libraries, shared memory and sometimes hardware devices
(in the case of XSun).

"The RSS figure is a measure of the amount of physical memory mapped into a
process, but often there is more than one copy of the process running, and
a large proportion of a process is shared with another.

"MemTool provides a mechanism for getting a detailed look at a processes
memory utilization. MemTool can show how much memory is in-core, how much
of that is shared, and hence how much private memory a process has. The
pmem command (or /usr/proc/bin/pmap -x in Solaris 2.6) can be used to show
the memory utilization of a single process."


Now, with a simple "hello world" mod_perl handler that loaded a bunch of
modules I did see that pre-loading modules reduces memory usage -- both in
looking at ps output, and with the pmap program, even after a number of
requests.  This is consistent with what you commented on above.

I'm repeating, but I found with a real-world application that sharing the
modules ended up using quite a bit more "private" memory.  I don't know if
that's only an issue with my specific OS, or with how my specific
application is running.

Here's ps output with pre-loaded modules.  On these tests I'm running
Apache 1.3.12 with mod_perl 1.24 static, but everything else is DSO.  I've
got maxclients set to one so there's only the parent and one child.

I'm pre-loading modules in a <Perl> section here:

S USER   PID  PPID %CPU %MEM  VSZ  RSS    STIME    TIME COMMAND
S  lii   318 1  0.0  0.3 8376 5464 15:50:38    0:00 httpd.mo
S  lii   319   318  0.8  1.1 24720 21288 15:50:38    0:05 httpd.mo

And now without pre-loaded modules:

S USER   PID  PPID %CPU %MEM  VSZ  RSS    STIME    TIME COMMAND
S  lii  1260 1  0.0  0.2 4392 3552 15:56:25    0:00 httpd.mo
S  lii  1261  1260  0.9  0.6 14592 12304 15:56:25    0:05 httpd.mo


And here's comparing the totals returned by the pmap program that should
detail shared and private memory (according to the paper cited above).

Address   Kbytes Resident Shared Private
--------  ------ -------- ------ -------
total Kb   24720   22720    3288   19432   pre-loaded modules
total Kb   14592   12976    3096    9880   not pre-loaded modules

Indeed there's a tiny bit more shared memory in the pre-loaded Apache, but
the amount of "private" memory is significantly higher, too.  Ten megs a
child will add up.  It doesn't really make sense to me, but that's what
pmap is showing.

Maybe this isn't that interesting.  Anyway, I'll try a non DSO Apache and
see if it makes a difference, and also try with an Apache that forks off
more clients than just one, but I can't imagine that making a difference.

Later,


Bill Moseley
mailto:[EMAIL PROTECTED]



Dealing with spiders

2000-11-04 Thread Bill Moseley

This is slightly OT, but any solution I use will be mod_perl, of course.

I'm wondering how people deal with spiders.  I don't mind being spidered as
long as it's a well behaved spider and follows robots.txt.  And at this
point I'm not concerned with the load spiders put on the server (and I know
there are modules for dealing with load issues).

But it's amazing how many are just lame in that they take perfectly good
HREF tags and mess them up in the request.  For example, every day I see
many requests from Novell's BorderManager where they forgot to convert HTML
entities in HREFs before making the request.

Here's another example:

64.3.57.99 - "-" [04/Nov/2000:04:36:22 -0800] "GET /../../../ HTTP/1.0" 400
265 "-" "Microsoft Internet Explorer/4.40.426 (Windows 95)" 5740

In the last day that IP has requested about 10,000 documents.  Over half
were 404 requests where some 404s were non-converted entities from HREFs,
but most were just for documents that do not and have never existed on this
site.  Almost 1000 requests were 400s (Bad Request, like the example above).
And I'd guess that's not really the correct user agent, either

In general, what I'm interested in stopping are the thousands of requests
for documents that just don't exist on the site.  And to simply block the
lame ones, since they are, well, lame.

Anyway, what do you do with spiders like this, if anything?  Is it even an
issue that you deal with?

Do you use any automated methods to detect spiders, and perhaps block the
lame ones?  I wouldn't want to track every IP, but it seems like I could do
well just looking at IPs that have a high proportion of 404s to 200s and
304s and have been requesting over a long period of time, or very frequently.
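
Detection could start as something as simple as a ratio check over the
access log (a sketch; assumes common log format with the status in the
ninth field):

#!/usr/local/bin/perl -w
use strict;

# Count good (2xx/304) and bad (4xx) responses per IP, then flag
# addresses that are mostly errors and busy enough to matter.
my ( %good, %bad );

while (<>) {
    my ( $ip, $status ) = ( split ' ' )[ 0, 8 ];
    next unless defined $status and $status =~ /^\d+$/;
    if    ( $status =~ /^2/ or $status == 304 ) { $good{$ip}++ }
    elsif ( $status =~ /^4/ )                   { $bad{$ip}++  }
}

for my $ip ( keys %bad ) {
    my $total = $bad{$ip} + ( $good{$ip} || 0 );
    print "$ip\n" if $total > 500 and $bad{$ip} / $total > 0.8;
}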

The reason I'm asking is that I was asked about all the 404s in the web
usage reports.  I know I could post-process the logs before running the web
reports, but it would be much more fun to use mod_perl to catch and block
them on the fly.

BTW -- I have blocked spiders on the fly before -- I used to have a decoy
in robots.txt that, if followed, would add that IP to the blocked list.  It
was interesting to see one spider get caught by that trick because it took
thousands and thousands of 403 errors before that spider got a clue that it
was blocked on every request.

Thanks,


Bill Moseley
mailto:[EMAIL PROTECTED]


