Re: mod_perl vs. C for high performance Apache modules

2001-12-14 Thread Simon Rosenthal

At 03:58 PM 12/14/2001, Jeff Yoak wrote:
At 09:15 PM 12/14/2001 +0100, Thomas Eibner wrote:
The key to mod_perl development is speed: there are numerous testimonials
from users implementing a lot of work in a very short time with mod_perl.
Ask the client's investor whether he wants to pay for having everything you
did rewritten as an Apache module in C. That is very likely going to take
a lot of time.

Thank you for your reply.  I realized in reading it that my tone leads one 
to the common image of a buzzword-driven doody-head who wants this because 
of what he read in Byte.  That's certainly common enough, and I've never 
had a problem dealing with such types.  (Well... not an unsolvable 
problem... :-)

This is something different.  The investor is in a related business, and 
has developed substantially similar software for years.  And it is really 
good.  What's worse is that my usual strongest argument isn't compelling 
in this case: that by the time this would be done in C, I'd be doing 
contract work on Mars.  The investor claims to have evaluated Perl vs. C 
years ago, to have witnessed that every single hit on the webserver under 
mod_perl causes a CPU usage spike that isn't seen with C, and that under 
heavy load mod_perl completely falls apart where C doesn't.  (This code 
is, of course, LONG gone so I can't evaluate it for whether the C was good 
and the Perl was screwy.)  At any rate, because of this, he's spent years 
having good stuff written in C.  Unbeknownst to either me or my client, 
both this software and its developer were available to us, so in this case 
it would have been faster, cheaper and honestly even better, by which I 
mean more fully-featured.

CPU usage is certainly one factor... but CPUs are cheap compared to 
development man-hours.

Since you haven't provided any details on the application, this may not be 
relevant, but most of the web apps that we write (and I read about 
here) spend much of their time waiting for responses from other back-end 
servers - databases, NFS mounted file systems, or whatever. It's probably 
undeniable that a well written C application will run faster than almost 
anything in an interpreted language, but that may not make much of a 
difference to the total response time.

-Simon





Simon Rosenthal ([EMAIL PROTECTED])
Web Systems Architect
Northern Light Technology
One Athenaeum Street. Suite 1700, Cambridge, MA  02142
Phone:  (617)621-5296: URL:  http://www.northernlight.com
Northern Light - Just what you've been searching for




Re: Phase for controlling network input?

2001-09-26 Thread Simon Rosenthal

I'm not sure that any mod_perl handlers are dispatched until the whole 
request is received, so you may have to deal with this at the core Apache 
level.

I think the following is your best bet (from 
http://httpd.apache.org/docs/mod/core.html#timeout )

TimeOut directive

Syntax: TimeOut number
Default: TimeOut 300
Context: server config
Status: core

The TimeOut directive currently defines the amount of time Apache will 
wait for three things:

1. The total amount of time it takes to receive a GET request.
2. The amount of time between receipt of TCP packets on a POST or PUT
   request.
3. The amount of time between ACKs on transmissions of TCP packets in
   responses.

We plan on making these separately configurable at some point down the 
road. The timer used to default to 1200 before 1.2, but has been lowered
to 300 which is still far more than necessary in most situations. It is 
not set any lower by default because there may still be odd places in the code
where the timer is not reset when a packet is sent.
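As a concrete, purely illustrative httpd.conf fragment (the 60-second value is an assumption, not a recommendation from the docs above), tightening the directive looks like:

```apache
# Illustrative only: lower the request timeout from the 300-second
# default so slow or stalled clients release their child sooner.
TimeOut 60
```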


We've experienced this kind of attack inadvertently (as the result of a 
totally misconfigured HTTP client app which froze in the middle of sending 
an HTTP request ;=), but I wasn't aware that there were known attacks based 
on that.

-Simon


At 11:09 AM 9/26/2001, Bill McGonigle wrote:
I'm hoping this is possible with mod_perl, since I'm already familiar with 
it and fairly allergic to C, but can't seem to figure out the right phase.

I've been seeing log files recently that point to a certain DDOS attack 
brewing on apache servers.  I want to write a module that keeps a timer 
for the interval from when the apache child gets a network connection to 
when the client request has been sent.

I need a trigger when a network connection is established and a trigger 
when apache thinks it has received the request (before the response).

PerlChildInitHandler seems too early, since the child may be a pre-forked 
child without a connection.  PerlPostReadRequest seems too late, since I 
can't be guaranteed to be called if the request isn't complete, which 
is the problem I'm trying to solve.  I could clear a flag in 
PerlPostReadRequest, but that would imply something is persisting from 
before that would be able to read the flag.

Maybe I'm thinking about this all wrong.  Any suggestions?

Thanks,
-Bill

-
Simon Rosenthal ([EMAIL PROTECTED])
Web Systems Architect
Northern Light Technology
One Athenaeum Street. Suite 1700, Cambridge, MA  02142
Phone:  (617)621-5296: URL:  http://www.northernlight.com
Northern Light - Just what you've been searching for




Re: Dynamic httpd.conf file using mod_perl...

2001-04-17 Thread Simon Rosenthal

At 04:16 AM 4/17/01, Ask Bjoern Hansen wrote:
On Mon, 16 Apr 2001, Jim Winstead wrote:

[...]
  you would have to do a "run config template expander && HUP" instead
  of just doing a HUP of the apache parent process, but that doesn't
  seem like a big deal to me.

And it has the big advantage of also working with httpd's without
mod_perl.

like proxy servers ...

  Going off on a slight tangent from the original topic - the template-based 
approach would also work well for subsystems that have separate 
configuration files - we put quite a bit of application configuration info 
into files other than httpd.conf, so that we can modify it without 
requiring a server restart.

-Simon



  - ask

--
ask bjoern hansen, http://ask.netcetera.dk/   !try; do();
more than 70M impressions per day, http://valueclick.com




[OT] HTTP/1.1 client support using LWPng

2001-02-21 Thread Simon Rosenthal

Slightly off topic...

I am considering using the LWPng (HTTP/1.1) client code for an app where 
we could gladly use both of the HTTP/1.1 features that it offers: 
persistent connections (the client and server are separated by 7 time zones 
and the TCP connect time is a horrible 125 ms ;=( ), and pipelining of 
requests. The status of the code, according to Gisle Aas, is definitely 
alpha, and it hasn't been touched in a few years.

Has anyone else used this module, and how successfully?

Thanks

-Simon
-
Simon Rosenthal ([EMAIL PROTECTED])  
Web Systems Architect
Northern Light Technology
One Athenaeum Street. Suite 1700, Cambridge, MA  02142
Phone:  (617)621-5296  :   URL:  http://www.northernlight.com
"Northern Light - Just what you've been searching for"




Re: Socket/PIPE/Stream to long running process

2001-02-07 Thread Simon Rosenthal

At 11:04 AM 2/2/01 -0800, Rob Bloodgood wrote:
So, in my mod_perl app, I run thru each request, then blast a UDP packet to
a process on the local machine that collects statistics on my traffic:

snip



My question is, should I be creating this socket for every request?  OR
would it be more "correct" to create it once on process startup and stash it
in $r->pnotes or something?

We have similar code in a mod_perl environment for sending multicast UDP 
packets - I just store the socket filehandle in a global when it's 
created, and the next request can pick it up from there (just test whether 
the global is defined).
Keeping the endpoint info in pnotes is only useful if you need to write 
multiple UDP packets per request.
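A minimal sketch of the approach described above. The collector address and port are hypothetical, and the package global simply survives across requests within one Apache child:

```perl
use strict;
use warnings;
use IO::Socket::INET;

our $stats_sock;    # package global: persists for the life of the child

sub stats_socket {
    # Create the UDP socket on first use; every later request served by
    # this child finds the global already defined and reuses it.
    $stats_sock ||= IO::Socket::INET->new(
        Proto    => 'udp',
        PeerAddr => '127.0.0.1',   # hypothetical stats collector
        PeerPort => 9999,          # hypothetical port
    ) or die "can't create UDP socket: $!";
    return $stats_sock;
}

# Per-request use would then be: stats_socket()->send($packet);
```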

-Simon



And if I did that, would it work w/ TCP?  Or unix pipes/sockets (which I
*don't* understand) (btw the box is linux)?  In testing, I'd prefer not to
use TCP because it blocks if the count server is hung or down, vs UDP, where
I just lose a couple of packets.

TIA!

L8r,
Rob

-----
Simon Rosenthal ([EMAIL PROTECTED])  
Web Systems Architect
Northern Light Technology
One Athenaeum Street. Suite 1700, Cambridge, MA  02142
Phone:  (617)621-5296  :   URL:  http://www.northernlight.com
"Northern Light - Just what you've been searching for"




Re: Caching search results

2001-01-08 Thread Simon Rosenthal

At 10:10 AM 1/8/01 -0800, you wrote:
Bill Moseley wrote:
  Anyway, I'd like to avoid the repeated queries in mod_perl, of course.  So,
  in the short term, I was thinking about caching search results (which is
  just a sorted list of file names) using a simple file-system db -- that is,
  (carefully) building file names out of the queries and writing them to some
  directory tree.  Then I'd use cron to purge LRU files every so often.  I
  think this approach will work fine instead of a dbm or rdbms approach.

Always start with CPAN.  Try Tie::FileLRUCache or File::Cache for
starters. A dbm would be fine too, but more trouble to purge old entries
from.

an RDBMS is not much more trouble to purge, if you have a 
time-of-last-update field. And if you're ever going to access your cache 
from multiple servers, you definitely don't want to deal with locking 
issues for DBM and filesystem-based solutions ;=(
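For illustration (the table and column names here are invented), the purge becomes a single statement once each row carries a last-update timestamp:

```sql
-- Hypothetical schema: cache_entries(cache_key, cache_value, last_update)
-- Delete anything untouched for the past hour.
DELETE FROM cache_entries
WHERE last_update < DATE_SUB(NOW(), INTERVAL 1 HOUR);
```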

-Simon

-
Simon Rosenthal ([EMAIL PROTECTED])  
Web Systems Architect
Northern Light Technology
One Athenaeum Street. Suite 1700, Cambridge, MA  02142
Phone:  (617)621-5296  :   URL:  http://www.northernlight.com
"Northern Light - Just what you've been searching for"




Re: Caching search results

2001-01-08 Thread Simon Rosenthal

At 02:02 PM 1/8/01 -0800, Sander van Zoest wrote:
On Mon, 8 Jan 2001, Simon Rosenthal wrote:

  an RDBMS is not much more trouble to purge, if you have a
  time-of-last-update field. And if you're ever going to access your cache
  from multiple servers, you definitely don't want to deal with  locking
  issues for DBM and filesystem based solutions ;=(

RDBMS does bring replication and backup issues. The DBM and FS solutions
definitely have their advantages. It would not be too difficult to write
a serialized daemon that makes requests over the net to a DBM file.

What in your experience makes you pick the overhead of an RDBMS for a simple
cache in favor of DBM and FS solutions?

We cache user session state (basically using Apache::Session) in a small 
(maybe 500K records) mysql database, which is accessed by multiple web 
servers. We made an explicit decision NOT to replicate or back up this 
database - it's very dynamic, and the only user-visible consequence of a 
loss of the database would be an unexpected login screen - we felt this was 
a tradeoff we could live with.  We have a hot spare mysql instance which 
can be brought into service immediately, if required.

  I couldn't see a daemon of the kind you suggested offering us any 
benefits under those circumstances, given that RDBMS access is built into 
Apache::Session.

I would not be as cavalier as this if we were doing anything more than 
using the RDBMS as a fast cache. With decent hardware (which we have - Sun 
Enterprise servers with nice fast disks and enough memory) the typical 
record retrieval time is around 10ms, which, even if slow compared to a 
local FS access, is plenty fast enough in the context of the processing we 
do for dynamic pages.

Hope this answers your question.

-Simon




--
Sander van Zoest [[EMAIL PROTECTED]]
Covalent Technologies, Inc.   http://www.covalent.net/
(415) 536-5218 http://www.vanzoest.com/sander/

-
Simon Rosenthal ([EMAIL PROTECTED])  
Web Systems Architect
Northern Light Technology
One Athenaeum Street. Suite 1700, Cambridge, MA  02142
Phone:  (617)621-5296  :   URL:  http://www.northernlight.com
"Northern Light - Just what you've been searching for"




Re: [OT?] Cross domain cookie/ticket access

2000-09-08 Thread Simon Rosenthal

At 11:37 PM 9/7/00 -0600, Joe Pearson wrote:
I thought you could set a cookie for a different domain - you just can't
read a different domain's cookie.  So you could simply set 3 cookies when
the user authenticates.

I don't think you can set a cookie for a completely different domain, based 
on my reading of RFC2109 and some empirical tests ... it would be a massive 
privacy/security hole, yes?
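For reference, RFC 2109 only lets a server set a Domain that domain-matches its own host. So a hypothetical server at www.example.com could send the first of these headers but not the second:

```
Set-Cookie: session=abc123; Domain=.example.com; Path=/   (accepted)
Set-Cookie: session=abc123; Domain=.other.net; Path=/     (rejected by the client)
```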

- Simon


Now I'm curious; I'll need to try that.

--
Joe Pearson
Database Management Services, Inc.
208-384-1311 ext. 11
http://www.webdms.com

-Original Message-
From: Aaron Johnson [EMAIL PROTECTED]
To: [EMAIL PROTECTED] [EMAIL PROTECTED]
Date: Thursday, September 07, 2000 10:08 AM
Subject: [OT?] Cross domain cookie/ticket access


 I am trying to implement a method of allowing access to three separate
 servers on three separate domains.
 
 The goal is to only have to login once and having free movement across
 the three protected access domains.
 
 A cookie can't work due to the limit of a single domain.
 
 Has anyone out there had to handle this situation?
 
 I have thought about several different alternatives, but they just get
 uglier and uglier.
 
 One thought was that they could go to a central server and login.  At
 the time of login they would be redirected to a special page on each of
 the other two servers with any required login information.  These pages
 would in turn return them to the login machine.  At the end of the login
 process they would be redirected to the web site they originally wanted.
 
 This is a rough summary of what might happen -
 
 domain1.net - user requests a page in a protected directory.   They
 don't have a cookie.
 They are redirected to the cookie server.  This server asks for the user
 name and pass and authenticates the user.  Once authenticated the cookie
 server redirects the client to each of the other (the ones not matching
 the originally requested domain) domains.  This redirect is a page that
 hands the client a cookie and sets up the session information.
 domain2.net gets the request and redirects the user to a page that will
 return them to the cookie machine which will add the domain2.net to the
 list of domains in the cookie. And then the process will repeat for each
 domain that needs to be processed.
 
 Am I crazy?  Did I miss something in the documentation for the current
 Session/Auth/Cookie modules?
 
 I did some hacking of the Ticket(Access|Tool|Master) Example in the
 Eagle book, but the cookie limit is keeping it from working correctly.
 ( BTW: I already use it for a single server login and it works great. )
 
 Any information would be appreciated.
 
 Aaron Johnson
 
 

-
Simon Rosenthal ([EMAIL PROTECTED])  
Web Systems Architect
Northern Light Technology   222 Third Street, Cambridge MA 02142
Phone:  (617)621-5296  :   URL:  http://www.northernlight.com
"Northern Light - Just what you've been searching for"




Re: What phase am I in

2000-04-07 Thread Simon Rosenthal

At 12:51 PM 4/7/00 -0400, Paul G. Weiss wrote:
Is there any way to determine from the Apache::Request object
what phase of handling we're in?  I have some code that is used
during more than one phase, and I'd like it to behave differently
for each phase.

Use the current_callback() method (Eagle book, p. 465). Funny, I had to 
find this out yesterday..
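A small sketch of the idea. The mock request class here is purely a stand-in for the real Apache request object, which provides current_callback() with the same interface:

```perl
use strict;
use warnings;

# Stand-in for the real mod_perl request object, for illustration only.
package MockRequest;
sub new              { my ($class, $phase) = @_; bless { phase => $phase }, $class }
sub current_callback { $_[0]->{phase} }

package main;

# One routine shared by several phases, branching on the phase name.
sub phase_specific_work {
    my $r = shift;
    return 'write-log-entry'  if $r->current_callback eq 'PerlLogHandler';
    return 'generate-content' if $r->current_callback eq 'PerlHandler';
    return 'default-behaviour';
}
```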

- Simon


-Paul

-
Simon Rosenthal ([EMAIL PROTECTED])  
Web Systems Architect
Northern Light Technology   222 Third Street, Cambridge MA 02142
Phone:  (617)577-2796  :   URL:  http://www.northernlight.com
"Northern Light - Just what you've been searching for"




Re: [Rare Modules] Apache::RegistryNG

2000-02-04 Thread Simon Rosenthal

At 06:17 PM 2/4/00 +0200, Stas Bekman wrote:
The next module is Apache::RegistryNG.

C<Apache::RegistryNG> is the same as C<Apache::Registry>, aside from
using the filename instead of the URI for the namespace. It also uses an
OO interface.

   snip

There is no compelling reason to use C<Apache::RegistryNG> over
C<Apache::Registry>, unless you want to add or change the
functionality of the existing I<Registry.pm>.  For example,
C<Apache::RegistryBB> (Bare-Bones) is another subclass that skips the
stat() call performed by C<Apache::Registry> on each request.


One situation where Apache::RegistryNG may definitely be required is if 
you are rewriting URLs (using either mod_rewrite or your own handler) in 
certain ways.

For instance, if you have a rewrite rule of the form XYZ123456.html => 
/perl/foo.pl?p1=XYZ&p2=123456, 
Apache::Registry loses big, as it recompiles foo.pl for each unique 
URL.  We ran into this and were totally baffled as to why we had no 
mod_perl performance boost until Doug pointed us to RegistryNG, which is 
definitely your friend in these circumstances.
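A hypothetical mod_rewrite rule of the shape described above (pattern and script name are illustrative). Every distinct page name lands on the same script, which is exactly what makes a URI-keyed namespace recompile foo.pl once per URL:

```apache
RewriteEngine On
# XYZ123456.html -> /perl/foo.pl?p1=XYZ&p2=123456
RewriteRule ^/([A-Z]+)([0-9]+)\.html$ /perl/foo.pl?p1=$1&p2=$2 [PT]
```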

- Simon

-
Simon Rosenthal ([EMAIL PROTECTED])  
Web Systems Architect
Northern Light Technology   222 Third Street, Cambridge MA 02142
Phone:  (617)577-2796  :   URL:  http://www.northernlight.com
"Northern Light - Just what you've been searching for"



Re: Caching DB queries amongst multiple httpd child processes

2000-02-03 Thread Simon Rosenthal

At 03:33 PM 2/3/00 +1100, Peter Skipworth wrote:
Does anyone have any experience in using IPC shared memory or similar in
caching data amongst multiple httpd daemons ? We run a large-ish database
dependent site, with a mysql daemon serving many hundreds of requests a
minute. While we are currently caching SQL query results on a per-process
basis, it would be nice to share this ability across the server as a
whole.

I've played with IPC::Shareable and IPC::ShareLite, but both seem to be a
little unreliable - unsurprising as both modules are currently still under
development. Our platform is a combination of FreeBSD and Solaris servers
- speaking of which, has anyone taken this one step further again and
cached SQL results amongst multiple web servers ?

We looked at this, as we have a busy multiple-web-server environment and 
are planning to use Apache::Session + Mysql to manage session state. 
Although per-host caching in shared memory or whatever seemed desirable on 
paper, the complexities of ensuring that cache entries are not invalid due 
to an update on another server are major.

When we set up a testbed to benchmark Mysql for this project, the 
time taken to retrieve or update a session state record across the 
network over an established connection to our Mysql host (333 MHz Sparc 
Ultra 5/Solaris 2.6 with lots of memory) was so small (5-7 ms including 
LOCK/UNLOCK TABLE commands where needed) that we didn't pursue per-host 
caches any further.

Clearly, YMMV depending on the hardware you have available.

- Simon


Thanks in advance,

Peter Skipworth

--
.-.
|   Peter SkipworthPh: 03 9897 1121   |
|  Senior Programmer  Mob: 0417 013 292   |
|  realestate.com.au   [EMAIL PROTECTED] |
`-'


-
Simon Rosenthal ([EMAIL PROTECTED])  
Web Systems Architect
Northern Light Technology   222 Third Street, Cambridge MA 02142
Phone:  (617)577-2796  :   URL:  http://www.northernlight.com
"Northern Light - Just what you've been searching for"



Job openings at Northern Light Technology, Cambridge, MA.

2000-02-02 Thread Simon Rosenthal


Who we're looking for:

We're looking for a Senior/Principal Engineer in the Web 
Architecture/Systems group, which is responsible for all aspects of web 
server technology and the development of core technology components for our 
web based applications. (We have a separate Applications development group 
who are also hiring; see our jobs page at 
http://www.northernlight.com/docs/jobs_company.html for other open positions).

About you:

You will have 2+ years software development experience using C/C++/Perl in 
a UNIX environment, plus intimate familiarity with the Apache web server, 
mod_perl, and a good understanding of system performance and tuning issues 
in a high traffic Web environment. Ability to work on multiple projects at 
once, good communications skills and a tolerance for organized chaos are 
all highly desirable.

About Northern Light:

Since it premiered as the first Web-based research engine in August of 
1997, Northern Light has grown considerably, earning accolades for its 
search engine technology. We're now a 150-person company (pre-IPO) 
headquartered in the Kendall Square area of Cambridge, Mass. We're not new 
kids on the block. Our management team has been around a few blocks with, 
combined, over 100 years of experience in the software industry. They know 
what it takes to make the company successful. At the same time, we're very 
young at heart. We tackle interesting projects and actively encourage 
creative thinking and continuous learning. The energy, humor, and 
commitment to quality shared by people at Northern Light are unsurpassed.

Please send your resume by email, fax or ground mail to:
 Human Resources
 Northern Light Technology
 222 Third Street, Suite 1320
 Cambridge, MA 02142
 Fax: (617) 621-3459
 Email: [EMAIL PROTECTED]

Feel free to call me at (617) 621 5296 or email me if you have any questions.

-
Simon Rosenthal ([EMAIL PROTECTED])  
Web Systems Architect
Northern Light Technology   222 Third Street, Cambridge MA 02142
Phone:  (617)577-2796  :   URL:  http://www.northernlight.com
"Northern Light - Just what you've been searching for"



Re: access_log

2000-01-12 Thread Simon Rosenthal

At 11:09 AM 1/12/00 -0500, Gacesa, Petar wrote:
I was doing the stress testing of the Apache web server by simulating a
large number of http requests.  After several hours I started getting the
following line in my access_log file:

165.78.11.40 - - [11/Jan/2000:22:33:45 -0500] "-" 408 -

Instead of the URL that was supposed to be accessed.

Can somebody please tell me what this means?

Petar

It's a bit off-topic... nothing to do with mod_perl.

It's reporting a situation where the client starts to send an HTTP request 
but doesn't complete it - you have an Apache process tied up waiting for 
the request to complete; it doesn't, and Apache eventually times out the 
request and so logs it.

So look at your simulated client.

- Simon
-----
Simon Rosenthal ([EMAIL PROTECTED])  
Web Systems Architect
Northern Light Technology   222 Third Street, Cambridge MA 02142
Phone:  (617)577-2796  :   URL:  http://www.northernlight.com
"Northern Light - Just what you've been searching for"



Managing session state over multiple servers

1999-12-16 Thread Simon Rosenthal


Hi:

We're planning on migrating to an Apache::Session + mysql approach for 
managing session state, for a large-ish site hosted on multiple servers. 
While there have been many useful discussions on this list concerning the 
technologies involved, I haven't seen many war stories from the field, as 
it were.

I have some specific questions  - hopefully someone out there has had to 
address these issues and may have some good advice.

a) If your site runs on multiple servers, do you attempt to cache session 
state records on the web server for any length of time after they are 
retrieved from the DBMS? And if so, how do you handle cache consistency 
across all your servers? (We have rather blind load-balancing hardware in 
front of our server farm, with no way of implementing any kind of server 
affinity that I am aware of.)

b) Does anyone have redundant database servers? If so, are there any 
implementation gotchas? And if you have a single server, how does session 
management work when it goes down? (I'm pretty happy with the hardware - 
Suns - which we have, but a disk can go at any time.)

c) This is more of a mysql question.  When do people consider a session 
to have expired? And what is the best strategy for deleting expired 
sessions from a database, especially given that mysql's table-based locking 
seems to leave a bit to be desired if you're trying to mix update 
operations with a big SELECT/DELETE to purge expired sessions?

TIA

- Simon
-
Simon Rosenthal ([EMAIL PROTECTED])  
Web Systems Architect
Northern Light Technology   222 Third Street, Cambridge MA 02142
Phone:  (617)577-2796  :   URL:  http://www.northernlight.com
"Northern Light - Just what you've been searching for"