Virtual locations in mod_perl

2002-03-06 Thread Milo Hyson

I'm having trouble understanding how to configure mod_perl to execute a 
handler when called with a virtual location (i.e. one that does not directly 
map to anything in the server's filesystem). I know it's possible because 
packages like PageKit do it. I tried hacking through PageKit's code, but it 
didn't answer any questions.

My problem is that whenever Apache receives a URL for a virtual location, its 
default translation handler converts it into a directory index (index.html), 
which of course doesn't exist. The end result is a 403. Now I can write my 
own PerlTransHandler to intercept requests for my specific locations and 
pretend to translate them, but that's a pain and PageKit seems to work 
without doing that.

Does anybody have any insight they could offer? Thanks in advance. :)

-- 
Milo Hyson
CyberLife Labs, LLC



Re: Apache::Session

2002-02-24 Thread Milo Hyson

On Sunday 24 February 2002 02:43 am, Christoph Lange wrote:
> > The session hash is serialized/deserialized in its entirety using the
> > Storable module.
>
> Does this mean, that - after tying the session hash - it is of no
> importance (concerning the amount of time needed) whether I do
> %everything_from_session_hash  =  %session_hash;  # or
> $everything_from_session_hash{element1} = $session_hash{element1};
> I actually thought that the second way saves time since only one value of
> the hash (however big this may be) is extracted from the database.

There is no difference. Behind the scenes, the entire hash is serialized into 
a single scalar and stored in a single database field. In order to retrieve 
any part of the session, the scalar must be read from the database and 
de-serialized. The serialize/de-serialize steps are performed when you 
tie/un-tie the hash.
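Here is a rough sketch of why the two forms cost the same. It is written in Python rather than Perl purely as a language-neutral illustration: pickle stands in for Storable, an in-memory sqlite3 table stands in for the Postgres sessions table, and the table, column and function names are hypothetical. Whichever key you want, the whole blob is read and deserialized first.

```python
import pickle
import sqlite3

# Hypothetical sessions table: one row per session holding one opaque blob,
# loosely modelled on how Apache::Session stores a frozen hash.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, a_session BLOB)")

def store(session_id, session):
    """Serialize the *entire* session into a single value and save it."""
    blob = pickle.dumps(session)
    db.execute("INSERT OR REPLACE INTO sessions VALUES (?, ?)", (session_id, blob))
    return len(blob)

def load(session_id):
    """Read and deserialize the whole blob; return (session, bytes_read)."""
    (blob,) = db.execute(
        "SELECT a_session FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    return pickle.loads(blob), len(blob)

def get_all(session_id):
    # Like %everything = %session: copy every element.
    session, nbytes = load(session_id)
    return dict(session), nbytes

def get_one(session_id, key):
    # Like $everything{element1} = $session{element1}: still a full load.
    session, nbytes = load(session_id)
    return session[key], nbytes

session = {"element1": "small", "element2": "x" * 100_000}
written = store("abc123", session)
everything, bytes_all = get_all("abc123")
element1, bytes_one = get_one("abc123", "element1")
print(bytes_all == bytes_one == written)  # True: same bytes read either way
```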

I found it helpful to take apart the various Apache::Session modules and see 
what makes them tick.

-- 
Milo Hyson
CyberLife Labs, LLC



Re: Apache::Session

2002-02-23 Thread Milo Hyson

On Saturday 23 February 2002 03:03 pm, Christoph Lange wrote:
> Hi,
>
> I guess that this is going to be another "what-a-bloody-beginner"-question
> but I hope somebody will be in a good mood and help me out.
>
> I am using Apache::Session with Postgresql. Unfortunately I had never
> worked with a huge amount of data before I started to program something
> like a (little) web application. I happily packed everything in the
> "session"(s-table) that might be of any use. It hit me hard that it takes a
> veeey long time to get all the stuff out of the "session"(s-table) each
> time the client sends another request. So I became a little more particular
about what to store. My question refers to how the extraction of data from
> the "session"(s-table) works. Ok, I have tied a %session and now need to
> get $session{this}->{is}->{an}->{example}. Will the session module always
> fetch the entire $session{this} or is there a way to get exactly the
> reference I want?

The session hash is serialized/deserialized in its entirety using the 
Storable module. If you have a large structure it's going to get the whole 
thing each time. Personally, I try to never store anything other than object 
IDs in the session. Not only does this reduce the session size but it helps 
to prevent synchronization problems.
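The IDs-only approach can be illustrated with a minimal Python sketch. As before, Python stands in for Perl, and the PRODUCTS table and load_product helper are hypothetical. The session carries only keys, so its serialized size stays small no matter how large the objects are. Each object is also read fresh from its own table when it is needed, so the session never holds a stale copy.

```python
import pickle

# Hypothetical application data that would normally live in its own tables.
PRODUCTS = {
    101: {"name": "Widget", "description": "..." * 10_000},
    202: {"name": "Gadget", "description": "..." * 10_000},
}

def load_product(product_id):
    # Stand-in for a SELECT against the real table: always the current row.
    return PRODUCTS[product_id]

fat_session = {"cart": [PRODUCTS[101], PRODUCTS[202]]}  # whole objects
lean_session = {"cart": [101, 202]}                     # object IDs only

fat_size = len(pickle.dumps(fat_session))
lean_size = len(pickle.dumps(lean_session))

# IDs are resolved only when the application actually needs the objects.
cart_names = [load_product(pid)["name"] for pid in lean_session["cart"]]
print(lean_size < fat_size, cart_names)  # True ['Widget', 'Gadget']
```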

-- 
Milo Hyson
CyberLife Labs, LLC



Re: When handlers misfire

2002-02-22 Thread Milo Hyson

On Thursday 21 February 2002 05:21 pm, Rick Myers wrote:
> On Feb 21, 2002 at 15:23:04 -0800, Milo Hyson wrote:
> > On Wednesday 20 February 2002 07:55 pm, Geoffrey Young wrote:
> > > > If the redirected request needs that session
> > > > data, there's a small chance it won't be there yet.
> > >
> > > have you seen this?  I don't recall this ever coming up before (which
> > > doesn't mean it can't happen :)
> >
> > Yes, I have seen it happen. Quite frequently in fact. My investigation
> > into the problem is how I discovered the cleanup handler wasn't doing its
> > job in time.
>
> Want to see something even more interesting? Let your
> debugging warnings mention the pid that caused the log
> entry and let it run a while on a production server. I see
> stuff like...

To add to all of this, since installing the fixup and cleanup/log handlers, 
I've noticed problems shutting down Apache. When running apachectl stop, I 
see numerous messages in the logs about the child processes not shutting down 
and having to be SIGKILLed. Yesterday, however, I relocated my session 
management code into Apache::Registry. This seems to solve all my problems. 
In fact, it may even open a few additional doors :)

-- 
Milo Hyson
CyberLife Labs, LLC



Re: When handlers misfire

2002-02-21 Thread Milo Hyson

On Wednesday 20 February 2002 07:55 pm, Geoffrey Young wrote:
> > If the redirected request needs that session
> > data, there's a small chance it won't be there yet.
>
> have you seen this?  I don't recall this ever coming up before (which
> doesn't mean it can't happen :)

Yes, I have seen it happen. Quite frequently in fact. My investigation into 
the problem is how I discovered the cleanup handler wasn't doing its job in 
time.

> perhaps putting your post-content code in a PerlLogHandler instead of a
> PerlCleanupHandler might help if you are running into problems.  the
> browser isn't released from the current connection until logging is
> complete, so there wouldn't be the chance that a redirect would be
> processed before the session is created.

I moved the session cleanup phase to a PerlLogHandler and it seems to be 
working, except for one small issue. Request URIs for directories (i.e. no 
filename specified) don't seem to trigger the log handler. I put some 
warnings in the code to trace its execution. The following is a dump of my 
logs following a test run:

[Thu Feb 21 15:02:46 2002] [warn] SessionPrepare called for GET /iddb/target/index.pl
[Thu Feb 21 15:02:46 2002] [warn] SessionPrepare called for GET /iddb/target/index.pl
[Thu Feb 21 15:03:15 2002] [warn] SessionPrepare called for POST /iddb/target/login.pl
[Thu Feb 21 15:03:15 2002] [warn] SessionCleanup called for POST /iddb/target/login.pl
[Thu Feb 21 15:03:15 2002] [warn] SessionPrepare called for GET /iddb/target/list-portals.pl
[Thu Feb 21 15:03:16 2002] [warn] SessionCleanup called for GET /iddb/target/list-portals.pl
[Thu Feb 21 15:03:18 2002] [warn] SessionPrepare called for GET /iddb/target/index.pl
[Thu Feb 21 15:03:18 2002] [warn] SessionPrepare called for GET /iddb/target/index.pl
[Thu Feb 21 15:03:18 2002] [warn] SessionPrepare called for GET /iddb/target/list-portals.pl
[Thu Feb 21 15:03:19 2002] [warn] SessionCleanup called for GET /iddb/target/list-portals.pl
[Thu Feb 21 15:04:36 2002] [warn] SessionPrepare called for GET /iddb/target/index.pl
[Thu Feb 21 15:04:37 2002] [warn] SessionCleanup called for GET /iddb/target/index.pl
[Thu Feb 21 15:04:37 2002] [warn] SessionPrepare called for GET /iddb/target/list-portals.pl
[Thu Feb 21 15:04:38 2002] [warn] SessionCleanup called for GET /iddb/target/list-portals.pl

The first two entries are the browser's initial request for /iddb/target and 
the follow-up request for /iddb/target/ after Apache's redirect, which is 
ultimately translated to /iddb/target/index.pl. Notice that the fixup handler 
(SessionPrepare) fires for both, but the log handler (SessionCleanup) doesn't. 
In entries seven and eight, I re-issued the initial request and received the 
same result -- no log handler. However, in entries 11 and 12, I manually 
entered /iddb/target/index.pl into the browser's address bar. This time the 
log handler was called.

Bug or feature?

-- 
Milo Hyson
CyberLife Labs, LLC



When handlers misfire

2002-02-20 Thread Milo Hyson

I just ran into a problem with my PerlFixupHandler/PerlCleanupHandler based 
session manager (discussed earlier). It seems there's no guarantee that the 
cleanup handler will fire before the browser receives the response from the 
content handler. There's a niche case where a redirect will get to the 
browser and back to Apache before the cleanup handler has a chance to write 
the session to the database. If the redirected request needs that session 
data, there's a small chance it won't be there yet.

Is there any way I can guarantee (short of hacking Apache::Registry) that my 
post-content code will run before the browser gets a response? From where I 
sit, the hack job looks like the best option right now.

-- 
Milo Hyson
CyberLife Labs, LLC



Re: [OT-ish] Session refresh philosophy

2002-02-19 Thread Milo Hyson

On Tuesday 19 February 2002 02:55 pm, Perrin Harkins wrote:
> Incidentally, this is mostly the same thing as what Jeffrey Baker mentioned
> a few days ago about storing state entirely inside a cookie with a message
> digest.  The only difference is that by sticking it in a form element
> you're attaching it to a specific page.

That's not a bad idea. I guess if you're paranoid about snooping you could 
always encrypt the cookie.
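For reference, the cookie-plus-message-digest idea can be sketched in a few lines. This Python version uses an HMAC, a keyed digest, which is what stops a client from forging or altering the state. The secret, the cookie layout and the function names are assumptions for illustration only. Note that the payload is merely base64-encoded, so anyone can read it; hiding it from snoopers would additionally require encryption.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"change-me"  # hypothetical server-side key; never sent to the client

def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()

def make_cookie(state: dict) -> str:
    """Encode the state and append a keyed digest of it."""
    payload = base64.urlsafe_b64encode(
        json.dumps(state, sort_keys=True).encode()
    ).decode()
    return payload + "." + _sign(payload)

def read_cookie(cookie: str):
    """Return the state if the digest checks out, else None."""
    payload, _, digest = cookie.rpartition(".")
    # Compare as bytes in constant time to avoid leaking timing information.
    if not hmac.compare_digest(digest.encode(), _sign(payload).encode()):
        return None  # tampered with or forged
    return json.loads(base64.urlsafe_b64decode(payload))

cookie = make_cookie({"user_id": 42, "cart": [101, 202]})
print(read_cookie(cookie))        # {'cart': [101, 202], 'user_id': 42}
print(read_cookie("x" + cookie))  # None
```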

-- 
Milo Hyson
CyberLife Labs, LLC



Re: Session refresh philosophy

2002-02-18 Thread Milo Hyson

On Monday 18 February 2002 07:29 pm, Rob Nagler wrote:
> I may be asking the wrong question: is there a need for sessions?
> This seems like a lot of work when, for most applications, sessions
> are unnecessary.

I don't see how they could be unnecessary for what we're doing. Then again, 
maybe I'm just approaching the problem incorrectly. If one is doing a 
shopping-cart-style application (whereby someone selects/configures multiple 
items before they're ultimately committed to a database) how else would you 
do it? There has to be some semi-persistent (i.e. inter-request) data where 
selections are stored before they're confirmed.

-- 
Milo Hyson
CyberLife Labs, LLC



Session refresh philosophy

2002-02-18 Thread Milo Hyson

Like my previous question on object caching, this one is potentially a matter 
of style as well. When it comes to implementing expirations on session data, 
I've encountered two schools of thought on when it is best to refresh the 
timestamp/expiration. Since the general idea of expiration is to discard 
information that hasn't been accessed in a while, some feel that updating the 
timestamp is best done during both loading and storing. After all, both are 
considered accessing the data. However, taking into account the general 
pattern of HTTP request processing, I feel that updating only during storage 
is best, especially when using a database for persistence.

Suppose one has a SQL table for saving session data. When a request comes in, 
the session is loaded and its expiration is examined. Assuming the session is 
still valid, one could issue another statement to the database to refresh the 
session's expiration time. That's two database ops before the session is even 
used. If you count the one at the end for storing the session back in the 
database it's a total of three per request. My feeling is that if you're 
going to be writing the session back within (hopefully) a fraction of a 
second anyway, you might as well wait until then to refresh the time-out.
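The operation count above can be checked with a small Python sketch against an in-memory sqlite3 table. The schema, the 30-minute timeout and the SessionStore class are hypothetical, and the expiration check is done in application code after the SELECT. Refreshing on load costs a SELECT plus an UPDATE before the session is used, then another UPDATE at the end. Refreshing only on store folds the new expiration into that final write.

```python
import sqlite3
import time

TIMEOUT = 30 * 60  # hypothetical idle timeout, in seconds

class SessionStore:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE sessions (id TEXT PRIMARY KEY, data TEXT, expires REAL)"
        )
        self.ops = 0  # statements issued while serving the request

    def _run(self, sql, params=()):
        self.ops += 1
        return self.db.execute(sql, params)

    def create(self, sid, data, now):
        # Setup only; not counted as part of the request.
        self.db.execute("INSERT INTO sessions VALUES (?, ?, ?)",
                        (sid, data, now + TIMEOUT))

    def load(self, sid, now, refresh_on_load):
        row = self._run("SELECT data, expires FROM sessions WHERE id = ?",
                        (sid,)).fetchone()
        if row is None or row[1] < now:
            return None  # missing or expired
        if refresh_on_load:
            # The extra round trip made before the session is even used.
            self._run("UPDATE sessions SET expires = ? WHERE id = ?",
                      (now + TIMEOUT, sid))
        return row[0]

    def store(self, sid, data, now):
        # Writing the session back refreshes the expiration in the same statement.
        self._run("UPDATE sessions SET data = ?, expires = ? WHERE id = ?",
                  (data, now + TIMEOUT, sid))

def ops_per_request(refresh_on_load):
    sessions = SessionStore()
    now = time.time()
    sessions.create("abc", "cart=101", now)
    data = sessions.load("abc", now, refresh_on_load)
    sessions.store("abc", data + ",202", now)
    return sessions.ops

print(ops_per_request(True), ops_per_request(False))  # 3 2
```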

The project I'm working on requires that I design a custom application 
platform for current and future projects. My proposed solution to the session 
management problem is as follows:

1) A fix-up handler is called to extract the session ID from a cookie. 
Assuming a valid ID was found, the session is loaded, de-serialized and 
checked for expiration. If all is well, a reference to the session is 
stored in pnotes for use by the application.

1a) If for some reason no session was found (e.g. no cookie) a new one is 
created and a new cookie is stuffed in the outgoing headers.

2) During content-generation, the application obtains the session reference 
from pnotes and uses it as necessary.

3) A clean-up handler is called to re-serialize the session and stick it back 
in persistent storage (updating the expiration in the process). The handler 
of course does nothing if the application destroyed the session in step 2.
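The three steps can be modelled outside Apache with a short Python sketch. Everything here is hypothetical: the Request class only imitates mod_perl's pnotes and outgoing headers, a dict of pickled blobs stands in for persistent storage, and the cookie name SESSION_ID is made up. The sketch shows the intended flow. The session is loaded once in the fix-up phase, used during content generation, and written back with a fresh expiration in the clean-up phase unless the application destroyed it.

```python
import pickle
import time
import uuid

TIMEOUT = 30 * 60   # hypothetical idle timeout, in seconds
STORE = {}          # stand-in for persistent storage: id -> (blob, expires)

class Request:
    """Hypothetical request; pnotes imitates mod_perl's per-request notes."""
    def __init__(self, cookie=None):
        self.cookie = cookie
        self.pnotes = {}
        self.headers_out = {}

def fixup(r, now):
    # Step 1: load and de-serialize the session named by the cookie.
    entry = STORE.get(r.cookie)
    if entry is not None and entry[1] >= now:
        r.pnotes["session_id"] = r.cookie
        r.pnotes["session"] = pickle.loads(entry[0])
    else:
        # Step 1a: no valid session, so create one and send a new cookie.
        sid = uuid.uuid4().hex
        r.pnotes["session_id"] = sid
        r.pnotes["session"] = {}
        r.headers_out["Set-Cookie"] = "SESSION_ID=" + sid

def add_to_cart(r, item):
    # Step 2: the application uses the session reference from pnotes.
    r.pnotes["session"].setdefault("cart", []).append(item)

def destroy(r):
    # Step 2 (alternative): the application destroys the session.
    STORE.pop(r.pnotes["session_id"], None)
    r.pnotes["session"] = None

def cleanup(r, now):
    # Step 3: re-serialize and store, refreshing the expiration in the
    # same write; do nothing if the session was destroyed.
    if r.pnotes["session"] is not None:
        STORE[r.pnotes["session_id"]] = (
            pickle.dumps(r.pnotes["session"]), now + TIMEOUT)

now = time.time()
first = Request()
fixup(first, now)
add_to_cart(first, 101)
cleanup(first, now)
sid = first.headers_out["Set-Cookie"].split("=", 1)[1]

second = Request(cookie=sid)
fixup(second, now + 60)
add_to_cart(second, 202)
cleanup(second, now + 60)
cart_after_second = pickle.loads(STORE[sid][0])
print(cart_after_second)  # {'cart': [101, 202]}

third = Request(cookie=sid)
fixup(third, now + 120)
destroy(third)
cleanup(third, now + 120)
print(sid in STORE)  # False
```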

I'm still fairly new to mod_perl and haven't fully taken apart all of the 
various application servers out there to see how they do it. I would still 
appreciate any feedback anyone may have on the above.

Thanks in advance.

-- 
Milo Hyson
CyberLife Labs, LLC



try/finally without catch

2002-02-08 Thread Milo Hyson

In Graham Barr's Error module, the documentation indicates that finally is 
called after either a successful try block or a catch handler, but says 
nothing about an unsuccessful try block without a catch handler. Does it 
handle this situation or am I forced to create dummy catch blocks whose only 
job is to re-throw the exception so that finally can execute?

-- 
Milo Hyson
CyberLife Labs, LLC



Re: performance coding project? (was: Re: When to cache)

2002-01-26 Thread Milo Hyson

On Saturday 26 January 2002 03:40 pm, Sam Tregar wrote:
> Think search engines.  Once you've figured out how to get your search
> database to fit in memory (or devised a caching strategy to get the
> important parts there) you're essentially looking at a CPU-bound problem.
> These days the best solution is probably some judicious use of Inline::C.
> Back when I last tackled the problem I had to hike up mount XS to find my
> grail...

I agree. There are some situations that are just too complex for a DBMS to 
handle directly, at least in any sort of efficient fashion. However, 
depending on the load in those cases, Perrin's solution for eToys is probably 
a good approach (i.e. custom search software written in C/C++).

-- 
Milo Hyson
CyberLife Labs, LLC



When to cache

2002-01-23 Thread Milo Hyson

I'm interested to know what the opinions are of those on this list with 
regard to caching objects during database write operations. I've encountered 
different views and I'm not really sure what the best approach is.

Take a typical caching scenario: Data/objects are locally stored upon loading 
from a database to improve performance for subsequent requests. But when 
those objects change, what's the best method for refreshing the cache? There 
are two possible approaches (maybe more?):

1) The old cache entry is overwritten with the new.
2) The old cache entry is expired, thus forcing a database hit (and 
subsequent cache load) on the next request.

The first approach would tend to yield better performance. However there's no 
guarantee the data will ever be read. The cache could end up with a large 
amount of data that's never referenced. The second approach would probably 
allow for a smaller cache by ensuring that data is only cached on reads.
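The trade-off can be sketched in Python. The CachedStore class is hypothetical, with a plain dict playing the database. With approach 1 (overwrite), the second read of "a" is served from the cache, but "b" ends up cached without ever being read. With approach 2 (expire), "a" costs an extra database read, but only data that was actually read gets cached.

```python
class CachedStore:
    """Sketch of the two refresh strategies; `db` is any dict-like store."""
    def __init__(self, db, write_through):
        self.db = db
        self.cache = {}
        self.write_through = write_through
        self.db_reads = 0

    def get(self, key):
        if key not in self.cache:
            self.db_reads += 1          # cache miss: go to the database
            self.cache[key] = self.db[key]
        return self.cache[key]

    def put(self, key, value):
        self.db[key] = value            # the database is always updated
        if self.write_through:
            self.cache[key] = value     # approach 1: overwrite the cache entry
        else:
            self.cache.pop(key, None)   # approach 2: expire it; reload on next read

def demo(write_through):
    store = CachedStore({"a": 1}, write_through)
    store.get("a")       # first read loads "a" into the cache
    store.put("a", 2)    # update an object that is already cached
    store.put("b", 3)    # write an object that nobody reads afterwards
    return store.get("a"), store.db_reads, sorted(store.cache)

print(demo(write_through=True))   # (2, 1, ['a', 'b'])
print(demo(write_through=False))  # (2, 2, ['a'])
```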

In the end, this probably boils down to application requirements. RAM and 
disk storage are so cheap these days that the first method is probably fine 
for most purposes. However I'm sure there are situations where resources are 
limited and the second is more effective. What does everyone think?

-- 
Milo Hyson
CyberLife Labs, LLC