So, as usual, when I post a question to this list I end up finding the solution myself, after much pain ... sigh.
The solution is nice because it solves both the Lucene problem and the problem of external robots, like Google, trying to index the site. Basically, I treat the user-agent name of the robot as a user id. (This lets me control which robots index the site and what they see, which has other nice side effects that I won't get into here because they are peculiar to my situation.) In any case, the robot is authenticated in the normal way and its information is stored in the session. I then arrange for all URLs that the robot will crawl to be URL-encoded with the session id. This is done using an XSLT transform on <a href="someurl"> elements.

> -----Original Message-----
> From: Sal Mangano [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 05, 2004 4:59 PM
> To: [EMAIL PROTECTED]
> Subject: RE: Preventing session from timing out during search indexing
>
> Okay, further investigation shows that the session is not
> timing out. It is that the indexer that is crawling the site
> is not attached to the session. I still am not sure how to
> fix but have some ideas. Would appreciate help just the same, PLEASE!
>
> > -----Original Message-----
> > From: Sal Mangano [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, August 05, 2004 3:45 PM
> > To: [EMAIL PROTECTED]
> > Subject: Preventing session from timing out during search indexing
> >
> > I am using Cocoon 2.1.5 and Tomcat 4.1.3
> >
> > My site is constructed such that a user must be logged in to
> > access old content. A protected pipeline is set up using
> > <map:match type="regexp-session" .../> to control access.
> > This all works fine.
> >
> > However, when it comes time to build my Lucene search index,
> > trouble begins. On my dev box the search index can take 1
> > hour to build. Since the index involves gaining access to
> > these protected pipelines the session must stay valid until
> > the indexing is done.
> > I use an xsp to kick off the search
> > indexing and the relevant part looks like:
> >
> >   <!--Make sure session does not expire before indexing is finished -->
> >   <xsp-session:set-max-inactive-interval interval="-1"/>
> >   <xsp-session:set-attribute name="role">USER PUBLISHER</xsp-session:set-attribute>
> >   createIndex(baseURL, create);
> >
> > As I tail the access.log I can see the index building process
> > is going along fine on its merry way for a time. Then all of a
> > sudden I see all accesses being redirected to a URL
> > restricted.html, which is exactly what will happen when there
> > is no session or the session timed out.
> >
> > Why is this not fixed by <xsp-session:set-max-inactive-interval
> > interval="-1"/> or <xsp-session:set-max-inactive-interval
> > interval="8000"/>?
> >
> > Any hints or alternate strategies would be appreciated.
> >
> > -Sal
> >
> > ---------------------------------------------------------
> > Sal Mangano
> > Into Technology Inc.
> > www.into-technology.com
> >
> > Use XSLT? Try the XSLT Cookbook
> > http://www.oreilly.com/catalog/xsltckbk/
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
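
For the archives, here is a rough sketch of the link rewriting described at the top of this message. It is written in Python rather than XSLT, purely so the idea is easy to run and poke at; it is not the transform used on the site. Assumptions: the servlet container picks the session id up from a `;jsessionid=` path parameter (Tomcat's URL-rewriting convention), only site-relative links should carry the id, and the names `add_session_id` and `rewrite_links` and the sample id `ABC123` are made up for illustration. The regular expression only handles double-quoted href values on <a> elements and is no substitute for a real XML transform.

```python
import re
from urllib.parse import urlsplit

# Captures the href value of an <a ...> start tag (double-quoted values only).
_A_HREF = re.compile(r'(<a\b[^>]*?\bhref=")([^"]*)(")', re.IGNORECASE)


def add_session_id(url, session_id):
    """Return url with ;jsessionid=<id> appended to its path.

    Absolute links (http:, mailto:, //host/...) and links with no path
    (such as "#top") are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme or parts.netloc or not parts.path:
        return url
    # The session id goes after the path, before any query string or fragment.
    rewritten = parts.path + ";jsessionid=" + session_id
    if parts.query:
        rewritten += "?" + parts.query
    if parts.fragment:
        rewritten += "#" + parts.fragment
    return rewritten


def rewrite_links(html, session_id):
    """Rewrite every <a href="..."> in html so it carries the session id."""
    def _replace(match):
        return match.group(1) + add_session_id(match.group(2), session_id) + match.group(3)
    return _A_HREF.sub(_replace, html)
```

Calling `rewrite_links(page, session_id)` on each page served to the robot means every link it follows carries the session it was authenticated under, so the crawler stays attached to that session without needing cookie support.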
