Hi there,

I recently wired up Nutch to perform HTTP authentication so that it can crawl one of our intranet sites. Given that access to the raw content requires authentication, I thought it made sense to have users authenticate when they access the Nutch web app as well. I implemented this by configuring the Nutch web app to require authentication for access to all of its resources, and then wiring up a JAAS module that performs the authentication.
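
For illustration, a constraint of that kind in a standard servlet web.xml could look roughly like the sketch below. The role name "nutch-user" and the realm name are hypothetical placeholders, and BASIC is only assumed as the auth method; the JAAS module itself is hooked in through container-specific configuration that is not shown here.

```xml
<!-- Sketch only: role name, realm name, and auth method are assumptions. -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>All web app resources</web-resource-name>
    <!-- "/*" protects every resource; a narrower pattern such as "*.do"
         would leave anything that does not match it unprotected. -->
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>nutch-user</role-name>
  </auth-constraint>
</security-constraint>

<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Nutch Search</realm-name>
</login-config>

<security-role>
  <role-name>nutch-user</role-name>
</security-role>
```

With a pattern of "/*", every request (pages, images, stylesheets) passes through the container's authentication check, which is why the number of authentications tracks the number of HTTP requests.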
Naively, I assumed that the user would be authenticated once, and that a session cookie would then be set. However, it turns out that the web app is completely sessionless. That makes perfect sense when searching non-protected resources. In my scenario, though, the lack of a session seems to cause an authentication to occur for every HTTP request made to the server.

I can certainly modify the web app to meet my needs, but does anyone have suggestions that would help me avoid maintaining my own version of the web app? I suppose I could cut down on the number of authentications by requiring authentication only for resources that end with ".do", but would that be enough to guard all of the information exposed by the web application? Perhaps a filter that initializes the session might do the trick?

Any thoughts would be greatly appreciated!

Best regards,
Mark DeSpain
