I'm interested in crawling multiple shared folders (among other
things) on a corporate LAN.
It is a LAN of Windows clients with accounts managed in Active Directory.

The users routinely access the files based on NTFS-level (and
possibly share-level?) permissions.

Ideally, I'd like to set up a central server (probably Linux, but any
*nix would do) where I'd mount all the shared folders.
I'd then set up Apache so that the files are accessible via HTTP and,
more importantly, WebDAV. I imagine Apache could use mod_dav, mod_auth
and possibly one or two other modules to regulate access privileges,
though I could very well be completely wrong here.
Finally, I'd like to set up Nutch to crawl the shared documents
through the web server, so that the stored links are valid across the
whole LAN. Nutch would therefore need unrestricted read access to all
documents, while ordinary users would get the documents from a web
server that checks their identities and access rights.
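
To make the "links valid in the whole LAN" part concrete, here is a
rough Python sketch of the path-to-URL mapping I have in mind. The
mount root /mnt/shares, the host fileserver.example.lan and the
function name share_path_to_url are all made up for illustration;
none of this is Nutch or Apache API, and it may well not be how the
real setup should work.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

# Hypothetical values, purely for illustration.
MOUNT_ROOT = PurePosixPath("/mnt/shares")          # where the shares would be mounted
BASE_URL = "http://fileserver.example.lan/shares"  # where the web server would expose them


def share_path_to_url(path: str) -> str:
    """Map a file under MOUNT_ROOT to the URL it would be served at.

    Raises ValueError if the path is not under MOUNT_ROOT.
    """
    relative = PurePosixPath(path).relative_to(MOUNT_ROOT)
    # Percent-encode spaces and other unsafe characters, keep the slashes.
    return BASE_URL + "/" + quote(relative.as_posix())


if __name__ == "__main__":
    print(share_path_to_url("/mnt/shares/finance/Q1 report.doc"))
```

The point is only that every indexed document gets one URL that any
client on the LAN can open, with the access check left to the web
server.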

Nutch users who've tackled the access rights problem themselves would
save me a world of time, effort and trouble with a couple of pointers
on how to approach the whole security issue.
If the setup I described is the worst possible way to go about it, I'd
appreciate a note saying so and explaining why. :)

TIA,
t.n.a.
