I am embarking on an effort for which ManifoldCF may be an appropriate tool. I am a total noob, having just discovered this project, and I have a few questions that I hope someone can answer so that I can gain some confidence about how things work. Basically, I am trying to make sure I understand, at a top level, how ManifoldCF works.
Our project involves a database that has a private, secure user space for each user. Our database is built on Lucene and indexes every object in the database. Each user presumably has access to some number of SharePoint sites. We want to index each SharePoint object (file or SharePoint page) as we find it, for each user. The user then ends up with an index of just the objects that they have permissions for. To do that, we need to crawl, for each user, all of the SharePoint sites that they have access to. Permissions to each SharePoint site are managed by Kerberos. So the questions are:

a. Can I, with ManifoldCF, take a list of SharePoint sites and a list of users with the relevant Kerberos authentication tokens or keys (I am just learning about Kerberos), and get back a list of indexable objects/URIs (HTML, .docx, .pptx, etc.)?
b. Is this the right way to think about it?
c. If so, is there any example code or documentation that would explain how I do this?
d. Does ManifoldCF provide any information to help indicate whether a given object has changed, or is that something we need to figure out by manually comparing the old and new documents in our code?
