"So my question is, notwithstanding that this is not the "typical" way ManifoldCF works, can we use it in the way that I am describing. Is it malleable enough to work or is it designed to do something so different from what we need that it would be useless. I guess the key question is really, can we tell ManifoldCF to limit results to those visible to a specific user and would there be any performance or other unexpected downsides to doing that."
Hi Hank,

There is nothing specific about the ManifoldCF *framework* that prevents you from doing what you suggest. But there are problems, as follows:

(1) Most out-of-the-box repository connection types, including the SharePoint type, do not give you any ability to limit crawls to a specific user. Instead, because they are intended to support a very different security model, they fetch a document's access tokens, which are described in the book chapter I pointed you to.

(2) If you modified the SharePoint repository connection type in the manner you suggest, you would still need to create a custom output connection type to drop the content into your per-user database instances. The alternative would be to use an appropriate out-of-the-box output connection type, if there is one, and have N jobs for N users.

Hope that answers your question.

Karl

On Thu, Mar 19, 2015 at 2:15 PM, hank williams <[email protected]> wrote:

> Thanks Karl.
>
> I will most certainly be reading the document you linked to in great detail. It looks like stuff I need to know.
>
> That said, we have a given technology that we have developed and that we will be using. It creates a separate index for each user. The technology has vastly greater utility than just for SharePoint, and it's been in development for about six years. (In fact, this SharePoint thing is a recent add-on request.)
>
> So my question is, notwithstanding that this is not the "typical" way ManifoldCF works, can we use it in the way that I am describing. Is it malleable enough to work or is it designed to do something so different from what we need that it would be useless. I guess the key question is really, can we tell ManifoldCF to limit results to those visible to a specific user and would there be any performance or other unexpected downsides to doing that.
>
> Hank
>
> On Thu, Mar 19, 2015 at 1:53 PM, Karl Wright <[email protected]> wrote:
>
>> Hi Hank,
>>
>> "Our project involves a database that has a private secure user space for each user. Our database is built on Lucene and indexes every object in the database. Each user presumably has some number of SharePoint sites that they have access to. We want to index each SharePoint object (file or SharePoint page) as we find it, for each user. The user then ends up with an index of just the objects that they have permissions for. But to do that we need to, for each user, crawl all of the SharePoint sites that they have access to. Permissions to each SharePoint site are managed by Kerberos."
>>
>> This is not the typical ManifoldCF model. In the typical case, there is ONE Lucene search engine (not N), and any searches that take place apply security restrictions internally based on the user's security information, as obtained from the ManifoldCF authority service, which is in turn querying SharePoint.
>>
>> You can read more about the standard authorization setup here:
>>
>> https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs/MCFiA%20CH%2004.pdf
>>
>> Karl
>>
>> On Thu, Mar 19, 2015 at 1:44 PM, hank williams <[email protected]> wrote:
>>
>>> I am embarking on an effort for which ManifoldCF may be an appropriate tool. I am a total noob, having just discovered this project, and have a few questions that I am hoping someone can answer so that I can begin to gain some confidence about the way things work. Basically I am trying to make sure I understand, at a top level, how ManifoldCF works.
>>>
>>> Our project involves a database that has a private secure user space for each user. Our database is built on Lucene and indexes every object in the database. Each user presumably has some number of SharePoint sites that they have access to. We want to index each SharePoint object (file or SharePoint page) as we find it, for each user. The user then ends up with an index of just the objects that they have permissions for. But to do that we need to, for each user, crawl all of the SharePoint sites that they have access to. Permissions to each SharePoint site are managed by Kerberos.
>>>
>>> So the questions are:
>>>
>>> a. Can I, with ManifoldCF, take a list of SharePoint sites and a list of users and the appropriate Kerberos authentication tokens or keys (just learning about Kerberos), and get back a list of indexable objects/URIs (HTML, .docx, pptx, etc.)?
>>>
>>> b. Is this the right way to think about it?
>>>
>>> c. If so, is there any example code or documentation that would explain how I do this?
>>>
>>> d. Does ManifoldCF provide any information to help indicate whether the given object has changed, or is that something we need to figure out by manually comparing the old and new documents in our code?
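
[Editorial illustration] The two models discussed in this thread can be contrasted with a small sketch. It is not ManifoldCF code and uses no ManifoldCF API: the `Doc` class, the token strings, the URIs, and both function names are hypothetical, and the security check is simplified to "allow" tokens only. It assumes each crawled document carries a set of access tokens and each user resolves to a set of tokens, as in the single-index model Karl describes, and shows that a per-user index amounts to materializing that same filter once per user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    uri: str
    allow_tokens: frozenset  # hypothetical: tokens that grant read access


def visible_docs(docs, user_tokens):
    """Single shared index: filter at query time by the user's tokens."""
    return [d.uri for d in docs if d.allow_tokens & user_tokens]


def build_per_user_indexes(docs, users):
    """Per-user model: materialize one filtered index per user (N users, N jobs)."""
    return {name: visible_docs(docs, tokens) for name, tokens in users.items()}


# Made-up sample data for illustration only.
docs = [
    Doc("http://sp/sites/a/report.docx", frozenset({"group:finance"})),
    Doc("http://sp/sites/b/plan.pptx", frozenset({"group:eng", "user:hank"})),
]
users = {
    "hank": frozenset({"user:hank"}),
    "karl": frozenset({"group:finance", "group:eng"}),
}

per_user = build_per_user_indexes(docs, users)
# Total entries stored across all per-user indexes; a shared index stores len(docs).
stored_entries = sum(len(uris) for uris in per_user.values())
```

In this sketch the shared index stores each document once and defers the permission check to search time, while the per-user indexes store a document once for every user who can see it and would need rebuilding whenever a user's permissions change.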
