This is *super* helpful. I think I am starting to see how to handle this. Regarding #2, since our database is proprietary, there would be no existing output connection type, so in any case we would need to create our own.
But #1 is clearly an issue. My first thought is that the answer would be to just read everything (not limited by permissions) and then to use a custom output connector to "place" copies in the right accounts. If output connectors have access to the access tokens then I am presuming a custom output connector could look and say, "oh this document is accessible to these specific people", but is that a reasonable assumption?

On Thu, Mar 19, 2015 at 2:26 PM, Karl Wright <[email protected]> wrote:

> "So my question is, notwithstanding that this is not the "typical" way
> ManifoldCF works, can we use it in the way that I am describing. Is it
> malleable enough to work or is it designed to do something so different
> from what we need that it would be useless. I guess the key question is
> really, can we tell ManifoldCF to limit results to those visible to a
> specific user and would there be any performance or other unexpected
> downsides to doing that."
>
> Hi Hank,
>
> There is nothing specific about the ManifoldCF *framework* that prevents
> you from doing what you suggest. But there are problems, as follows:
>
> (1) Most out-of-the-box repository connection types, including the
> SharePoint type, do not give you any ability to limit crawls to a specific
> user. Instead, because they are intended to support a very different
> security model, they fetch a document's access tokens, which are described
> by the book chapter I pointed you to.
> (2) If you modified the SharePoint repository connection type in the
> manner you suggest, you would still need to create a custom output
> connection type to drop the content into your per-user database instances.
> The alternative would be to use an appropriate out-of-the-box output
> connection type, if there is one, and have N jobs for N users.
>
> Hope that answers your question.
>
> Karl
>
> On Thu, Mar 19, 2015 at 2:15 PM, hank williams <[email protected]> wrote:
>
>> Thanks Karl.
>>
>> I will most certainly be reading the document you linked to in great
>> detail. It looks like stuff I need to know.
>>
>> That said, we have a given technology that we have developed and that we
>> will be using. It creates a separate index for each user. The technology
>> has vastly greater utility than just for SharePoint, and it's been in
>> development for about six years. (In fact this SharePoint thing is a
>> recent add-on request.)
>>
>> So my question is, notwithstanding that this is not the "typical" way
>> ManifoldCF works, can we use it in the way that I am describing. Is it
>> malleable enough to work or is it designed to do something so different
>> from what we need that it would be useless. I guess the key question is
>> really, can we tell ManifoldCF to limit results to those visible to a
>> specific user and would there be any performance or other unexpected
>> downsides to doing that.
>>
>> Hank
>>
>> On Thu, Mar 19, 2015 at 1:53 PM, Karl Wright <[email protected]> wrote:
>>
>>> Hi Hank,
>>>
>>> "Our project involves a database that has a private secure user space
>>> for each user. Our database is built on Lucene and indexes every object in
>>> the database. Each user presumably has some number of SharePoint sites that
>>> they have access to. We want to index each SharePoint object (file or
>>> SharePoint page) as we find it, for each user. The user then ends up with
>>> an index of just the objects that they have permissions for. But to do
>>> that we need to, for each user, crawl all of the SharePoint sites that they
>>> have access to. Permissions to each SharePoint site are managed by
>>> Kerberos."
>>>
>>> This is not the typical ManifoldCF model. In the typical case, there is
>>> ONE Lucene search engine (not N), and any searches that take place apply
>>> security restrictions internally based on the user's security information,
>>> as obtained from the ManifoldCF authority service, which is in turn
>>> querying SharePoint.
>>>
>>> You can read more about the standard authorization setup here:
>>>
>>> https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs/MCFiA%20CH%2004.pdf
>>>
>>> Karl
>>>
>>> On Thu, Mar 19, 2015 at 1:44 PM, hank williams <[email protected]>
>>> wrote:
>>>
>>>> I am embarking on an effort for which ManifoldCF may be an appropriate
>>>> tool. I am a total noob, having just discovered this project, and have a few
>>>> questions that I am hoping someone can answer so that I can begin to gain
>>>> some confidence about the way things work. Basically I am trying to make
>>>> sure I understand, at a top level, how ManifoldCF works.
>>>>
>>>> Our project involves a database that has a private secure user space
>>>> for each user. Our database is built on Lucene and indexes every object in
>>>> the database. Each user presumably has some number of SharePoint sites that
>>>> they have access to. We want to index each SharePoint object (file or
>>>> SharePoint page) as we find it, for each user. The user then ends up with
>>>> an index of just the objects that they have permissions for. But to do
>>>> that we need to, for each user, crawl all of the SharePoint sites that they
>>>> have access to. Permissions to each SharePoint site are managed by
>>>> Kerberos.
>>>>
>>>> So the questions are:
>>>>
>>>> a. Can I, with ManifoldCF, take a list of SharePoint sites and a list of
>>>> users and relevant Kerberos-appropriate authentication tokens or keys (just
>>>> learning about Kerberos), and get back a list of indexable objects/URIs
>>>> (HTML, .docx, pptx, etc.)?
>>>>
>>>> b. Is this the right way to think about it?
>>>>
>>>> c. If so, is there any example code or documentation that would explain
>>>> how I do this?
>>>>
>>>> d. Does ManifoldCF provide any information to help indicate whether the
>>>> given object has changed, or is that something we need to figure out by
>>>> manually comparing the old and new documents in our code?
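
As a rough illustration of the fan-out idea raised at the top of this thread (crawl everything once, then let a custom output connector place each document into the per-user indexes its access tokens permit), here is a minimal Python sketch of just the routing logic. Everything in it is hypothetical: `CrawledDocument`, `users_who_can_see`, `fan_out`, the token strings, and the separate deny-token set are illustrative names and assumptions, not ManifoldCF APIs. A real output connection type would be written against the framework's own interfaces. The sketch also assumes each user's tokens have already been resolved elsewhere, for example through the authority service Karl mentions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawledDocument:
    """Hypothetical stand-in for a document handed to an output connector."""
    uri: str
    allow_tokens: frozenset
    # Assumption: documents may also carry tokens that explicitly deny access.
    deny_tokens: frozenset = frozenset()


def users_who_can_see(doc, user_tokens):
    """Return the users holding at least one allow token and no deny token."""
    visible = set()
    for user, tokens in user_tokens.items():
        if tokens & doc.deny_tokens:
            continue  # a matching deny token overrides any allow token
        if tokens & doc.allow_tokens:
            visible.add(user)
    return visible


def fan_out(docs, user_tokens):
    """Map each user to the URIs that belong in that user's private index."""
    per_user = defaultdict(list)
    for doc in docs:
        for user in sorted(users_who_can_see(doc, user_tokens)):
            per_user[user].append(doc.uri)
    return dict(per_user)
```

The point of the sketch is that one crawl can feed N per-user indexes, provided the tokens attached to each document can be matched against each user's tokens at output time. Whether that match is cheap enough at scale is a separate question the sketch does not address.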
