This is *super* helpful. I think perhaps I am seeing how to handle this.

Regarding #2, since our database is proprietary, there would be no existing
output connection type, so in any case we would need to create our own.

But #1 is clearly an issue. My first thought is that the answer would be to
just read everything (not limited by permissions) and then to use a custom
output connector to "place" copies in the right accounts. If output
connectors have access to the access tokens then I am presuming a custom
output connector could look and say, "oh this document is accessible to
these specific people", but is that a reasonable assumption?
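
To make that concrete, here is a rough sketch in Python of the routing step I
am imagining. It is not ManifoldCF code: the function name, the token strings,
and the user-to-token map are all made up for illustration, and I am assuming
that a deny token overrides an allow token.

```python
from collections import defaultdict


def route_document(doc_id, allow_tokens, deny_tokens, user_tokens):
    """Return the set of users whose private index should get a copy of doc_id.

    user_tokens is a hypothetical map of user name -> set of access tokens
    that the user holds. Assumption: deny tokens take precedence over allow.
    """
    allow = set(allow_tokens)
    deny = set(deny_tokens)
    recipients = set()
    for user, tokens in user_tokens.items():
        if tokens & deny:
            continue  # user holds a denied token, so skip this user
        if tokens & allow:
            recipients.add(user)  # user holds at least one allowed token
    return recipients


# Made-up example data: three users and the tokens each one holds.
user_tokens = {
    "alice": {"S-1-ALICE", "S-1-ENGINEERING"},
    "bob": {"S-1-BOB", "S-1-SALES"},
    "carol": {"S-1-CAROL", "S-1-ENGINEERING", "S-1-CONTRACTORS"},
}

# Stand-in for the per-user database instances: user name -> list of doc ids.
per_user_index = defaultdict(list)
for user in route_document(
    "doc-42", ["S-1-ENGINEERING"], ["S-1-CONTRACTORS"], user_tokens
):
    per_user_index[user].append("doc-42")
```

In this made-up case only alice would receive a copy: bob holds no allowed
token, and carol holds a denied one.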


On Thu, Mar 19, 2015 at 2:26 PM, Karl Wright <[email protected]> wrote:

> "So my question is, notwithstanding that this is not the "typical" way
> ManifoldCF works, can we use it in the way that I am describing? Is it
> malleable enough to work, or is it designed to do something so different
> from what we need that it would be useless? I guess the key question is
> really, can we tell ManifoldCF to limit results to those visible to a
> specific user, and would there be any performance or other unexpected
> downsides to doing that?"
>
> Hi Hank,
>
> There is nothing specific about the ManifoldCF *framework* that prevents
> you from doing what you suggest.  But there are problems, as follows:
>
> (1) Most out-of-the-box repository connection types, including the
> SharePoint type, do not give you any ability to limit crawls to a specific
> user.  Instead, because they are intended to support a very different
> security model, they fetch a document's access tokens, which are described
> by the book chapter I pointed you to.
> (2) If you modified the SharePoint repository connection type in the
> manner you suggest, you would still need to create a custom output
> connection type to drop the content into your per-user database instances.
> The alternative would be to use an appropriate out-of-the-box output
> connection type, if there is one, and have N jobs for N users.
>
> Hope that answers your question.
>
> Karl
>
>
>
> On Thu, Mar 19, 2015 at 2:15 PM, hank williams <[email protected]> wrote:
>
>> Thanks Karl.
>>
>> I will most certainly be reading the document you linked to in great
>> detail. It looks like stuff I need to know.
>>
>> That said, we have a given technology that we have developed and that we
>> will be using. It creates a separate index for each user. The technology
>> has vastly greater utility than just for SharePoint, and it's been in
>> development for about six years. (In fact, this SharePoint thing is a
>> recent add-on request.)
>>
>> So my question is, notwithstanding that this is not the "typical" way
>> ManifoldCF works, can we use it in the way that I am describing? Is it
>> malleable enough to work, or is it designed to do something so different
>> from what we need that it would be useless? I guess the key question is
>> really, can we tell ManifoldCF to limit results to those visible to a
>> specific user, and would there be any performance or other unexpected
>> downsides to doing that?
>>
>> Hank
>>
>>
>> On Thu, Mar 19, 2015 at 1:53 PM, Karl Wright <[email protected]> wrote:
>>
>>> Hi Hank,
>>>
>>> "Our project involves a database that has a private secure user space
>>> for each user. Our database is built on Lucene and indexes every object in
>>> the database. Each user presumably has some number of SharePoint sites that
>>> they have access to. We want to index each SharePoint object (file or
>>> SharePoint page) as we find it, for each user. The user then ends up with
>>> an index of just the objects that they have permissions for. But to do
>>> that we need to, for each user, crawl all of the SharePoint sites that they
>>> have access to. Permissions to each SharePoint site are managed by
>>> Kerberos."
>>>
>>> This is not the typical ManifoldCF model.  In the typical case, there is
>>> ONE Lucene search engine (not N), and any searches that take place apply
>>> security restrictions internally based on the user's security information,
>>> as obtained from the ManifoldCF authority service, which is in turn
>>> querying SharePoint.
>>>
>>> You can read more about the standard authorization setup here:
>>>
>>>
>>> https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs/MCFiA%20CH%2004.pdf
>>>
>>> Karl
>>>
>>>
>>>
>>>
>>> On Thu, Mar 19, 2015 at 1:44 PM, hank williams <[email protected]>
>>> wrote:
>>>
>>>> I am embarking on an effort for which ManifoldCF may be an appropriate
>>>> tool. I am a total noob, having just discovered this project, and I have
>>>> a few questions that I am hoping someone can answer so that I can begin
>>>> to gain some confidence about the way things work. Basically I am trying
>>>> to make sure I understand, at a top level, how ManifoldCF works.
>>>>
>>>> Our project involves a database that has a private secure user space
>>>> for each user. Our database is built on Lucene and indexes every object in
>>>> the database. Each user presumably has some number of SharePoint sites that
>>>> they have access to. We want to index each SharePoint object (file or
>>>> SharePoint page) as we find it, for each user. The user then ends up with
>>>> an index of just the objects that they have permissions for. But to do
>>>> that we need to, for each user, crawl all of the SharePoint sites that they
>>>> have access to. Permissions to each SharePoint site are managed by
>>>> Kerberos.
>>>>
>>>> So the questions are:
>>>>
>>>> a. Can I, with ManifoldCF, take a list of SharePoint sites and a list of
>>>> users with the relevant Kerberos authentication tokens or keys (I am just
>>>> learning about Kerberos), and get back a list of indexable objects/URIs
>>>> (HTML, .docx, .pptx, etc.)?
>>>>
>>>> b. Is this the right way to think about it?
>>>>
>>>> c. If so, is there any example code or documentation that would explain
>>>> how I do this?
>>>>
>>>> d. Does ManifoldCF provide any information to help indicate whether the
>>>> given object has changed, or is that something we need to figure out by
>>>> manually comparing the old and new documents in our code?
>>>>
>>>
>>>
>>
>
