Re: Public/Private data in Solr :: Metadata or ?

2016-10-19 Thread Hrishikesh Gadre
As part of Cloudera Search, we have integrated with Apache Sentry for
document level authorization. Currently we are using a custom search
component to implement filtering. Please refer to this blog post for
details:
http://blog.cloudera.com/blog/2014/07/new-in-cdh-5-1-document-level-security-for-cloudera-search/

I am currently working on a Sentry based plugin implementation which can be
hooked into the Solr authorization framework. Currently the Solr authorization
framework doesn't implement document level security. I filed SOLR-9578 to add
the relevant doc level security support in Solr.

The main drawback of the custom search component based mechanism is that it
requires a special solrconfig.xml file (one that wires in these custom search
components). On the other hand, once Solr provides hooks to implement doc
level security as part of the authorization framework, this restriction
will go away.

If you have any ideas (or concerns) about this feature, please feel free to
comment on the JIRA.

Thanks
Hrishikesh

On Wed, Oct 19, 2016 at 7:48 AM, Shawn Heisey  wrote:

> On 10/18/2016 3:00 PM, John Bickerstaff wrote:
> > How (or is it even wise) to "segregate data" in Solr so that some data
> > can be seen by some users and some data not be seen?
>
> IMHO, security like this isn't really Solr's job ... but with the right
> data in the index, the system that DOES handle the security can include
> a filter with each user's query to restrict them to only the data they
> are allowed to see.  There are many ways to put data in the index for
> efficient use by a filter.  The simplest would be a boolean field with a
> name like isPublic or isPrivate, where true and false are mapped as
> necessary to public and private.
>
> Naturally, the users must not be able to reach Solr directly ... they
> must be restricted to the software that connects to Solr on their
> behalf.  Blocking end users from direct network access to Solr is a good
> idea even if there are no other security needs.
>
> There are more comprehensive solutions available, as you will notice
> from other replies, but the idea of simple filtering, controlled by your
> application, should work.
>
> Thanks,
> Shawn
>
>


Re: Public/Private data in Solr :: Metadata or ?

2016-10-19 Thread John Bickerstaff
Thanks Erick - also very helpful.

On Wed, Oct 19, 2016 at 1:24 PM, Erick Erickson 
wrote:

> And for hairy ACL processing, consider a post-filter. It's custom code
> that only evaluates a document _after_ it has made it through the
> primary query and any "lower cost" filters. See:
> http://yonik.com/advanced-filter-caching-in-solr/.
>
> NOTE: this isn't the thing I would do first; it's much more efficient
> to implement some of the suggestions above. Any time you can trade
> query-time work for index-time work, it's almost always better to do
> the work up-front during indexing.
>
> Best,
> Erick
>
> On Wed, Oct 19, 2016 at 12:07 PM, John Bickerstaff
>  wrote:
> > Thank you both!  Very helpful.
> >
> > On Wed, Oct 19, 2016 at 8:48 AM, Shawn Heisey 
> wrote:
> >
> >> On 10/18/2016 3:00 PM, John Bickerstaff wrote:
> >> > How (or is it even wise) to "segregate data" in Solr so that some data
> >> > can be seen by some users and some data not be seen?
> >>
> >> IMHO, security like this isn't really Solr's job ... but with the right
> >> data in the index, the system that DOES handle the security can include
> >> a filter with each user's query to restrict them to only the data they
> >> are allowed to see.  There are many ways to put data in the index for
> >> efficient use by a filter.  The simplest would be a boolean field with a
> >> name like isPublic or isPrivate, where true and false are mapped as
> >> necessary to public and private.
> >>
> >> Naturally, the users must not be able to reach Solr directly ... they
> >> must be restricted to the software that connects to Solr on their
> >> behalf.  Blocking end users from direct network access to Solr is a good
> >> idea even if there are no other security needs.
> >>
> >> There are more comprehensive solutions available, as you will notice
> >> from other replies, but the idea of simple filtering, controlled by your
> >> application, should work.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>


Re: Public/Private data in Solr :: Metadata or ?

2016-10-19 Thread Erick Erickson
And for hairy ACL processing, consider a post-filter. It's custom code
that only evaluates a document _after_ it has made it through the
primary query and any "lower cost" filters. See:
http://yonik.com/advanced-filter-caching-in-solr/.

NOTE: this isn't the thing I would do first; it's much more efficient
to implement some of the suggestions above. Any time you can trade
query-time work for index-time work, it's almost always better to do
the work up-front during indexing.
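
For illustration only, here is roughly what the request side of a post-filter
could look like. Solr runs a filter as a post-filter only when it is not
cached, its cost is 100 or more, and the underlying query implements the
PostFilter interface. The parser name "acl" and its "user" parameter are
hypothetical stand-ins for whatever custom plugin you would write; this
sketch only builds the fq string and does not implement the plugin itself.

```python
def post_filter_fq(parser: str, params: dict, cost: int = 200) -> str:
    """Build an fq value that asks Solr to run a filter as a post-filter.

    Solr only post-filters when the filter is uncached and its cost is
    100 or more, and the query produced by the parser implements PostFilter.
    """
    if cost < 100:
        raise ValueError("post-filters need cost >= 100")
    # cache=false and cost are standard local params on filter queries.
    local_params = {**params, "cache": "false", "cost": str(cost)}
    rendered = " ".join(f"{key}={value}" for key, value in local_params.items())
    return "{!" + parser + " " + rendered + "}"


# "acl" is a hypothetical custom parser; "user" is a hypothetical parameter.
fq = post_filter_fq("acl", {"user": "alice"})
```

Cheaper filters (plain field filters with the default cost) still run first,
so the expensive ACL check only sees documents that survived them.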

Best,
Erick

On Wed, Oct 19, 2016 at 12:07 PM, John Bickerstaff
 wrote:
> Thank you both!  Very helpful.
>
> On Wed, Oct 19, 2016 at 8:48 AM, Shawn Heisey  wrote:
>
>> On 10/18/2016 3:00 PM, John Bickerstaff wrote:
>> > How (or is it even wise) to "segregate data" in Solr so that some data
>> > can be seen by some users and some data not be seen?
>>
>> IMHO, security like this isn't really Solr's job ... but with the right
>> data in the index, the system that DOES handle the security can include
>> a filter with each user's query to restrict them to only the data they
>> are allowed to see.  There are many ways to put data in the index for
>> efficient use by a filter.  The simplest would be a boolean field with a
>> name like isPublic or isPrivate, where true and false are mapped as
>> necessary to public and private.
>>
>> Naturally, the users must not be able to reach Solr directly ... they
>> must be restricted to the software that connects to Solr on their
>> behalf.  Blocking end users from direct network access to Solr is a good
>> idea even if there are no other security needs.
>>
>> There are more comprehensive solutions available, as you will notice
>> from other replies, but the idea of simple filtering, controlled by your
>> application, should work.
>>
>> Thanks,
>> Shawn
>>
>>


Re: Public/Private data in Solr :: Metadata or ?

2016-10-19 Thread John Bickerstaff
Thank you both!  Very helpful.

On Wed, Oct 19, 2016 at 8:48 AM, Shawn Heisey  wrote:

> On 10/18/2016 3:00 PM, John Bickerstaff wrote:
> > How (or is it even wise) to "segregate data" in Solr so that some data
> > can be seen by some users and some data not be seen?
>
> IMHO, security like this isn't really Solr's job ... but with the right
> data in the index, the system that DOES handle the security can include
> a filter with each user's query to restrict them to only the data they
> are allowed to see.  There are many ways to put data in the index for
> efficient use by a filter.  The simplest would be a boolean field with a
> name like isPublic or isPrivate, where true and false are mapped as
> necessary to public and private.
>
> Naturally, the users must not be able to reach Solr directly ... they
> must be restricted to the software that connects to Solr on their
> behalf.  Blocking end users from direct network access to Solr is a good
> idea even if there are no other security needs.
>
> There are more comprehensive solutions available, as you will notice
> from other replies, but the idea of simple filtering, controlled by your
> application, should work.
>
> Thanks,
> Shawn
>
>


Re: Public/Private data in Solr :: Metadata or ?

2016-10-19 Thread Shawn Heisey
On 10/18/2016 3:00 PM, John Bickerstaff wrote:
> How (or is it even wise) to "segregate data" in Solr so that some data
> can be seen by some users and some data not be seen? 

IMHO, security like this isn't really Solr's job ... but with the right
data in the index, the system that DOES handle the security can include
a filter with each user's query to restrict them to only the data they
are allowed to see.  There are many ways to put data in the index for
efficient use by a filter.  The simplest would be a boolean field with a
name like isPublic or isPrivate, where true and false are mapped as
necessary to public and private.
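
A minimal sketch of that idea, assuming the application sits between the
user and Solr and that each document has a boolean isPublic field plus a
string owner field (the owner field and the function names are assumptions
made for this example, not anything Solr provides):

```python
from urllib.parse import urlencode


def escape_phrase(value: str) -> str:
    """Escape backslashes and double quotes for use inside a quoted phrase."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_search_params(user_query: str, user_id: str) -> dict:
    """Return Solr select parameters restricted to what user_id may see.

    The filter is attached by the application and never taken from user
    input, so users cannot widen their own visibility.
    """
    visibility_filter = f'isPublic:true OR owner:"{escape_phrase(user_id)}"'
    return {"q": user_query, "fq": visibility_filter, "wt": "json"}


params = build_search_params("title:solr", "alice")
query_string = urlencode(params)  # ready to append to the select URL
```

Because the restriction lives in fq rather than q, it does not affect
scoring and Solr can cache it separately from the user's query.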

Naturally, the users must not be able to reach Solr directly ... they
must be restricted to the software that connects to Solr on their
behalf.  Blocking end users from direct network access to Solr is a good
idea even if there are no other security needs.

There are more comprehensive solutions available, as you will notice
from other replies, but the idea of simple filtering, controlled by your
application, should work.

Thanks,
Shawn



Re: Public/Private data in Solr :: Metadata or ?

2016-10-19 Thread Jan Høydahl
In practice there should not be much of a delay, but if you change the ACL
permission on a top-level folder with 10 million docs beneath it,
it will take some time before all those docs are reindexed. If you instead
give your friend read access through a “group” that
already has access to the docs, the change is immediate.
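
The group case is immediate because group membership is resolved at query
time, while document ACLs live in the index. A rough sketch, assuming a
hypothetical multi-valued field acl_read that holds tokens such as
user:alice or group:staff (these are not ManifoldCF's actual field or
token names):

```python
def acl_filter(user_id: str, groups: list[str], field: str = "acl_read") -> str:
    """Build a filter matching documents readable by the user or their groups."""
    tokens = [f"user:{user_id}"] + [f"group:{name}" for name in groups]
    # Quote each token so the colon is not parsed as a field separator.
    quoted = " OR ".join('"' + t.replace('"', '\\"') + '"' for t in tokens)
    return f"{field}:({quoted})"


# Adding bob to a group changes only the filter; no document is reindexed.
before = acl_filter("bob", ["staff"])
after = acl_filter("bob", ["staff", "alice-friends"])
```

Any document already tagged with group:alice-friends becomes visible to bob
on his next query, since only the filter string changed.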

I suppose ManifoldCF could start using DocValues for the ACL info and update
those atomically, which would be much faster than re-indexing the content of
every document. Does anyone know if that would be feasible?
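
For reference, this is what a single-document atomic update looks like in
Solr's JSON update syntax. It is only a sketch: the field name isPublic and
the document id are assumptions, atomic updates require the document's other
fields to be stored or have docValues, and the change becomes visible only
after a commit. Whether ManifoldCF could adopt this is a separate matter.

```python
import json


def visibility_update(doc_id: str, is_public: bool, field: str = "isPublic") -> str:
    """Return a JSON body for Solr's update handler that sets one field."""
    # {"set": value} is Solr's atomic-update syntax; other fields are untouched.
    return json.dumps([{"id": doc_id, field: {"set": is_public}}])


body = visibility_update("record-A", True)  # POST this to the update handler
```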

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 19. okt. 2016 kl. 00.30 skrev Markus Jelsma <markus.jel...@openindex.io>:
> 
> ManifoldCF can do this really flexible, with Filenet or Sharepoint, or both, 
> i don't remember that well. This means a variety of users can have changing 
> privileges  at any time. The backend determines visibility, ManifoldCF just 
> asks how visible it should be.
> 
> This also means you need those backends and ManifoldCF. If broad document and 
> users permissions are required (and you have those backends), this is a very 
> viable option.
> 
> 
> 
> -Original message-
>> From:John Bickerstaff <j...@johnbickerstaff.com>
>> Sent: Wednesday 19th October 2016 0:14
>> To: solr-user@lucene.apache.org
>> Subject: Re: Public/Private data in Solr :: Metadata or ?
>> 
>> Thanks Jan --
>> 
>> I did a quick scan on the wiki and here:
>> http://www.slideshare.net/lucenerevolution/wright-nokia-manifoldcfeurocon-2011
>> and couldn't find the answer to the following question in the 5 or 10
>> minutes I spent looking.  Admittedly I'm being lazy and hoping you have
>> enough experience with the project to answer easily...
>> 
>> Do you know if ManifoldCF helps with a use case where the security token
>> needs to be changed arbitrarily and a re-index of the collection is not
>> practical?  Or is ManifoldCF an index-time only kind of thing?
>> 
>> 
>> Use Case:  User A changes "record A" from private to public so a friend
>> (User B) can see it.  User B logs in and expects to see what User A changed
>> to public a few minutes earlier.
>> 
>> The security token on "record A" would need to be changed immediately, and
>> that change would have to occur in Solr - yes?
>> 
>> 
>> 
>> On Tue, Oct 18, 2016 at 3:32 PM, Jan Høydahl <jan@cominvent.com> wrote:
>> 
>>> https://wiki.apache.org/solr/SolrSecurity#Document_Level_Security <
>>> https://wiki.apache.org/solr/SolrSecurity#Document_Level_Security>
>>> 
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>> 
>>>> 18. okt. 2016 kl. 23.00 skrev John Bickerstaff <j...@johnbickerstaff.com
>>>> :
>>>> 
>>>> I have a question that I suspect I'll need to answer very soon in my
>>>> current position.
>>>> 
>>>> How (or is it even wise) to "segregate data" in Solr so that some data
>>> can
>>>> be seen by some users and some data not be seen?
>>>> 
>>>> Taking the case of "public / private" as a (hopefully) simple, binary
>>>> example...
>>>> 
>>>> Let's imagine I have a data set that can be seen by a user.  Some of that
>>>> data can be seen ONLY by the user (this would be the private data) and
>>> some
>>>> of it can be seen by others (assume the user gave permission for this in
>>>> some way)
>>>> 
>>>> What is a best practice for handling this type of situation?  I can see
>>>> putting metadata in Solr of course, but the instant I do that, I create
>>> the
>>>> obligation to keep it updated (Document-level CRUD?) and I start using
>>> Solr
>>>> more like a DB than a search engine.
>>>> 
>>>> (Assume the user can change this public/private setting on any one piece
>>> of
>>>> "their" data at any time).
>>>> 
>>>> Of course, I can also see some kind of post-results massaging of data to
>>>> remove private data based on ID's which are stored in a database or
>>> similar
>>>> datastore...
>>>> 
>>>> How have others solved this and is there a consensus on whether to keep
>>> it
>>>> out of Solr, or how best to handle it in Solr?
>>>> 
>>>> Are there clever implementations of "secondary" collections in Solr for
>>>> this purpose?
>>>> 
>>>> Any advice / hard-won experience is greatly appreciated...
>>> 
>>> 
>> 



RE: Public/Private data in Solr :: Metadata or ?

2016-10-18 Thread Markus Jelsma
ManifoldCF can do this very flexibly, with FileNet or SharePoint, or both; I
don't remember that well. This means a variety of users can have changing
privileges at any time. The backend determines visibility, ManifoldCF just
asks how visible it should be.

This also means you need those backends and ManifoldCF. If broad document and
user permissions are required (and you have those backends), this is a very
viable option.

 
 
-Original message-
> From:John Bickerstaff <j...@johnbickerstaff.com>
> Sent: Wednesday 19th October 2016 0:14
> To: solr-user@lucene.apache.org
> Subject: Re: Public/Private data in Solr :: Metadata or ?
> 
> Thanks Jan --
> 
> I did a quick scan on the wiki and here:
> http://www.slideshare.net/lucenerevolution/wright-nokia-manifoldcfeurocon-2011
> and couldn't find the answer to the following question in the 5 or 10
> minutes I spent looking.  Admittedly I'm being lazy and hoping you have
> enough experience with the project to answer easily...
> 
> Do you know if ManifoldCF helps with a use case where the security token
> needs to be changed arbitrarily and a re-index of the collection is not
> practical?  Or is ManifoldCF an index-time only kind of thing?
> 
> 
> Use Case:  User A changes "record A" from private to public so a friend
> (User B) can see it.  User B logs in and expects to see what User A changed
> to public a few minutes earlier.
> 
> The security token on "record A" would need to be changed immediately, and
> that change would have to occur in Solr - yes?
> 
> 
> 
> On Tue, Oct 18, 2016 at 3:32 PM, Jan Høydahl <jan@cominvent.com> wrote:
> 
> > https://wiki.apache.org/solr/SolrSecurity#Document_Level_Security <
> > https://wiki.apache.org/solr/SolrSecurity#Document_Level_Security>
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> >
> > > 18. okt. 2016 kl. 23.00 skrev John Bickerstaff <j...@johnbickerstaff.com
> > >:
> > >
> > > I have a question that I suspect I'll need to answer very soon in my
> > > current position.
> > >
> > > How (or is it even wise) to "segregate data" in Solr so that some data
> > can
> > > be seen by some users and some data not be seen?
> > >
> > > Taking the case of "public / private" as a (hopefully) simple, binary
> > > example...
> > >
> > > Let's imagine I have a data set that can be seen by a user.  Some of that
> > > data can be seen ONLY by the user (this would be the private data) and
> > some
> > > of it can be seen by others (assume the user gave permission for this in
> > > some way)
> > >
> > > What is a best practice for handling this type of situation?  I can see
> > > putting metadata in Solr of course, but the instant I do that, I create
> > the
> > > obligation to keep it updated (Document-level CRUD?) and I start using
> > Solr
> > > more like a DB than a search engine.
> > >
> > > (Assume the user can change this public/private setting on any one piece
> > of
> > > "their" data at any time).
> > >
> > > Of course, I can also see some kind of post-results massaging of data to
> > > remove private data based on ID's which are stored in a database or
> > similar
> > > datastore...
> > >
> > > How have others solved this and is there a consensus on whether to keep
> > it
> > > out of Solr, or how best to handle it in Solr?
> > >
> > > Are there clever implementations of "secondary" collections in Solr for
> > > this purpose?
> > >
> > > Any advice / hard-won experience is greatly appreciated...
> >
> >
> 


RE: Public/Private data in Solr :: Metadata or ?

2016-10-18 Thread Markus Jelsma
The key is static indeed, just a subscription key. Under the hood it translates 
to a function query, which can vary. In our simple case it is really a key that 
translates to fq=host:(host1 host2 ... hostX). A simple backend sends this data 
to nginx every few minutes.

Again, just simple visibility. Nothing fancy. It works well for some cases.
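
A rough sketch of the key-to-filter translation described above, written in
Python rather than as nginx configuration. The SUBSCRIPTIONS table, the key
format and the function name are made up for the example; in the setup
described here that data is pushed to the proxy by a backend.

```python
# Hypothetical subscription table: client key -> hosts the client may search.
SUBSCRIPTIONS = {
    "key-123": ["host1", "host2"],
}


def filter_for_key(key: str) -> str:
    """Translate a client's subscription key into a Solr filter query."""
    hosts = SUBSCRIPTIONS.get(key)
    if not hosts:
        # Unknown or empty subscriptions get no access at all.
        raise PermissionError("unknown subscription key")
    return "host:(" + " ".join(hosts) + ")"


fq = filter_for_key("key-123")
```

The proxy would add this value as the fq parameter and drop any fq the
client sent itself, so a client cannot replace the restriction.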

 
 
-Original message-
> From:John Bickerstaff <j...@johnbickerstaff.com>
> Sent: Wednesday 19th October 2016 0:10
> To: solr-user@lucene.apache.org
> Subject: Re: Public/Private data in Solr :: Metadata or ?
> 
> Thanks Markus,
> 
> In your case that client's key is fairly static, yes?  It doesn't change at
> any time, but tends to live on the data more or less permanently?
> 
> On Tue, Oct 18, 2016 at 4:07 PM, Markus Jelsma <markus.jel...@openindex.io>
> wrote:
> 
> > In case you're not up for Doug or Jan's anwers; we have relied on HTTP
> > proxies (nginx) to solve the problem of restriction for over 6 years. Very
> > easy if visibility is your only problem. Of course, the update handlers are
> > hidden (we perform indexing for clients with crawlers) so we don't expose
> > anything update related.
> >
> > For us, it's is just simple translating a client's key to a filter query
> > equivalent.
> >
> > There are many answers depending on what you need.
> >
> > M.
> >
> >
> >
> > -Original message-
> > > From:John Bickerstaff <j...@johnbickerstaff.com>
> > > Sent: Tuesday 18th October 2016 23:00
> > > To: solr-user@lucene.apache.org
> > > Subject: Public/Private data in Solr :: Metadata or ?
> > >
> > > I have a question that I suspect I'll need to answer very soon in my
> > > current position.
> > >
> > > How (or is it even wise) to "segregate data" in Solr so that some data
> > can
> > > be seen by some users and some data not be seen?
> > >
> > > Taking the case of "public / private" as a (hopefully) simple, binary
> > > example...
> > >
> > > Let's imagine I have a data set that can be seen by a user.  Some of that
> > > data can be seen ONLY by the user (this would be the private data) and
> > some
> > > of it can be seen by others (assume the user gave permission for this in
> > > some way)
> > >
> > > What is a best practice for handling this type of situation?  I can see
> > > putting metadata in Solr of course, but the instant I do that, I create
> > the
> > > obligation to keep it updated (Document-level CRUD?) and I start using
> > Solr
> > > more like a DB than a search engine.
> > >
> > > (Assume the user can change this public/private setting on any one piece
> > of
> > > "their" data at any time).
> > >
> > > Of course, I can also see some kind of post-results massaging of data to
> > > remove private data based on ID's which are stored in a database or
> > similar
> > > datastore...
> > >
> > > How have others solved this and is there a consensus on whether to keep
> > it
> > > out of Solr, or how best to handle it in Solr?
> > >
> > > Are there clever implementations of "secondary" collections in Solr for
> > > this purpose?
> > >
> > > Any advice / hard-won experience is greatly appreciated...
> > >
> >
> 


Re: Public/Private data in Solr :: Metadata or ?

2016-10-18 Thread John Bickerstaff
Thanks Jan --

I did a quick scan on the wiki and here:
http://www.slideshare.net/lucenerevolution/wright-nokia-manifoldcfeurocon-2011
and couldn't find the answer to the following question in the 5 or 10
minutes I spent looking.  Admittedly I'm being lazy and hoping you have
enough experience with the project to answer easily...

Do you know if ManifoldCF helps with a use case where the security token
needs to be changed arbitrarily and a re-index of the collection is not
practical?  Or is ManifoldCF an index-time only kind of thing?


Use Case:  User A changes "record A" from private to public so a friend
(User B) can see it.  User B logs in and expects to see what User A changed
to public a few minutes earlier.

The security token on "record A" would need to be changed immediately, and
that change would have to occur in Solr - yes?



On Tue, Oct 18, 2016 at 3:32 PM, Jan Høydahl  wrote:

> https://wiki.apache.org/solr/SolrSecurity#Document_Level_Security <
> https://wiki.apache.org/solr/SolrSecurity#Document_Level_Security>
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 18. okt. 2016 kl. 23.00 skrev John Bickerstaff  >:
> >
> > I have a question that I suspect I'll need to answer very soon in my
> > current position.
> >
> > How (or is it even wise) to "segregate data" in Solr so that some data
> can
> > be seen by some users and some data not be seen?
> >
> > Taking the case of "public / private" as a (hopefully) simple, binary
> > example...
> >
> > Let's imagine I have a data set that can be seen by a user.  Some of that
> > data can be seen ONLY by the user (this would be the private data) and
> some
> > of it can be seen by others (assume the user gave permission for this in
> > some way)
> >
> > What is a best practice for handling this type of situation?  I can see
> > putting metadata in Solr of course, but the instant I do that, I create
> the
> > obligation to keep it updated (Document-level CRUD?) and I start using
> Solr
> > more like a DB than a search engine.
> >
> > (Assume the user can change this public/private setting on any one piece
> of
> > "their" data at any time).
> >
> > Of course, I can also see some kind of post-results massaging of data to
> > remove private data based on ID's which are stored in a database or
> similar
> > datastore...
> >
> > How have others solved this and is there a consensus on whether to keep
> it
> > out of Solr, or how best to handle it in Solr?
> >
> > Are there clever implementations of "secondary" collections in Solr for
> > this purpose?
> >
> > Any advice / hard-won experience is greatly appreciated...
>
>


Re: Public/Private data in Solr :: Metadata or ?

2016-10-18 Thread John Bickerstaff
Thanks Markus,

In your case that client's key is fairly static, yes?  It doesn't change at
any time, but tends to live on the data more or less permanently?

On Tue, Oct 18, 2016 at 4:07 PM, Markus Jelsma 
wrote:

> In case you're not up for Doug or Jan's anwers; we have relied on HTTP
> proxies (nginx) to solve the problem of restriction for over 6 years. Very
> easy if visibility is your only problem. Of course, the update handlers are
> hidden (we perform indexing for clients with crawlers) so we don't expose
> anything update related.
>
> For us, it's is just simple translating a client's key to a filter query
> equivalent.
>
> There are many answers depending on what you need.
>
> M.
>
>
>
> -Original message-
> > From:John Bickerstaff 
> > Sent: Tuesday 18th October 2016 23:00
> > To: solr-user@lucene.apache.org
> > Subject: Public/Private data in Solr :: Metadata or ?
> >
> > I have a question that I suspect I'll need to answer very soon in my
> > current position.
> >
> > How (or is it even wise) to "segregate data" in Solr so that some data
> can
> > be seen by some users and some data not be seen?
> >
> > Taking the case of "public / private" as a (hopefully) simple, binary
> > example...
> >
> > Let's imagine I have a data set that can be seen by a user.  Some of that
> > data can be seen ONLY by the user (this would be the private data) and
> some
> > of it can be seen by others (assume the user gave permission for this in
> > some way)
> >
> > What is a best practice for handling this type of situation?  I can see
> > putting metadata in Solr of course, but the instant I do that, I create
> the
> > obligation to keep it updated (Document-level CRUD?) and I start using
> Solr
> > more like a DB than a search engine.
> >
> > (Assume the user can change this public/private setting on any one piece
> of
> > "their" data at any time).
> >
> > Of course, I can also see some kind of post-results massaging of data to
> > remove private data based on ID's which are stored in a database or
> similar
> > datastore...
> >
> > How have others solved this and is there a consensus on whether to keep
> it
> > out of Solr, or how best to handle it in Solr?
> >
> > Are there clever implementations of "secondary" collections in Solr for
> > this purpose?
> >
> > Any advice / hard-won experience is greatly appreciated...
> >
>


RE: Public/Private data in Solr :: Metadata or ?

2016-10-18 Thread Markus Jelsma
In case you're not up for Doug's or Jan's answers: we have relied on HTTP
proxies (nginx) to solve the problem of restriction for over 6 years. Very easy
if visibility is your only problem. Of course, the update handlers are hidden
(we perform indexing for clients with crawlers) so we don't expose anything
update related.

For us, it is simply a matter of translating a client's key to a filter query
equivalent.

There are many answers depending on what you need.

M.

 
 
-Original message-
> From:John Bickerstaff 
> Sent: Tuesday 18th October 2016 23:00
> To: solr-user@lucene.apache.org
> Subject: Public/Private data in Solr :: Metadata or ?
> 
> I have a question that I suspect I'll need to answer very soon in my
> current position.
> 
> How (or is it even wise) to "segregate data" in Solr so that some data can
> be seen by some users and some data not be seen?
> 
> Taking the case of "public / private" as a (hopefully) simple, binary
> example...
> 
> Let's imagine I have a data set that can be seen by a user.  Some of that
> data can be seen ONLY by the user (this would be the private data) and some
> of it can be seen by others (assume the user gave permission for this in
> some way)
> 
> What is a best practice for handling this type of situation?  I can see
> putting metadata in Solr of course, but the instant I do that, I create the
> obligation to keep it updated (Document-level CRUD?) and I start using Solr
> more like a DB than a search engine.
> 
> (Assume the user can change this public/private setting on any one piece of
> "their" data at any time).
> 
> Of course, I can also see some kind of post-results massaging of data to
> remove private data based on ID's which are stored in a database or similar
> datastore...
> 
> How have others solved this and is there a consensus on whether to keep it
> out of Solr, or how best to handle it in Solr?
> 
> Are there clever implementations of "secondary" collections in Solr for
> this purpose?
> 
> Any advice / hard-won experience is greatly appreciated...
> 


Re: Public/Private data in Solr :: Metadata or ?

2016-10-18 Thread Jan Høydahl
https://wiki.apache.org/solr/SolrSecurity#Document_Level_Security 


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 18. okt. 2016 kl. 23.00 skrev John Bickerstaff :
> 
> I have a question that I suspect I'll need to answer very soon in my
> current position.
> 
> How (or is it even wise) to "segregate data" in Solr so that some data can
> be seen by some users and some data not be seen?
> 
> Taking the case of "public / private" as a (hopefully) simple, binary
> example...
> 
> Let's imagine I have a data set that can be seen by a user.  Some of that
> data can be seen ONLY by the user (this would be the private data) and some
> of it can be seen by others (assume the user gave permission for this in
> some way)
> 
> What is a best practice for handling this type of situation?  I can see
> putting metadata in Solr of course, but the instant I do that, I create the
> obligation to keep it updated (Document-level CRUD?) and I start using Solr
> more like a DB than a search engine.
> 
> (Assume the user can change this public/private setting on any one piece of
> "their" data at any time).
> 
> Of course, I can also see some kind of post-results massaging of data to
> remove private data based on ID's which are stored in a database or similar
> datastore...
> 
> How have others solved this and is there a consensus on whether to keep it
> out of Solr, or how best to handle it in Solr?
> 
> Are there clever implementations of "secondary" collections in Solr for
> this purpose?
> 
> Any advice / hard-won experience is greatly appreciated...



Re: Public/Private data in Solr :: Metadata or ?

2016-10-18 Thread Doug Turnbull
You might want to talk to Kevin Waters or look at some of the work being
done with the graph plugin. It's being used to model permissions with Solr.
It's a bit of normalization within Solr whereby you could localize updates
to a user's shared-with document. Kevin can probably talk more intelligently
than I can about it.

-Doug
On Tue, Oct 18, 2016 at 5:00 PM John Bickerstaff 
wrote:

> I have a question that I suspect I'll need to answer very soon in my
> current position.
>
> How (or is it even wise) to "segregate data" in Solr so that some data can
> be seen by some users and some data not be seen?
>
> Taking the case of "public / private" as a (hopefully) simple, binary
> example...
>
> Let's imagine I have a data set that can be seen by a user.  Some of that
> data can be seen ONLY by the user (this would be the private data) and some
> of it can be seen by others (assume the user gave permission for this in
> some way)
>
> What is a best practice for handling this type of situation?  I can see
> putting metadata in Solr of course, but the instant I do that, I create the
> obligation to keep it updated (Document-level CRUD?) and I start using Solr
> more like a DB than a search engine.
>
> (Assume the user can change this public/private setting on any one piece of
> "their" data at any time).
>
> Of course, I can also see some kind of post-results massaging of data to
> remove private data based on ID's which are stored in a database or similar
> datastore...
>
> How have others solved this and is there a consensus on whether to keep it
> out of Solr, or how best to handle it in Solr?
>
> Are there clever implementations of "secondary" collections in Solr for
> this purpose?
>
> Any advice / hard-won experience is greatly appreciated...
>