Yonik, your reply was incredibly helpful. Thank you very much!

The "join" approach to document security you explained is somewhat
similar to what I called "Option 2" (ACL PostFilter) since permissions
are stored in each document, but it's much simpler in that I'm not
required to write, compile, and distribute my own QParserPlugin. In
addition, by using dynamic fields (for now anyway), I don't even have
to distribute a new schema.xml. It just works! (Once you re-index.)
At least it seems to work. I'm declaring this a new option, Option 5.
:)

The crux of the solution is creating a new document "type" to join on,
a new "group" type. For me, this new "group" type sits alongside some
other document "types" I had already defined (dataverses, datasets,
and files in my case). My existing documents of those older types now
get tagged with the id of one or more of the new "group" documents.
It's like saying, "This document can be seen by these groups I'm
tagging it with."
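For anyone who wants to try this, here's a rough sketch of what the two kinds of documents might look like at index time. It's a minimal shell sketch, not my actual indexing code: the helper names `group_doc` and `tagged_doc` are hypothetical, and it assumes the `groups_s`/`perms_ss` dynamic fields and the `dvtype` field seen in the output below. Group names are assumed to contain no quotes or backslashes, since nothing is JSON-escaped.

```shell
#!/bin/sh
# Hypothetical helpers that print Solr JSON documents for the two "types".
# Assumption: names contain no characters that need JSON escaping.

# A "group" document. Its id and groups_s are both the group name, so
# {!join from=groups_s to=perms_ss}id:NAME starts from this document.
group_doc() {
  printf '{"id":"%s","dvtype":"groups","groups_s":"%s"}' "$1" "$1"
}

# An existing document tagged with the groups allowed to see it.
# Usage: tagged_doc ID DVTYPE GROUP...
tagged_doc() {
  id="$1"
  dvtype="$2"
  shift 2
  perms=""
  for g in "$@"; do
    # Add a comma separator only after the first element.
    perms="${perms}${perms:+,}\"$g\""
  done
  printf '{"id":"%s","dvtype":"%s","perms_ss":[%s]}' "$id" "$dvtype" "$perms"
}

# Example upload (assumes Solr's JSON update handler at this URL):
# curl 'http://localhost:8983/solr/collection1/update?commit=true' \
#   -H 'Content-Type: application/json' \
#   -d "[$(group_doc group_public),$(tagged_doc dataverse_7 dataverses group_public group_2)]"
```

Since the tags live on each document, changing who can see a document means re-indexing that document; changing group membership only touches the application's notion of which groups a user is in.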

To make this more concrete, I thought I'd post some curl output
showing how I'm now tagging my existing "dataverse" documents with new
permissions such as "group_2" and "group_public", which represent
actual groups, as well as what I'll call "User Private Groups" (UPG*):
one group per user, named after the user. (Unlike your example, where
user "joe" is part of a group called "joe", I'm putting "user1" in the
name of the group, as in "group_user1". But that's still the "joe"
group that only joe is a part of.)

At runtime, I'll check which groups a user is part of and then build
a filter query with one join per group, combined with ORs. Anonymous
users only get to see documents tagged with the public group
("group_public" in my case), as you had illustrated. If you're part of
a lot of groups, I guess there will be a lot of ORs in the filter query.
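Here's a hedged sketch of that runtime step: given the groups a user belongs to (however the application looks them up), it emits one {!join} clause per group and ORs them together, in the same shape as the fq in the first curl command below. The function names `upg_group` and `build_perms_fq` are hypothetical. I kept the outer parentheses from the working query; my understanding is that a query string beginning with `{!join ...}` would hand the entire rest of the string to that first join.

```shell
#!/bin/sh
# Hypothetical helpers for building the per-user filter query.

# Name of a user's private group (UPG), e.g. user1 -> group_user1.
upg_group() {
  printf 'group_%s\n' "$1"
}

# Build "(JOIN OR JOIN ...)" with one join per group argument.
# With no arguments (anonymous user), fall back to group_public only.
build_perms_fq() {
  if [ "$#" -eq 0 ]; then
    set -- group_public
  fi
  fq=""
  for g in "$@"; do
    # Single-quote the {!join ...} part so no shell treats it specially;
    # add " OR " only between clauses.
    fq="${fq}${fq:+ OR }"'{!join from=groups_s to=perms_ss}id:'"$g"
  done
  # Outer parentheses mirror the working query in the curl example.
  printf '(%s)\n' "$fq"
}

# Example (assumes a running Solr at this URL). -G plus --data-urlencode
# URL-encodes the braces and spaces, so neither --globoff nor hand-written
# "+" signs are needed:
# curl -s -G 'http://localhost:8983/solr/collection1/select' \
#   --data-urlencode 'q=*' --data-urlencode 'wt=json' \
#   --data-urlencode "fq=$(build_perms_fq group_public "$(upg_group user1)")"
```

The number of clauses grows with the number of groups a user is in, which is the "lot of ORs" trade-off mentioned above.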

Output from curl is below. Comments are welcome! (Any objections to
this approach?) Thanks again!

Phil

Existing "dataverse" documents, now tagged with various groups under
the "perms_ss" field, and two example joins, separated by an OR:

[pdurbin@localhost ~]$ curl -s --globoff \
  'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&sort=id+desc&q=*&fq=({!join+from=groups_s+to=perms_ss}id:group_public+OR+{!join+from=groups_s+to=perms_ss}id:group_user1)' \
  | jq '.response.docs[] | {id,perms_ss,dvtype}' | head -17
{
  "dvtype": "dataverses",
  "perms_ss": [
    "group_user1",
    "group_user5",
    "group_2"
  ],
  "id": "dataverse_9"
}
{
  "dvtype": "dataverses",
  "perms_ss": [
    "group_public",
    "group_2"
  ],
  "id": "dataverse_7"
}

New "groups" documents that are used in the join:

[pdurbin@localhost ~]$ curl -s \
  'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&sort=id+asc&q=id:group**' \
  | jq '.response.docs[] | {id,groups_s,dvtype}' | grep group_public -B7 -A6
{
  "dvtype": "groups",
  "groups_s": "group_4",
  "id": "group_4"
}
{
  "dvtype": "groups",
  "groups_s": "group_public",
  "id": "group_public"
}
{
  "dvtype": "groups",
  "groups_s": "group_user1",
  "id": "group_user1"
}

* User Private Groups (UPG) is what Red Hat calls them: "Red Hat
Enterprise Linux uses a user private group (UPG) scheme, which makes
UNIX groups easier to manage. A user private group is created whenever
a new user is added to the system. It has the same name as the user
for which it was created and that user is the only member of the user
private group." --
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/ch-Managing_Users_and_Groups.html#s2-users-groups-private-groups

On Tue, Mar 25, 2014 at 3:40 PM, Yonik Seeley <yo...@heliosearch.com> wrote:
> Depending on requirements, another option for simple security is to
> store the security info in the index and utilize a join.  This really
> only works when you have a single shard since joins aren't
> distributed.
>
> # the documents, with permissions
> id:doc1, perms:public,...
> id:doc2, perms:group1 group2 joe, ...
> id:doc3, perms:group3, ...
>
> # documents modeling users and what groups they belong to
> id:joe, groups:joe public  group3
> id:mark, groups:mark public group1 group2
>
> And then if joe does a query, you add a filter query like the following
> fq={!join from=groups to=perms v=id:joe}
>
> The user documents can either be in the same collection, or in a
> separate "core" as long as it's co-located in the same JVM (core
> container), and you can do a cross-core join.
>
> -Yonik
> http://heliosearch.org - solve Solr GC pauses with off-heap filters
> and fieldcache
>
>
> On Tue, Mar 25, 2014 at 3:06 PM, Philip Durbin
> <philip_dur...@harvard.edu> wrote:
>> I'm new to Solr and I'm looking for a document level security filter
>> solution. Anonymous users searching my application should be able to
>> find public data. Logged in users should be able to find public data
>> and private data they have access to.
>>
>> Earlier today I wrote about shards as a possible solution. I got a
>> great reply from Shalin Shekhar Mangar of LucidWorks explaining how to
>> achieve something technical but I'd like to back up a minute and
>> consider other solutions.
>>
>> For one thing, I'm concerned about the potential misuse of shards.
>> Judging from this wiki page, shards seem to be used primarily for
>> scalability rather than security (access control): "When an index
>> becomes too large to fit on a single system..." -
>> https://cwiki.apache.org/confluence/display/solr/Distributed+Search+with+Index+Sharding
>>
>> For consistency with a longer writeup of mine on this topic[1], I'm
>> going to refer to the sharding solution as Option 4. Here's the full
>> list of options I'm aware of for document level security filtering:
>>
>> 1. Manifold CF (Connector Framework)
>>
>> http://manifoldcf.apache.org
>>
>> 2. ACL PostFilter (ACLs in each document)
>>
>> Specifically, I mean this wonderful writeup by Erik Hatcher from
>> LucidWorks: http://java.dzone.com/articles/custom-security-filtering-solr
>>
>> 3. Pass a (often long) list of IDs in query
>>
>> Representative question:
>> http://lucene.472066.n3.nabble.com/Solr-large-boolean-filter-td4070747.html
>>
>> 4. Sharding (public shard, private shards per user)
>>
>> My post from earlier today:
>> http://lucene.472066.n3.nabble.com/creating-shards-on-the-fly-in-a-single-Solr-instance-quot-shards-quot-query-parameter-td4126909.html
>>
>> I'm happy to hear opinions on any of these solutions or others I
>> haven't even considered!
>>
>> Thanks!
>>
>> Phil
>>
>> 1. My longer writeup of this topic:
>> https://trello.com/c/5z5PpR4r/50-design-solr-document-level-security-filter-solution
>>
>> --
>> Philip Durbin
>> Software Developer for http://thedata.org
>> http://www.iq.harvard.edu/people/philip-durbin


