Re: document level security: indexing/searching techniques
Someone else was recently asking a similar question (or maybe it was you, worded differently :) ). Putting user-level security at the document level seems like a recipe for pain. Solr/Lucene don't handle frequent updates well, and since they are highly optimized for querying, I don't blame them.

Is there any way to create a series of roles that you can apply to your documents? If the security level of a document isn't changing, just the users' access to it, give the docs a role in the index, keep your user/usergroup data in a DB or some other system, resolve your user into valid roles, then apply a filter query (fq) on role.

--
View this message in context: http://lucene.472066.n3.nabble.com/document-level-security-indexing-searching-techniques-tp946528p946649.html
Sent from the Solr - User mailing list archive at Nabble.com.
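
For what it's worth, the "resolve the user into roles, then filter on role" idea might look roughly like the sketch below. It is only an illustration: the `USER_ROLES` table stands in for whatever DB or directory holds the user/role mapping, and the `role` field name and helper names are assumptions, not anything Solr prescribes. Only `q` and `fq` are real Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical stand-in for the DB (or other system) that maps users to roles.
USER_ROLES = {
    "alice": ["engineering", "staff"],
    "bob": ["staff"],
}

# Characters with special meaning in the Lucene/Solr query syntax (plus space).
_SPECIAL = set('+-&|!(){}[]^"~*?:\\/ ')


def escape_term(term):
    """Backslash-escape query syntax characters in a single term."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in term)


def role_filter(user, field="role"):
    """Build an fq value matching documents tagged with any of the user's roles."""
    roles = USER_ROLES.get(user, [])
    if not roles:
        # Assumption: a user without roles is rejected before Solr is queried.
        raise PermissionError("no roles for user %r" % user)
    return "%s:(%s)" % (field, " OR ".join(escape_term(r) for r in roles))


def select_params(user, query):
    """URL-encoded parameters for a Solr /select request."""
    return urlencode({"q": query, "fq": role_filter(user)})
```

Because the user-to-role mapping lives outside the index, granting or revoking a role changes only the filter that gets built, not the indexed documents.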
Re: document level security: indexing/searching techniques
On Jul 6, 2010, at 8:27am, osocurious2 wrote:

> [...] give the docs a role in the index, put your user/usergroup stuff in a DB or some other system and resolve your user into valid roles, then FilterQuery on role.

You're right, baking in too fine-grained a level of security information is a bad idea.

As one example that worked pretty well for code search with Krugle, we set access control at the project level using LDAP groups, i.e. each project had some number of groups that were granted access rights. Each file in the project would inherit the same list of groups.

Then, when a user logs in, they are authenticated via LDAP, and the LDAP server returns the set of groups they belong to. This becomes a fairly well-bounded list of terms for an OR query against the acl-groups field in each file/project document.

Just don't forget to set the boost to 0 for that portion of the query, so that the access check doesn't affect relevance scoring :)

-- Ken

Ken Krugler
http://bixolabs.com
elastic web mining
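
A rough sketch of the per-project scheme described above, under some assumptions: the `PROJECT_GROUPS` mapping, the document layout, and the function names are made up for illustration, and the LDAP lookup is left out (the groups are simply passed in). Only the `acl-groups` field name and the zero boost come from the message itself.

```python
# Hypothetical project -> LDAP group mapping; in practice this would come
# from the project's access-control settings.
PROJECT_GROUPS = {
    "search-core": ["dev-search", "qa"],
    "billing": ["dev-billing"],
}


def file_document(project, path, content):
    """Build the index document for one file; it inherits its project's groups."""
    return {
        "id": "%s/%s" % (project, path),
        "content": content,
        "acl-groups": list(PROJECT_GROUPS[project]),
    }


def quote_term(term):
    """Wrap a group name in double quotes, escaping backslashes and quotes."""
    return '"%s"' % term.replace("\\", "\\\\").replace('"', '\\"')


def secured_query(user_query, ldap_groups):
    """Require both the user's query and a zero-boost OR over the user's groups."""
    if not ldap_groups:
        # Assumption: users with no groups are denied before any search runs.
        raise PermissionError("user belongs to no groups")
    group_clause = " OR ".join(quote_term(g) for g in ldap_groups)
    return "+(%s) +acl-groups:(%s)^0" % (user_query, group_clause)
```

The `^0` keeps the group clause from contributing to the score, so two users with different group lists see the same ranking for the documents they are both allowed to see.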
Re: document level security: indexing/searching techniques
Yes, you don't want to hard-code permissions into your index - it will give you headaches.

You might want to have a look at SOLR-1872: https://issues.apache.org/jira/browse/SOLR-1872. This patch provides doc-level security through an external ACL mechanism (in this case, an XML file) controlling a filter query. This way, you don't need to change the schema - you can even use existing indexes - and you can change access control without affecting your stored data.

HTH,
Peter

On Tue, Jul 6, 2010 at 5:16 PM, Ken Krugler wrote:

> You're right, baking in too fine-grained a level of security information is a bad idea. [...]
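
To make the "external ACL file controlling a filter query" idea concrete, here is a minimal sketch. Note that the XML layout below is invented for illustration and is not the file format used by the SOLR-1872 patch; the element names, the `project` field, and the helper functions are all assumptions.

```python
import xml.etree.ElementTree as ET

# Invented layout for illustration only; NOT the SOLR-1872 file format.
ACL_XML = """
<acl>
  <user name="alice">
    <filter>project:search-core</filter>
    <filter>project:billing</filter>
  </user>
  <user name="bob">
    <filter>project:billing</filter>
  </user>
</acl>
"""


def load_acl(xml_text):
    """Parse the ACL file into a {user name: [filter clause, ...]} mapping."""
    root = ET.fromstring(xml_text.strip())
    return {
        user.get("name"): [f.text.strip() for f in user.findall("filter")]
        for user in root.findall("user")
    }


def filter_for(acl, user):
    """Return the filter query for a user, or None if the user is not listed."""
    clauses = acl.get(user)
    if not clauses:
        return None  # caller should reject the request
    return " OR ".join("(%s)" % c for c in clauses)
```

Since the filters are computed from the file at query time, editing the file changes who can see what without touching the schema or reindexing anything.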
Re: document level security: indexing/searching techniques
What Ken describes is called 'role-based' security. Users have roles, and secured items refer to roles, not users.
http://en.wikipedia.org/wiki/Role-based_access_control

On Tue, Jul 6, 2010 at 3:15 PM, Peter Sturge wrote:

> Yes, you don't want to hard code permissions into your index - it will give you headaches. [...]

--
Lance Norskog
Re: document level security: indexing/searching techniques
You could implement a good solution with the underlying Lucene ParallelReader:
http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/index/ParallelReader.html

Keep the 100 search fields - the 'static' info - in one index, and the permissions info in another index that gets updated when the permissions change.

Does Solr expose this kind of functionality?

-Glen Newton
http://zzzoot.blogspot.com/
http://zzzoot.blogspot.com/2009/07/project-torngat-building-large-scale.html

On 7 July 2010 00:38, RL wrote:

> I have a question about indexing/searching techniques in relation to document-level security. I am planning a system that has, let's say, about 1 million search documents with about 100 search fields each. Most of the fields are unstored to keep the index size low, because some of them can contain a few kilobytes and some several hundred kilobytes.
>
> Two of these search fields are for permission checking, where I keep the explicitly allowed and explicitly disallowed users and usergroups. (Usergroups can be in a hierarchical structure with permission inheritance.) So when a user searches in the system, their user id and the ids of their usergroup memberships are added as a filter query in my application logic before the query is sent to Solr.
>
> So far so good for the searching part. But the problem is that the permissions can be changed by administrators of the system, which requires re-indexing the two permission search fields.
>
> First idea: Partial updates of index entries are not possible, so I would need to fetch all 1 million documents from a database and re-index them just because some permissions changed. The fetching process is rather expensive and takes more than 14 hours. I am sure this can be optimized, but I would rather avoid re-indexing all content.
>
> Second idea: Store just the permissions in one small, fast-to-update index and everything else in the other huge, rarely updated index. But I didn't find any way to combine these two indices in one query. Is that even possible?
>
> Does somebody have experience with these topics, or advice on how to solve this case properly?
>
> Thanks in advance.
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/document-level-security-indexing-searching-techniques-tp946528p946528.html
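
The filter-query step described in the original question (user id plus inherited usergroup ids, checked against an allow field and a deny field) could be sketched roughly as follows. The group hierarchy, the `allowed`/`denied` field names, and the `u_`/`g_` prefixes are all hypothetical, and the sketch assumes that membership in a child group implies membership in its parents and that an explicit deny overrides an allow.

```python
# Hypothetical group hierarchy: child -> parent (None marks a top-level group).
GROUP_PARENT = {
    "search_team": "engineering",
    "engineering": "staff",
    "staff": None,
}


def effective_principals(user_id, direct_groups):
    """User id plus every group the user is in, directly or via inheritance."""
    principals = ["u_" + user_id]
    seen = set()
    for group in direct_groups:
        # Walk up the hierarchy until the top or an already visited group.
        while group is not None and group not in seen:
            seen.add(group)
            principals.append("g_" + group)
            group = GROUP_PARENT.get(group)
    return principals


def permission_filter(user_id, direct_groups,
                      allow_field="allowed", deny_field="denied"):
    """Filter query: allowed for some principal and denied for none of them."""
    ids = " OR ".join(effective_principals(user_id, direct_groups))
    return "+%s:(%s) -%s:(%s)" % (allow_field, ids, deny_field, ids)
```

For example, `permission_filter("42", ["search_team"])` expands the single direct membership into the three groups in the chain before building the filter, so the index only has to store the groups named explicitly on each document.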