RE: indexing documents (or pieces of a document) by access controls

2007-06-14 Thread Ard Schrijvers
Hello,


 When I had that kind of problem (less complex) with Lucene, the only
 idea was to filter from the front-end, according to the ACL policy.
 Lucene docs and fields weren't protected, but tagged. Searching was
 always applied with a field audience, with hierarchical values like
 public, reserved, protected, secret, so that a public document has
 the secret value also, to be found with an audience:secret, according
 to the rights of the user who searches. For the fields, the ones not
 allowed for some users were stripped.

Yes, I know this is a possibility... but we happen to want faceted 
authorisation. I am attacking the problem by keeping data derived from 
Lucene in memory, all translated into byte/int values. The hardest part is 
keeping the derived data in sync with Lucene *and* with the different Jackrabbit 
users (some have changes in their session that they have not yet saved).

Anyway, I can do faceted authorisation + counting in less than 20 ms for 
1.000.000 documents (normal PC), so hopefully I can succeed. I must admit, OTOH, 
that I did not find some sort of ingenious algorithm, but merely depend on the 
speed of the processor: doubling the number of documents means doubling the 
response time and the memory needed (though 1.000.000 docs fitted in 25 MB, so 
40.000.000 in a GB... that is fine by me).
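
A minimal sketch of this kind of brute-force approach, not the actual Jackrabbit code. It assumes each document has been reduced to two small integers, a hypothetical ACL group id and a hypothetical facet value id, held in compact arrays. One linear pass both filters by authorisation and counts facet values, so time and memory grow linearly with the number of documents.

```python
from array import array
from collections import Counter

# Hypothetical derived data: one small int per document, indexed by
# position. None of these names come from Lucene or Jackrabbit.
doc_group = array("B", [0, 1, 1, 2, 0, 2])   # ACL group id per document
doc_facet = array("B", [5, 5, 7, 7, 5, 9])   # facet value id per document


def authorised_facet_counts(doc_group, doc_facet, allowed_groups):
    """Count facet values over the documents the user may see.

    A single linear pass: doubling the number of documents doubles
    both the work and the memory held in the arrays.
    """
    # 256-entry lookup table, since "B" arrays hold values 0..255.
    allowed = bytearray(256)
    for g in allowed_groups:
        allowed[g] = 1

    counts = Counter()
    for group, facet in zip(doc_group, doc_facet):
        if allowed[group]:
            counts[facet] += 1
    return counts


counts = authorised_facet_counts(doc_group, doc_facet, allowed_groups={0, 2})
print(dict(counts))  # {5: 2, 7: 1, 9: 1}
```

At the 25 bytes per document implied by the figures above, 40 times as many documents need 40 x 25 MB = 1000 MB, and the loop runs 40 times as long.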

 
 Maybe you can have a look at the XML database eXist? The search engine, 
 XQuery based, is not focused on the same goals as Lucene, but I can 
 promise you that no query will ever return results from documents 
 you are not allowed to read.

I did not look at it, but my feeling is that it is not fast enough.

Regards Ard

 
 
 -- 
 Frédéric Glorieux
 École nationale des chartes
 direction des nouvelles technologies et de l'informatique
 


RE: indexing documents (or pieces of a document) by access controls

2007-06-13 Thread Ard Schrijvers
Hello,


 Given the requirement to break down a document into separately 
 controlled pieces, I'd create a servlet that fronts the Solr 
 servlet and handles this conversion. I could think of ways to do it 
 using Solr, but they feel like unnatural acts.
 
 As a general comment on ACLs, one relatively easy way to handle this 
 is via group ids that you use to restrict the query. Each document 
 has a groupid with a list of group ids that are authorized to access 
 it. Each user query is converted into a (query) AND (groupid:xx OR 
 groupid:yy), where xx/yy (and so on) are the groups that the user 
 belongs to.

With all due respect, I really think the problem is largely underestimated here, 
and is far more complex than these suggestions... unless we are talking about 
100.000 documents, a couple of users, and updating once a day. If you want 
millions of documents, faceted authorised navigation including counting, a new 
document indexed every second that should be reflected in the results 
instantly, and changing authorisations... the problem isn't relatively easy to 
solve anymore :-) 

Regards Ard

 
 -- Ken
 -- 
 Ken Krugler
 Krugle, Inc.
 +1 530-210-6378
 If you can't find it, you can't fix it
 


RE: indexing documents (or pieces of a document) by access controls

2007-06-13 Thread Ard Schrijvers
Hello,

 Hi
 
 And about the fields, if they are/aren't going to be present on the
 responses based on the user group, you can do it in many different ways
 (using XML transformation to remove the undesirable fields, implementing
 your own RequestHandler able to process your group information, filtering
 the data and showing only what should be shown to the user, ...)

So suppose you want to see 10 documents, but on average you are authorised to 
see 1 in 100 docs. Then on average you need to fetch 1.000 docs to find 10 
results... 1.000 XML transformations... that will be slow. And I left out the 
fact that you still do not know the number of pages that the user is allowed to 
see, the counting if you want faceted navigation, etc. etc.
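
A rough sketch of the cost of filtering after the search, under a toy assumption that exactly every 100th hit is visible to the user. The `is_authorised` callback and the function name are hypothetical. With one hit in a hundred visible, filling a page of 10 costs about 10 / 0.01 = 1000 fetches.

```python
def fetch_until_page_full(is_authorised, total_docs, page_size):
    """Post-filter hits one by one; return (page, number of docs fetched)."""
    page, fetched = [], 0
    for doc_id in range(total_docs):
        fetched += 1                 # each fetch would also mean one transformation
        if is_authorised(doc_id):
            page.append(doc_id)
            if len(page) == page_size:
                break
    return page, fetched


# Assumed toy rule: exactly every 100th hit is visible to this user.
page, fetched = fetch_until_page_full(
    lambda doc_id: doc_id % 100 == 99, total_docs=1_000_000, page_size=10
)
print(len(page), fetched)  # 10 1000
```

The loop stops as soon as the page is full, so it still says nothing about how many authorised hits exist in total. Getting that number, or facet counts, would mean checking every hit.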

Regards Ard

 
 Regards,
 Daniel
 
 
 On 12/6/07 16:14, Ken Krugler [EMAIL PROTECTED] wrote:
 
  Hi all,
  
  Can anyone give me some advice on breaking a document up and indexing it
  by access control lists?  What we have are xml documents that are
  transformed based on the user viewing it.  Some users might see all of
  the document, while others may see a few fields, and yet others see
  nothing at all.  The access control lists may be a role the user belongs
  to, it may be a list of groups, or even a combination of the two.
  
  I can transform the xml to the plain text that I want to index, and key
  it off of the acls and then pass along a list of acls that the user
  issuing a query belongs to when searching.  But I guess I'm not really
  sure how to do this the best way.
  
  Anyone have any thoughts?
  
  Given the requirement to break down a document into separately
  controlled pieces, I'd create a servlet that fronts the Solr
  servlet and handles this conversion. I could think of ways to do it
  using Solr, but they feel like unnatural acts.
  
  As a general comment on ACLs, one relatively easy way to handle this
  is via group ids that you use to restrict the query. Each document
  has a groupid with a list of group ids that are authorized to access
  it. Each user query is converted into a (query) AND (groupid:xx OR
  groupid:yy), where xx/yy (and so on) are the groups that the user
  belongs to.
  
  -- Ken
 
 
 http://www.bbc.co.uk/
 This e-mail (and any attachments) is confidential and may 
 contain personal views which are not the views of the BBC 
 unless specifically stated.
 If you have received it in error, please delete it from your system.
 Do not use, copy or disclose the information in any way nor 
 act in reliance on it and notify the sender immediately.
 Please note that the BBC monitors e-mails sent or received.
 Further communication will signify your consent to this.
   
 


Re: indexing documents (or pieces of a document) by access controls

2007-06-13 Thread Frédéric Glorieux



Hello,

 With all due respect, I really think the problem is largely 
underestimated here, and is far more complex than these 
suggestions... unless we are talking about 100.000 documents, a couple of 
users, and updating once a day. If you want millions of documents, 
faceted authorised navigation including counting, a new document indexed 
every second that should be reflected in the results instantly, and 
changing authorisations... the problem isn't relatively easy to solve 
anymore :-)


When I had that kind of problem (less complex) with Lucene, the only 
idea was to filter from the front-end, according to the ACL policy. 
Lucene docs and fields weren't protected, but tagged. Searching was 
always applied with a field audience, with hierarchical values like
public, reserved, protected, secret, so that a public document has 
the secret value also, to be found with an audience:secret, according 
to the rights of the user who searches. For the fields, the ones not 
allowed for some users were stripped.
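
A minimal sketch of this tagging scheme, assuming the four level names above are ordered from least to most restricted and that the front-end appends the clause to every query. The function names are hypothetical; only the `audience` field name comes from the description.

```python
# Level names from the description, least to most privileged reader.
LEVELS = ["public", "reserved", "protected", "secret"]


def audience_tags(doc_level):
    """Values to index in the audience field: the document's own level
    and every higher one, so higher-cleared users also match it."""
    return LEVELS[LEVELS.index(doc_level):]


def restrict(query, user_level):
    """Query actually sent to the index for a user with this clearance."""
    return f"({query}) AND audience:{user_level}"


print(audience_tags("public"))     # ['public', 'reserved', 'protected', 'secret']
print(audience_tags("protected"))  # ['protected', 'secret']
print(restrict("title:charte", "reserved"))  # (title:charte) AND audience:reserved
```

A user cleared for secret searches with audience:secret and matches every document, since every document carries that value. A user with only public rights matches public documents alone.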


Maybe you can have a look at the XML database eXist? The search engine, 
XQuery based, is not focused on the same goals as Lucene, but I can 
promise you that no query will ever return results from documents 
you are not allowed to read.



--
Frédéric Glorieux
École nationale des chartes
direction des nouvelles technologies et de l'informatique


RE: indexing documents (or pieces of a document) by access controls

2007-06-12 Thread Ard Schrijvers
Hello Nate,

IMHO, you will not be able to do this in Solr unless you accept pretty hard 
constraints on your ACLs (I will get back to this in a moment). IMO, it is not 
possible to index documents along with ACLs. ACLs can be very fine-grained, and 
the thing you describe, ACL-specific parts of a document... well, I wouldn't 
know how you would index this. (Imagine you change the ACL for a specific user: 
how do you know what to re-index and what not? Suppose you add a user? I really 
do not think it is possible based on fine-grained ACLs.) 

You should also realize you are trying to find an answer to an extremely 
complex problem: authorisation in an index (I am trying to develop faceted 
navigation in combination with authorisation in a Lucene index in Jackrabbit, 
but I think this is not the place to discuss it).

So, in your case, if you want to use Solr and some form of ACLs, I think 
basically you can only manage this if:

1) your ACLs are some sort of paths in a hierarchical structure, where you 
index the hierarchical structure along with the content. Then when querying you 
have to include the folders that the user is allowed to see

2) you keep a bitset for each user recording which documents are allowed (but 
you even have ACLs inside documents). Also, keeping bitsets up to date for many 
users is almost impossible, because 
- Lucene document ids possibly change after merging segments
- updating documents might mean updating many, many bitsets if you have many 
users

For these reasons, I do not think you can achieve with solar what you want, 
unless you are going to work with something like: updating the index and ACL 
bitsets once a day.
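
Option 1 above could look roughly like the following sketch. It assumes a hypothetical untokenized `path` field that stores every ancestor folder of a document, and that the list of folders a user may read is available at query time. Neither the field name nor the helpers are part of Solr.

```python
def ancestor_paths(doc_path):
    """All folder prefixes of a document path, to index in the path field."""
    parts = doc_path.strip("/").split("/")[:-1]   # drop the document name
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def path_filter(allowed_folders):
    """Clause restricting a query to the folders the user may see."""
    if not allowed_folders:
        # No readable folder: the caller should return no results at all.
        raise ValueError("user has no readable folders")
    clauses = " OR ".join(f'path:"{folder}"' for folder in sorted(allowed_folders))
    return f"({clauses})"


print(ancestor_paths("/content/hr/salaries/2007.xml"))
# ['/content', '/content/hr', '/content/hr/salaries']
print(path_filter({"/content/public", "/content/hr"}))
# (path:"/content/hr" OR path:"/content/public")
```

Because the index holds only the folder structure, changing who may read a folder changes the clause built at query time and not the indexed documents.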

Regards Ard


Can anyone give me some advice on breaking a document up and indexing it
by access control lists?  What we have are xml documents that are
transformed based on the user viewing it.  Some users might see all of
the document, while others may see a few fields, and yet others see
nothing at all.  The access control lists may be a role the user belongs
to, it may be a list of groups, or even a combination of the two.

I can transform the xml to the plain text that I want to index, and key
it off of the acls and then pass along a list of acls that the user
issuing a query belongs to when searching.  But I guess I'm not really
sure how to do this the best way.

Anyone have any thoughts?

Thanks!
Nate






RE: indexing documents (or pieces of a document) by access controls

2007-06-12 Thread Ard Schrijvers
Excuse me, I meant solr of course :-) 

 For these reasons, I do not think you can achieve with solar 


Re: indexing documents (or pieces of a document) by access controls

2007-06-12 Thread Ken Krugler

Hi all,

Can anyone give me some advice on breaking a document up and indexing it
by access control lists?  What we have are xml documents that are
transformed based on the user viewing it.  Some users might see all of
the document, while others may see a few fields, and yet others see
nothing at all.  The access control lists may be a role the user belongs
to, it may be a list of groups, or even a combination of the two.

I can transform the xml to the plain text that I want to index, and key
it off of the acls and then pass along a list of acls that the user
issuing a query belongs to when searching.  But I guess I'm not really
sure how to do this the best way.

Anyone have any thoughts?


Given the requirement to break down a document into separately 
controlled pieces, I'd create a servlet that fronts the Solr 
servlet and handles this conversion. I could think of ways to do it 
using Solr, but they feel like unnatural acts.


As a general comment on ACLs, one relatively easy way to handle this 
is via group ids that you use to restrict the query. Each document 
has a groupid with a list of group ids that are authorized to access 
it. Each user query is converted into a (query) AND (groupid:xx OR 
groupid:yy), where xx/yy (and so on) are the groups that the user 
belongs to.
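
The query rewrite described above can be sketched as follows. The helper name is hypothetical, and group ids are assumed to be simple tokens that need no escaping in the query syntax.

```python
def restrict_to_groups(user_query, group_ids):
    """Wrap a user query so it only matches documents whose groupid
    field contains at least one of the user's groups."""
    if not group_ids:
        # A user with no groups should see nothing; let the caller decide how.
        raise ValueError("user belongs to no groups")
    groups = " OR ".join(f"groupid:{g}" for g in group_ids)
    return f"({user_query}) AND ({groups})"


print(restrict_to_groups("title:budget", ["xx", "yy"]))
# (title:budget) AND (groupid:xx OR groupid:yy)
```

The parentheses around the user's query matter: without them, an OR inside the query could bypass the group restriction.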


-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
If you can't find it, you can't fix it


Re: indexing documents (or pieces of a document) by access controls

2007-06-12 Thread Daniel Alheiros
Hi

And about the fields, if they are/aren't going to be present on the
responses based on the user group, you can do it in many different ways
(using XML transformation to remove the undesirable fields, implementing
your own RequestHandler able to process your group information, filtering
the data and showing only what should be shown to the user, ...)
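
As one illustration of the filtering option, here is a minimal sketch that strips fields from a result document before it is shown. The policy table, the group names and the function are all hypothetical. Fields missing from the policy are dropped, which is a deny-by-default choice made only for this example.

```python
# Hypothetical policy: field name -> groups allowed to see that field.
FIELD_GROUPS = {
    "title": {"staff", "editors", "public"},
    "body": {"staff", "editors"},
    "salary": {"staff"},
}


def visible_fields(doc, user_groups, field_groups=FIELD_GROUPS):
    """Return a copy of doc keeping only the fields that at least one
    of the user's groups may see."""
    user_groups = set(user_groups)
    return {
        name: value
        for name, value in doc.items()
        if field_groups.get(name, set()) & user_groups
    }


doc = {"title": "Report", "body": "Full text", "salary": 1000, "internal": "x"}
print(visible_fields(doc, ["editors"]))  # {'title': 'Report', 'body': 'Full text'}
print(visible_fields(doc, ["public"]))   # {'title': 'Report'}
```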

Regards,
Daniel



