Powered by Solr
I was intending to make an entry to the 'Powered by Solr' page, so I created a Wiki account and logged in. When I go to that page, it shows it as being 'immutable', which I take as meaning I can't edit it. Is there someone I can send the information to who can do the edit? Or perhaps there is some sort of trick to editing that page? Thanks for your help, and apologies in advance if this is a silly question... Terence
Re: Powered by Solr
Did you try hitting refresh on your browser after you logged in? Wow, I really should have known that...thank you for your patient reply, Yonik. Regards...Terence
RE: Selective Searches Based on User Identity
Yes, the ownerUid will likely be assigned once and never changed. But you still need it, in order to keep track of who has contributed which document. I've been going over some of the simpler query scenarios, and Solr is capable of handling them without having to resort to an external RDBMS. In order to limit documents to those which a given user owns, or those to which he has been granted access, the syntax fragment would be something like: ownerUid:ab2734 OR grantedUid:ab2734 where ab2734 is the uid for the user doing the query. However, I'm less comfortable with more complex query scenarios, particularly if the concept of groups is eventually introduced, which is likely in my scenario. In the latter case, it may be necessary to use an external RDBMS. I'll plead ignorance of the 'ineluctable filter query' and will have to read up on that one. With respect to updates to rights, they are not likely to be that frequent, but when they do happen, the entire document will have to be reindexed rather than simply updating the grantedUid and/or deniedUid fields. I don't believe Solr supports the updating of individual fields, at least not yet. This may be another reason to eventually go to an external RDBMS. Thanks very much for your help! Terence -Original Message- From: Michael Ludwig Sent: May 13, 2009 05:27 To: solr-user@lucene.apache.org Subject: Re: Selective Searches Based on User Identity Terence Gannon wrote: Paul -- thanks for the reply, I appreciate it. That's a very practical approach, and is worth taking a closer look at. Actually, taking your idea one step further, perhaps three fields; 1) ownerUid (uid of the document's owner) 2) grantedUid (uid of users who have been granted access), and 3) deniedUid (uid of users specifically denied access to the document). Grants might change quite a bit, the owner will likely remain the same. 
Wouldn't it be better to include only the owner in the document and store grants someplace else, like in an RDBMS or - if you don't want one - a lightweight embedded database like BDB? That way you could have your application tag an ineluctable filter query onto each and every user query, which would ensure to include only those documents in the results the owner of which has granted the user access. Considering that I'm a Solr/Lucene newbie, this approach might have a disadvantage that escapes me, which is why other people haven't made this particular suggestion. If so, I'd be happy to learn why this isn't preferable. Michael Ludwig
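For anyone following the thread, here is a minimal sketch of what "tagging a filter query onto every user query" might look like on the application side. It assumes the ownerUid and grantedUid fields discussed above are in the index and that the uid comes from a trusted server-side session; the function name and the sample query text are made up for illustration. Note that Lucene's boolean operators must be written in uppercase.

```python
from urllib.parse import urlencode


def build_search_params(user_query: str, uid: str) -> dict:
    """Return Solr request parameters with an access filter attached.

    Assumption: uid is a trusted value taken from the server-side session,
    never from user-supplied input.
    """
    # Restrict results to documents the user owns or has been granted.
    # The operator must be uppercase OR; lowercase "or" is not an operator.
    access_filter = f"ownerUid:{uid} OR grantedUid:{uid}"
    return {"q": user_query, "fq": access_filter}


params = build_search_params("tour sheet", "ab2734")
query_string = urlencode(params)  # ready to append to the select URL
print(query_string)
```

Putting the restriction in fq rather than in q keeps it out of relevance scoring, and the user's own query text never has a chance to override it.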
RE: Selective Searches Based on User Identity
Paul -- thanks for the reply, I appreciate it. That's a very practical approach, and is worth taking a closer look at. Actually, taking your idea one step further, perhaps three fields; 1) ownerUid (uid of the document's owner) 2) grantedUid (uid of users who have been granted access), and 3) deniedUid (uid of users specifically denied access to the document). These fields, coupled with some business rules around how they were populated should cover off all possibilities I think. Access to the Solr instance would have to be tightly controlled, but that's something that should be done anyway. You sure wouldn't want end users preparing their own XML and throwing it at Solr -- it would be pretty easy to figure out how to get around the access/denied fields and get at stuff the owner didn't intend. This approach mimics to some degree what is being done in the operating system, but it's still elegant and provides the level of control required. Anybody else have any thoughts in this regard? Has anybody implemented anything similar, and if so, how did it work? Thanks, and best regards... Terence
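To make the "business rules" part a little more concrete, here is a rough sketch of how the three fields might be interpreted. The precedence shown (owner always sees the document, a denial beats a grant) is only an assumption for illustration, not something settled in this thread, and every uid other than ab2734 is invented.

```python
def can_see(doc: dict, uid: str) -> bool:
    """Decide whether uid may see doc, using the three proposed fields."""
    # Assumed rule 1: the owner always sees their own document.
    if doc.get("ownerUid") == uid:
        return True
    # Assumed rule 2: an explicit denial overrides any grant.
    if uid in doc.get("deniedUid", []):
        return False
    # Assumed rule 3: otherwise access requires an explicit grant.
    return uid in doc.get("grantedUid", [])


doc = {
    "ownerUid": "ab2734",
    "grantedUid": ["cd1001", "ef2002"],  # hypothetical uids
    "deniedUid": ["ef2002"],
}

for uid in ("ab2734", "cd1001", "ef2002", "zz9999"):
    print(uid, can_see(doc, uid))
```

The same rules could presumably be expressed at query time as (ownerUid:X OR grantedUid:X) AND NOT deniedUid:X, with the caveat that this form would also hide a document from an owner who somehow appears in their own deniedUid list.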
RE: Selective Searches Based on User Identity
Thanks for the tip. I went to their website (www.fastsearch.com), and got as far as the second line, top left 'A Microsoft Subsidiary'...at which point, hopes of it being another open source solution quickly faded. ;-) Seriously, though, it looks like an interesting product, but open source is a mandatory requirement for my particular application. But the fact they implemented this functionality would seem to support that it's a valid requirement, and I'll keep plugging away on it. Thank you very much for bringing FAST to my attention...I appreciate it! Best regards... Terence -Original Message- From: Matt Weber [mailto:m...@mattweber.org] Sent: May 12, 2009 14:06 To: solr-user@lucene.apache.org Subject: Re: Selective Searches Based on User Identity I also work with the FAST Enterprise Search engine and this is exactly how their Security Access Module works. They actually use a modified base-32 encoded value for indexing, but that is because they don't have the luxury of untokenized/un-processed String fields like Solr. Thanks, Matt Weber eSr Technologies http://www.esr-technologies.com
RE: Selective Searches Based on User Identity
In reply to both Matt and Jay's comments, the particular situation I'm dealing with is one where rights will change relatively little once they are established. Typically a document will be loaded and indexed, and a decision will be made on sharing that more-or-less immediately. It might change a couple of times after that, but that will be it. So early-binding seems like the better option. Thanks to both of you for your suggestions and help. Terence PS. I wish I had known about that conference...looks like it would have been very helpful to me right now! -Original Message- From: Matt Weber [mailto:m...@mattweber.org] Sent: May 12, 2009 14:41 To: solr-user@lucene.apache.org Subject: Re: Selective Searches Based on User Identity Here is a good presentation on search security from the Infonortics Search Conference that was held a few weeks ago. http://www.infonortics.com/searchengines/sh09/slides/kehoe.pdf The approach you are using is called early-binding. As Jay mentioned, one of the downsides is updating the documents each time you have an ACL change. You could use the late-binding approach that checks each result after the query but before you display to the user. I don't recommend this approach because it will strain your security infrastructure because you will need to check if the user can access each result. Good luck. Thanks, Matt Weber eSr Technologies http://www.esr-technologies.com
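For comparison, the late-binding approach described in the quoted message can be sketched roughly as follows. The ACL dictionary is a stand-in for whatever security system would really be consulted, and all document ids and the second uid are hypothetical. The point is only to show why the cost grows with the number of hits: one permission check per result.

```python
from typing import Callable, List


def late_binding_filter(
    result_ids: List[str],
    uid: str,
    can_access: Callable[[str, str], bool],
) -> List[str]:
    """Keep only the results that uid is allowed to see (checked after the query)."""
    return [doc_id for doc_id in result_ids if can_access(uid, doc_id)]


# Stand-in for an external security system (hypothetical data).
acl = {
    "doc1": {"ab2734"},
    "doc2": {"cd1001"},
    "doc3": {"ab2734", "cd1001"},
}
checks_made = []  # records every permission lookup so the cost is visible


def check_acl(uid: str, doc_id: str) -> bool:
    checks_made.append(doc_id)
    return uid in acl.get(doc_id, set())


visible = late_binding_filter(["doc1", "doc2", "doc3"], "ab2734", check_acl)
print(visible, len(checks_made))
```

With early binding, the equivalent restriction is applied inside the index at query time, so no per-result lookups are needed, at the price of reindexing when rights change.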
Selective Searches Based on User Identity
Can anybody point me in the direction of resources and/or projects regarding the following scenario: I have a community of users contributing content to a Solr index. By default, the user (A) who contributes a document owns it, and can see the document in their search results. The owner can then grant selective access to that document to other users. If another user (B) is granted access by A, then the document shows up in B's search results, along with whatever B has contributed and any other documents to which B has been granted access. Conversely, if B is not granted access to the document, it does not show up in their search results. I'm comfortable building this logic myself, so long as I'm not repeating the work of others in this area. Thanks, in advance, for any advice or information. Terence
Improving Readability of Hit Highlighting
I'm indexing text from an OCR of an old document. Many words get read perfectly, but they're typically embedded in a lot of junk. I would like the hit highlighting to show only the 'good' words, in the order in which they appeared in the original document. Is it possible to use the output of the filter classes as the text used in hit highlighting? Or do you have to do all the text cleanup outside of Solr and present it with two fields to index, one with the original text, and one with the cleaned up text? The objective of the hit highlighting is to give the user a *sense* of the original context, even if it's not provided verbatim from the original document. Thanks in advance. TerryG
Re: Improving Readability of Hit Highlighting
To answer your questions specifically, here is an example of the raw OCR output:

CONTRACTORINMPRIMENTAYIVE : mom Ale ACCEPT INFORMATIONON TOUR SHEET TO ea

for which I would like to see:

mom ale access tour sheet to

in the hit highlight. My schema for this field is pretty much standard, as follows:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" .../>
<filter class="solr.WordDelimiterFilterFactory" .../>
<filter class="solr.LowerCaseFilterFactory" .../>
<filter class="solr.EnglishPorterFilterFactory" .../>
<filter class="solr.RemoveDuplicatesTokenFilterFactory" .../>

When I examine the effect of each of these with the Analyzer, it seems that if I could use the output after LowerCaseFilterFactory in the hit highlight, I would come close to achieving what I want. I'm not averse to doing the text cleanup external to Solr before the indexing, but only if it's *not* redundant to what the filter factories are going to do anyway. Thanks for your help! TerryG
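In case it helps anyone weighing the external-cleanup route, here is a rough sketch of producing the two fields before indexing. Everything in it is an assumption for illustration: the field names text_raw and text_clean, the tiny vocabulary, and the idea of keeping only words found in a known-word list. A real setup would need a proper dictionary and probably some tolerance for near-misses.

```python
import re

# Hypothetical vocabulary; a real one would come from a dictionary file.
KNOWN_WORDS = {"mom", "ale", "accept", "tour", "sheet", "to"}


def clean_ocr_text(raw: str, vocabulary: set) -> str:
    """Lowercase the OCR text and keep only recognised words, in original order."""
    tokens = re.findall(r"[A-Za-z]+", raw.lower())
    return " ".join(token for token in tokens if token in vocabulary)


raw = "CONTRACTORINMPRIMENTAYIVE : mom Ale ACCEPT INFORMATIONON TOUR SHEET TO ea"

# Two fields for the same document: search the raw text, highlight the clean text.
doc = {
    "text_raw": raw,
    "text_clean": clean_ocr_text(raw, KNOWN_WORDS),
}
print(doc["text_clean"])
```

This overlaps with the lowercasing the filter chain already does, but since highlighting works from the stored field value rather than from the analyzed tokens, some cleanup before indexing seems hard to avoid if the highlight text is to differ from the raw OCR.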