-----Original Message-----
From: Michael Oliver [mailto:[EMAIL PROTECTED]]
Sent: Monday, 19 January 2004 15:13
To: Slide Developers Mailing List
Subject: Re: Proposal : index store - Lucene

On Mon, 2004-01-19 at 06:32, Stefano Mazzocchi wrote:
> On 18 Jan 2004, at 22:12, Christophe wrote:
>
> > Stefano Mazzocchi wrote:
> >
> >>> If you store your properties in one store (e.g. a DB) and use an index
> >>> store engine for content search, I expect some performance
> >>> issues when you search on props and content.
> >>
> >> hmmm, not sure I follow you, can you elaborate on this more? it would
> >> be very appreciated.
> >>
> > How do you make a query that uses criteria on properties and full text
> > search?
>
> eh, good question :-)

Using DASL this is straightforward: a query that gets all documents of
content type "text/plain" containing the string "jakarta" could be
posed as:

<?xml version="1.0" encoding="UTF-8"?>
<searchrequest xmlns:D="DAV:" xmlns:xsv="http://namespaces.softwareag.com/tamino/webdav">
  <D:basicsearch>
    <D:select>
      <D:allprop/>
    </D:select>
    <D:from>
      <D:scope>
        <D:href>mycoll</D:href>
      </D:scope>
    </D:from>
    <D:where>
      <D:and>
        <D:eq>
          <D:prop>
            <D:getcontenttype/>
          </D:prop>
          <D:literal>text/plain</D:literal>
        </D:eq>
        <D:contains>jakarta</D:contains>
      </D:and>
    </D:where>
  </D:basicsearch>
</searchrequest>

A full-text search on properties can be achieved with the operator <LIKE>.

Regards,
Martin

> > If the properties/metadata are in a DB and content is tokenized into an
> > index engine like Lucene, you first need to select rows from the DB
> > tables and then make a second query into the index store to query on the
> > content itself.
> > For this kind of scenario (search on prop AND full text search), I
> > expect a single query via Lucene will be faster. Lucene can store
> > properties that will not be tokenized. Anyway, it is not an ideal
> > situation because properties have to be duplicated in 2 different
> > stores. So, I don't know what will be the best solution!
>
> I think we are attacking the problem from the wrong angle: first we
> need to collect usecases, then we need to find a way to make the
> usecase possible.
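
The two-store scenario Christophe describes above (properties in a DB, content in a separate index, one query against each) can be sketched roughly as follows. This is only an illustration, not Slide or Lucene code: `PropertyStore`, `ContentIndex` and `search` are hypothetical stand-ins, the "index" is a plain in-memory token map, and the sample resources are made up.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyStore:
    """Hypothetical stand-in for the relational store: URI -> properties."""
    rows: dict = field(default_factory=dict)

    def select(self, **criteria):
        # Rough equivalent of: SELECT uri FROM props WHERE name = value AND ...
        return {
            uri for uri, props in self.rows.items()
            if all(props.get(k) == v for k, v in criteria.items())
        }


@dataclass
class ContentIndex:
    """Hypothetical stand-in for a full-text index: token -> set of URIs."""
    postings: dict = field(default_factory=dict)

    def add(self, uri, text):
        for token in text.lower().split():
            self.postings.setdefault(token, set()).add(uri)

    def contains(self, word):
        return set(self.postings.get(word.lower(), set()))


def search(props, index, word, **criteria):
    """Two queries, one per store, joined by intersecting the URI sets."""
    return sorted(props.select(**criteria) & index.contains(word))


# Made-up sample data, mirroring the "text/plain" + "jakarta" query.
props = PropertyStore({
    "/mycoll/a.txt": {"getcontenttype": "text/plain"},
    "/mycoll/b.html": {"getcontenttype": "text/html"},
    "/mycoll/c.txt": {"getcontenttype": "text/plain"},
})
index = ContentIndex()
index.add("/mycoll/a.txt", "The Jakarta project hosts Slide")
index.add("/mycoll/b.html", "Jakarta also hosts Lucene")
index.add("/mycoll/c.txt", "Nothing relevant here")

result = search(props, index, "jakarta", getcontenttype="text/plain")
print(result)  # prints ['/mycoll/a.txt']
```

The join happens in `search`: both result sets have to be materialized before they can be intersected, which is the cost the single-index approach would avoid at the price of duplicating the properties.
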
>
> I personally wouldn't know how to make use of a query against full text
> *and* properties. This is because such a query looks weird to me:
> full-text is the least structure possible (get me everything but I
> don't know where) while properties tend to be very much structured
> (last modified time, author, and so on).
>
> There is a decades long discussion on what is data and what is metadata
> and I don't want to touch that with a stick, but I think that if you
> need to do full-text search on your metadata there is something wrong.

Stefano, with all due respect, there is nothing wrong with a full-text
search on metadata, because metadata in this case can be any properties of
any of the resources in the repository, and that metadata can be free-form
text.

Consider a search query like

    doctype="memo" and description contains "Fire Stefano" and contents contains "January"

doctype and description are properties with string values that would be
indexed and matched with the same index as the contents.

Not everybody uses the Database Stores; some actually prefer the XML
Stores, so an index of the XML should be full text, yes?

> But this is my very personal vision, of course, and I would like to see
> what other usecases or scenarios others can come up with before stating
> where to go.
>
> >>> Anyway, do you have some idea to optimize the current search service?
> >>
> >> I haven't looked into this yet (I'm still lagging behind on some other
> >> issues with my project and I haven't attacked this part yet).
> >>
> >> The idea is to use an RDBMS as much as possible on all content that
> >> can be turned relational without major issues (and normally metadata
> >> fits this category). As for full-text search, I agree that there is
> >> no way to beat an engine like lucene.
> >>
> > Agree! I understand your point of view, the best way to query on
> > properties is certainly the classic select statement, but if you need an
> > index/search engine for full-text search, I don't know.
>
> I personally had this vision before: DASL allows you to select the
> search language. We already provide the DASL basic-search, nothing
> stops us from coming up with an entirely new lucene-influenced
> full-text language that works only on the files' contents.
>
> So, you do different queries depending on how you want to treat the
> content.
>
> > Furthermore, like Erik explains in a previous mail, you can write some
> > filter to apply security rules. So, in one query made in only one
> > store, you can filter on props, content and security rules.
> > Can you do that without storing properties into the search engine?
> > I'm curious :-)

For clarity, indexing properties as they go into the store isn't the same
as storing properties into the search engine/index. In other words, the
index of the properties and content just needs access to the data as it is
being stored and doesn't impact the stores beyond the call, and that impact
can be minimized with an indexing queue that is processed asynchronously.

>
> You could, I think, but it would be tremendously slow compared.
>
> > It should be interesting to compare both solutions in more detail,
> > make performance tests, ...
> >
> >>> Why not support both situations: either index the props or not?
> >>
> >> You mean with a global configuration or more granularly?
> >>
> >
> > Still thinking on that :-) The idea is to use the domain.xml file to
> > define how to make the query on props and the options used for the full
> > text search.
>
> I think we need to attack the store/indexing problem from the scenario
> angle down... or we'll go around in circles for a long time. of course,
> I'm not talking about Slide 2.0 but something to do after the release
> is done.
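
To make the asynchronous indexing queue mentioned above a bit more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `AsyncIndexer`, `on_store`, `flush` and `contains` are hypothetical names, not Slide or Lucene APIs; the index is an in-memory map keyed by (field, token); content is indexed under a field called "contents"; and "contains" means "holds every word", with no phrase matching.

```python
import queue
import threading


class AsyncIndexer:
    """Hypothetical indexer fed by the store's write path through a queue."""

    def __init__(self):
        self._queue = queue.Queue()
        self._postings = {}  # (field, token) -> set of URIs
        self._lock = threading.Lock()
        threading.Thread(target=self._run, daemon=True).start()

    def on_store(self, uri, properties, content):
        # Called while the resource is being stored: it only enqueues a
        # snapshot, so the store is not slowed down beyond this call.
        self._queue.put((uri, dict(properties), content))

    def _run(self):
        # Background worker: properties and content go into the same index.
        while True:
            uri, properties, content = self._queue.get()
            try:
                fields = dict(properties, contents=content)
                with self._lock:
                    for name, value in fields.items():
                        for token in str(value).lower().split():
                            key = (name, token)
                            self._postings.setdefault(key, set()).add(uri)
            finally:
                self._queue.task_done()

    def flush(self):
        # Block until everything enqueued so far has been indexed.
        self._queue.join()

    def contains(self, field, text):
        # URIs whose `field` holds every word of `text` (not a phrase match).
        words = text.lower().split()
        with self._lock:
            hits = [self._postings.get((field, w), set()) for w in words]
        return set.intersection(*hits) if hits else set()


# Made-up resources for the example query from this thread.
indexer = AsyncIndexer()
indexer.on_store("/memos/1",
                 {"doctype": "memo", "description": "Plan to fire Stefano"},
                 "Effective January 2004")
indexer.on_store("/memos/2",
                 {"doctype": "memo", "description": "Lunch menu"},
                 "January specials")
indexer.on_store("/reports/3",
                 {"doctype": "report", "description": "Fire Stefano"},
                 "January")
indexer.flush()

# doctype="memo" and description contains "Fire Stefano"
# and contents contains "January"
hits = (indexer.contains("doctype", "memo")
        & indexer.contains("description", "Fire Stefano")
        & indexer.contains("contents", "January"))
print(sorted(hits))  # prints ['/memos/1']
```

The trade-off in this sketch is that a search issued right after a store may not see the new resource until the worker has caught up; `flush` exists only so the example can wait for that.
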
>

I completely agree; a few scenarios/stories should be the first step, and I
hope the example I gave above fits in that category.

Ollie

> > Thanks for this mail,
>
> You are welcome.
>
> --
> Stefano.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
