RE: Integrate Indexstore and SEARCH (was Indexing store)

Wallmer, Martin Tue, 27 Jan 2004 15:50:52 -0800

Hi,

> -----Original Message-----
> From: Stefano Mazzocchi [mailto:[EMAIL PROTECTED]
> Sent: Montag, 26. Januar 2004 19:26
> To: Slide Developers Mailing List
> Subject: Re: Integrate Indexstore and SEARCH (was Indexing store)
> 
> 
> 
> On 26 Jan 2004, at 04:18, Wallmer, Martin wrote:
> 
> > Hi Stefano,
> 
> >> One resource, could, in theory, be stored in different stores at the
> >> same time
> >
> > how would you achieve that?
> 
> uh, what do you mean? [feeling stupid]


Perhaps just a misunderstanding: I somehow read it as one PUT storing into different 
stores :-)


> 
> >> or being indexed by different indexers at the same time.
> >>
> >> My point is: the way you partition your content tree for storing should
> >> not necessarily be the same that you might want to use for indexing.
> >>
> >> And, more important, you might want to have indexers overlap (thus, not
> >> creating a partition of your content space but simply a coverage).
> >>
> >> Did I make sense?
> >>
> >
> > Could you please provide use cases for the different scenarios?
> 
> Sure.
> 
> I think that instead of writing an uber-index that is capable of 
> understanding all sorts of information, I would like to build "focused 
> indexes" and then have the ability to query them differently for 
> performance and optimal reasons.
> 
> So, for example, consider a collection of xml documents, not all of 
> them have the same schema, but you know that, sometimes, having or 
> following a particular schema helps you index them in a particular way 
> (consider, for example, having RDF content that needs to be 
> dereferenced against a remote terminology inferring service in order to 
> unify the schema).
> 
> But, at the same time, this is xml content and you might want to use 
> the usual XPath queries on top of this.
> 
> So, these documents, potentially, need to be processed by different 
> indexers and I would like to be able to choose which one to connect to 
> when making my query.
> 
> an example would be:
> 
>   1) *.xml -> XMLIndexer
>   2) /news/*.xml -> TextIndexer
>   3) /medical/guidelines/*.xml -> RDFIndexer
> 
> so, when you save a document as
> 
>   /medical/guidelines/chemiotherapy.xml
> 
> this is matched by both "*.xml" and "/medical/guidelines/*.xml" which 
> means that both the XMLIndexer and the RDFIndexer will do something 
> with it.
> 
> Then, later, I can ask questions like "give me all documents that were 
> authored by Stefano" by having a "where" clause like 
> "//[contains(dc:author,'Stefano')]" and I would ask the XMLIndexer 
> about this, but for questions such as "give me all treatments that 
> don't involve the use of glycerine", I would ask the RDFIndexer.
> 
> But for questions like "give me all the news that contain the words 
> 'George' and 'Bush'" you would call the TextIndexer.
> 
> As you can see, there are potentially two approaches here:
> 
>   1) one indexer and differentiation is made by the user in the query
>   2) more (potentially overlapping) indexers and differentiation is made
>      by the administrator (and users choose which index to use)
> 
> I tend to think that the second approach is better, also because it 
> contains the first.
> 
> -

Thanks, this makes things clearer. 

For our Tamino search engine we did a similar thing. As Tamino is well suited to 
XPath queries, we defined a proprietary operator for content search, <xpath>. So the 
"give me all documents that were authored by Stefano" query would look something like

<DAV:where>
  <xsv:xpath>// [EMAIL PROTECTED]:author='Stefano']</xsv:xpath>
</DAV:where>

If the namespace where your document is stored (i.e. /medical/guidelines) provides an 
RDF indexer, you might have an operator

<DAV:where>
  <my:rdfquery>give me all treatments that don't involve the use of glycerine</my:rdfquery>
</DAV:where>

For "give me all the news that contain the words 'George' and 'Bush'", 
you would say

<DAV:where>
  <DAV:contains>'Bush' and 'George'</DAV:contains>      
</DAV:where>

This would be a way to call different indexers. The drawback is that a lot of new 
proprietary operators are introduced to DASL. That is not good if you use Slide as a 
WebDAV server (the client needs to know about the extensions), but it might be OK if 
you use Slide for your own content management system.
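
To make the dispatch idea concrete, here is a minimal sketch in Python (not Slide 
code) of picking an indexer from the operator found inside <DAV:where>. The namespace 
URIs for the xsv and my prefixes, the function name pick_indexer and the 
operator-to-indexer table are assumptions made up for illustration; only the DAV: 
namespace and the three operator names come from the examples above.

```python
import xml.etree.ElementTree as ET

NS_DAV = "DAV:"              # the WebDAV namespace
NS_XSV = "urn:example:xsv"   # assumption: placeholder URI for the xsv prefix
NS_MY = "urn:example:my"     # assumption: placeholder URI for the my prefix

# Hypothetical table: which indexer answers which query operator.
OPERATOR_TO_INDEXER = {
    "{%s}xpath" % NS_XSV: "XMLIndexer",
    "{%s}rdfquery" % NS_MY: "RDFIndexer",
    "{%s}contains" % NS_DAV: "TextIndexer",
}


def pick_indexer(where_xml):
    """Return the indexer name for the single operator inside <DAV:where>."""
    where = ET.fromstring(where_xml)
    if where.tag != "{%s}where" % NS_DAV:
        raise ValueError("expected a DAV:where element")
    operators = list(where)
    if len(operators) != 1:
        raise ValueError("this sketch handles exactly one operator")
    tag = operators[0].tag
    if tag not in OPERATOR_TO_INDEXER:
        raise ValueError("no indexer registered for operator %s" % tag)
    return OPERATOR_TO_INDEXER[tag]


query = (
    '<DAV:where xmlns:DAV="DAV:">'
    "<DAV:contains>'Bush' and 'George'</DAV:contains>"
    "</DAV:where>"
)
print(pick_indexer(query))  # prints TextIndexer
```

A client that does not know the proprietary operators would only ever send 
<DAV:contains>, which is exactly the interoperability concern mentioned above.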


You somehow have to define which indexer is valid for which scope; in your example:

>   1) *.xml -> XMLIndexer
>   2) /news/*.xml -> TextIndexer
>   3) /medical/guidelines/*.xml -> RDFIndexer

(by the way, I'd prefer to use the contentType of a resource instead of the suffix to 
identify the indexer to use).
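
As a rough illustration of overlapping scopes (a coverage rather than a partition), 
the three rules can be sketched in Python as below. This is not Slide configuration: 
the RULES table and the indexers_for function are hypothetical names, and shell-style 
matching via fnmatch is my assumption about what the patterns mean.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table taken from the three example patterns.
# Rules may overlap, so one resource can be handed to several indexers.
RULES = [
    ("*.xml", "XMLIndexer"),
    ("/news/*.xml", "TextIndexer"),
    ("/medical/guidelines/*.xml", "RDFIndexer"),
]


def indexers_for(path, rules=RULES):
    """Return every indexer whose pattern matches the resource path."""
    return [indexer for pattern, indexer in rules if fnmatchcase(path, pattern)]


print(indexers_for("/medical/guidelines/chemiotherapy.xml"))
# prints ['XMLIndexer', 'RDFIndexer']
```

Note that fnmatch lets "*" match "/" as well, so "/news/*.xml" would also cover 
resources in subcollections of /news. Selecting by content type instead of suffix 
would only change what the first element of each rule is compared against.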

Currently the way to define scopes is a <scope> in domain.xml. Do you want to 
introduce a new mechanism for defining orthogonal (overlapping) structures for 
indexers, or could you use <scope>? Could you suggest a configuration for your 
scenario?

Best regards,
Martin




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
