RE: Indexing of deep structured XML
Goulish, Michael writes:
> To really preserve the relationships in arbitrarily structured XML, you pretty much need to use a database that directly supports an XML query language like XQuery or XPath.

If searching within regions is enough (something that e.g. sgrep (http://www.cs.helsinki.fi/u/jjaakkol/sgrep.html) or OpenText/PAT does), I think this can be done on top of Lucene. Basically, you need to index region start and region end markers. To search for a term within a region, you can use TermPositions to loop over all matches of the term and all start and end markers of the region, checking which matches fall inside the region.

Of course, the search logic for region search is quite different from Lucene's document queries. There are two types of results (match points and regions), and the basic operations include match points/regions in a region, regions containing match points/regions, joins, and intersections of match points or regions. I don't know if and how this could be integrated with Lucene's normal queries, but one could certainly get a list of matching documents from the results of a region search. If you (ab)use Lucene's token position to store the character position of the token, you could also extract the region's text from a stored copy.

I'm currently doing some experiments with this kind of query using Lucene and find that it performs quite well. You won't be able to distinguish between parents and other ancestors, though, and there won't be any support for searching siblings.

Morus

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
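The region check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Morus's code: it assumes region boundaries were indexed as ordinary tokens named like the hypothetical `<area>` and `</area>`, and instead of iterating Lucene's `TermPositions` it computes the same sorted position lists from an in-memory token list, so it runs without an index. It also assumes regions of one element type never nest inside each other, so the i-th start marker pairs with the i-th end marker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

/**
 * Sketch of "term within region" search over token positions.
 * Assumption: region boundaries are indexed as ordinary tokens such as
 * "<area>" and "</area>". In Lucene the position lists would come from
 * TermPositions; here they are computed from an in-memory token list.
 */
final class RegionSearch {

    /** Positions (token offsets) at which a token occurs, in ascending order. */
    static int[] positionsOf(List<String> tokens, String token) {
        return IntStream.range(0, tokens.size())
                .filter(i -> tokens.get(i).equals(token))
                .toArray();
    }

    /**
     * Pairs the i-th start marker with the i-th end marker.
     * Assumes regions of one type never nest inside each other.
     */
    static List<int[]> regions(int[] startMarkers, int[] endMarkers) {
        List<int[]> result = new ArrayList<>();
        int n = Math.min(startMarkers.length, endMarkers.length);
        for (int i = 0; i < n; i++) {
            result.add(new int[] {startMarkers[i], endMarkers[i]});
        }
        return result;
    }

    /**
     * Match points that fall strictly inside some region. Both inputs are
     * sorted, so a single forward pass (like advancing two position
     * iterators side by side) is enough.
     */
    static int[] matchesInRegions(int[] matches, List<int[]> regions) {
        List<Integer> hits = new ArrayList<>();
        int r = 0;
        for (int pos : matches) {
            // Drop regions that close before this match point.
            while (r < regions.size() && regions.get(r)[1] < pos) {
                r++;
            }
            if (r == regions.size()) {
                break; // no region left that could contain later matches
            }
            int[] region = regions.get(r);
            if (region[0] < pos && pos < region[1]) {
                hits.add(pos);
            }
        }
        return hits.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // Hypothetical tokens for
        // <address><niceplace>park</niceplace><area><niceplace>park</niceplace></area></address>
        List<String> tokens = Arrays.asList(
                "<address>", "<niceplace>", "park", "</niceplace>",
                "<area>", "<niceplace>", "park", "</niceplace>", "</area>", "</address>");
        int[] park = positionsOf(tokens, "park");
        List<int[]> areas = regions(positionsOf(tokens, "<area>"), positionsOf(tokens, "</area>"));
        System.out.println(Arrays.toString(matchesInRegions(park, areas))); // prints [6]
    }
}
```

Searching for "park" inside `niceplace` regions with the same functions returns both positions 2 and 6. Nothing in the positions says whether `area` is the direct parent or a more distant ancestor, which is the limitation mentioned above.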
Re: Indexing of deep structured XML
[EMAIL PROTECTED] wrote:
> ... by mapping all the XML tags (name, street, postcode and city) directly to the document's (address) fields. However, is it also possible to map these? Here we have a hierarchy in area (niceplace) which I want to preserve. Suppose that the meaning of niceplace inside an area differs from niceplace in the first XML structure (it is more precisely specified). I want to preserve this distinction. Is there a way to do this with Lucene's means? If not, has anyone attempted this, or does somebody have ideas on how it could be solved?

I usually preprocess hierarchical XML documents via XSLT to generate flat ones, with corresponding element/field names, before indexing.

Markus
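A flattening step like the one described can be sketched with the XSLT processor built into the JDK (`javax.xml.transform`). This is only an illustration, not Markus's stylesheet: the dotted path naming (`address.area.niceplace`) is an assumed convention, the sample document is hypothetical because the original XML examples did not survive, and this variant emits `field=value` lines instead of a flat XML document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class FlattenXml {

    // XSLT 1.0: for every leaf element, print "path=text", where the path joins
    // the element names from the root down with '.' (a hypothetical naming scheme).
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'>"
            + "<xsl:for-each select='//*[not(*)]'>"
            + "<xsl:for-each select='ancestor-or-self::*'>"
            + "<xsl:value-of select='name()'/>"
            + "<xsl:if test='position() != last()'>.</xsl:if>"
            + "</xsl:for-each>"
            + "<xsl:text>=</xsl:text>"
            + "<xsl:value-of select='normalize-space(.)'/>"
            + "<xsl:text>&#10;</xsl:text>"
            + "</xsl:for-each>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Applies the stylesheet and returns one "field=value" line per leaf element. */
    static String flatten(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: niceplace appears both directly under address and inside area.
        String xml = "<address><name>Karl</name><niceplace>Harbour</niceplace>"
                + "<area><niceplace>Old Town</niceplace></area></address>";
        System.out.print(flatten(xml));
    }
}
```

For the sample this should print lines such as `address.niceplace=Harbour` and `address.area.niceplace=Old Town`. Using each path as a Lucene field name keeps the two meanings of niceplace in separate fields, at the cost of losing anything a flat name cannot express (for example, which of several repeated siblings a value came from).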
RE: Indexing of deep structured XML
To really preserve the relationships in arbitrarily structured XML, you pretty much need to use a database that directly supports an XML query language like XQuery or XPath.

Mick

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 16, 2004 8:19 AM
To: [EMAIL PROTECTED]
Subject: Indexing of deep structured XML
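To illustrate what an XPath-aware store keeps that a flat field mapping loses, the sketch below queries the same element name by its full path. It uses the JDK's built-in `javax.xml.xpath` on a single in-memory document rather than an XML database, and the sample document is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathDemo {

    /** Evaluates an XPath expression against an XML string and returns the selected nodes' text. */
    static List<String> select(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad expression or document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<address><name>Karl</name><niceplace>Harbour</niceplace>"
                + "<area><niceplace>Old Town</niceplace></area></address>";
        System.out.println(select(xml, "/address/niceplace"));      // prints [Harbour]
        System.out.println(select(xml, "/address/area/niceplace")); // prints [Old Town]
        System.out.println(select(xml, "//niceplace"));             // prints [Harbour, Old Town]
        System.out.println(select(xml, "/address[area/niceplace='Old Town']/name")); // prints [Karl]
    }
}
```

The last query shows the kind of structural condition (an address whose area has a given niceplace) that is awkward to express once the hierarchy has been flattened into independent fields.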
Re: Indexing of deep structured XML
Hi Karl, ol' fellow,

try the Apache Commons Digester. There is a nice explanation of how it works, written by Thomas Habing.

Regards,
Thomas
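Digester works by firing rules when the current element path matches a pattern such as `address/area/niceplace`. The sketch below imitates only that path-matching idea with the JDK's SAX parser, so it needs no extra library; it does not use Digester's own classes. Only exact path matches are supported, the field names are hypothetical, and a repeated path simply overwrites the earlier value.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class PathRules {

    /**
     * Collects the text of elements whose slash-separated path exactly matches
     * a key in {@code rules}, storing it under the rule's field name.
     */
    static Map<String, String> extract(String xml, Map<String, String> rules) {
        Map<String, String> fields = new LinkedHashMap<>();
        Deque<String> path = new ArrayDeque<>();   // element names from root to current
        StringBuilder text = new StringBuilder();  // text seen since the last start/end tag
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                path.addLast(qName);
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String field = rules.get(String.join("/", path));
                if (field != null) {
                    fields.put(field, text.toString().trim());
                }
                path.removeLast();
                text.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("address/niceplace", "niceplace");            // hypothetical field names
        rules.put("address/area/niceplace", "area_niceplace");
        String xml = "<address><niceplace>Harbour</niceplace>"
                + "<area><niceplace>Old Town</niceplace></area></address>";
        System.out.println(extract(xml, rules)); // prints {niceplace=Harbour, area_niceplace=Old Town}
    }
}
```

The resulting map could then be turned into Lucene fields, one per rule, so that the two niceplace elements end up in differently named fields.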
Indexing of deep structured XML
Hello all,

it is obviously possible to index the following XML structure in Lucene:

by mapping all the XML tags (name, street, postcode and city) directly to the document's (address) fields.

However, is it also possible to map these?

Here we have a hierarchy in area (niceplace) which I want to preserve. Suppose that the meaning of niceplace inside an area differs from niceplace in the first XML structure (it is more precisely specified). I want to preserve this distinction. Is there a way to do this with Lucene's means? If not, has anyone attempted this, or does somebody have ideas on how it could be solved?

Cheers,
Karl