RE: Indexing of deep structured XML

2004-01-18 Thread Morus Walter
Goulish, Michael writes:
> 
> To really preserve the relationships in arbitrarily 
> structured XML, you pretty much need to use a database 
> that directly supports an XML query language like 
> XQuery or XPath.
> 
If searching within regions is enough (something that e.g. sgrep
(http://www.cs.helsinki.fi/u/jjaakkol/sgrep.html) or OpenText/PAT does),
I think this can be done on top of Lucene.

Basically you need to index region start and region end markers.
To search for a term within a region, you can use TermPositions
to walk the positions of the term together with the positions of the
region's start and end markers, and keep every term position that falls
between a start marker and its matching end marker.
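
The check described above can be sketched outside Lucene. This is only a
minimal illustration in Python, assuming the position lists have already been
read from the index (e.g. via TermPositions); the function names
pair_regions and term_in_regions are made up for this example and are not
part of any Lucene API.

```python
def pair_regions(starts, ends):
    """Pair start/end marker positions into (start, end) regions.

    Works like bracket matching, so nested regions of the same type
    are paired correctly.
    """
    events = sorted([(p, 0) for p in starts] + [(p, 1) for p in ends])
    stack, regions = [], []
    for pos, kind in events:
        if kind == 0:          # start marker: remember it
            stack.append(pos)
        elif stack:            # end marker: close the innermost open region
            regions.append((stack.pop(), pos))
    return sorted(regions)


def term_in_regions(term_positions, regions):
    """Return (position, region) pairs for term positions inside a region."""
    return [(pos, (start, end))
            for pos in term_positions
            for start, end in regions
            if start < pos < end]


# Hypothetical token stream (positions 0..7):
# <area> nice place </area> nice <area> town </area>
regions = pair_regions(starts=[0, 5], ends=[3, 7])
hits = term_in_regions([1, 4], regions)   # positions of the term "nice"
```

Only the occurrence at position 1 is reported, because position 4 lies
between the two regions.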

Of course, the logic of region search is quite different from Lucene's
document queries.
There are two types of results (match points and regions), and the
basic operations include match points or regions within a region, regions
containing match points or regions, joins, and intersection of match points
or regions.
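
To make some of these operations concrete, here is a small sketch in Python.
It assumes match points are integer positions and regions are (start, end)
tuples; all function names are hypothetical, and joins are left out because
their exact semantics are not spelled out here.

```python
def contains(outer, inner):
    """True if region `inner` lies within region `outer` (a region contains itself)."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def regions_in(inner_regions, outer_regions):
    """Regions from the first list that lie within some region of the second."""
    return [r for r in inner_regions
            if any(contains(o, r) for o in outer_regions)]


def regions_containing(regions, points):
    """Regions that contain at least one of the match points."""
    return [r for r in regions
            if any(r[0] < p < r[1] for p in points)]


def intersection(a, b):
    """Match points (or regions) that occur in both result lists."""
    return sorted(set(a) & set(b))


# Hypothetical results of earlier searches
areas = [(0, 10), (20, 30)]
places = [(2, 5), (12, 15)]

inner = regions_in(places, areas)          # places located inside an area
outer = regions_containing(areas, [25])    # areas containing match point 25
both = intersection([3, 25, 40], [25, 40, 50])
```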
I don't know if and how this could be integrated with Lucene's normal
queries. But of course one could derive a list of matching documents from
the results of region searches.
If you (ab)use Lucene's token position to store the character position
of the token, you could also extract a region's text from a stored copy.
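
A rough sketch of that last idea, in Python rather than Lucene code: if each
token's position is its character offset in the stored text, a region's start
and end positions can be used directly as slice bounds. The whitespace
tokenizer and the names below are assumptions made for this illustration.

```python
import re


def tokens_with_offsets(text):
    """Split on whitespace and return (token, character offset) pairs."""
    return [(m.group(), m.start()) for m in re.finditer(r"\S+", text)]


def region_text(stored, start_offset, start_token, end_offset):
    """Text between a start marker and its end marker, markers excluded."""
    return stored[start_offset + len(start_token):end_offset].strip()


# Hypothetical stored copy of a document fragment
stored = "<area> nice place </area> other"
offsets = dict(tokens_with_offsets(stored))   # token -> character offset
text = region_text(stored, offsets["<area>"], "<area>", offsets["</area>"])
```

Here text is "nice place", recovered from the stored copy with nothing more
than the two marker offsets.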

I'm currently experimenting with this kind of query on top of Lucene
and find that it performs quite well.

You won't be able to distinguish between parents and other ancestors,
though, since only containment is checked, and there won't be any support
for searching siblings.

Morus

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Indexing of deep structured XML

2004-01-16 Thread Markus Spath
[EMAIL PROTECTED] wrote:
...
by mapping all the XML tags (name, street, postcode and city) directly to the
fields of the document (address). However, is it also possible to map these?

  
  
  

  

Here we have a hierarchy in area (niceplace) which I want to preserve.
Suppose that niceplace inside an area has a different, more closely specified
meaning than the niceplace in the first XML structure. I want to preserve this.

Is there a way to index this with Lucene's means? If not, has anybody attempted
this, or does somebody have ideas on how it could be solved?

I usually preprocess hierarchical XML documents via XSLT before indexing,
generating flat documents whose field names correspond to the elements.


  
  
   or 
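
The same flattening idea can be sketched without XSLT. The Python snippet
below is only an illustration and swaps in xml.etree.ElementTree for the XSLT
step; the sample document, its values, and the dotted field-name convention
are assumptions, since the original XML examples are not preserved here.

```python
import xml.etree.ElementTree as ET


def flatten(element, prefix=""):
    """Yield (field_name, text) pairs, one per leaf element.

    The field name is the element's path from the root, so the same tag
    at different depths ends up in different fields.
    """
    path = f"{prefix}.{element.tag}" if prefix else element.tag
    children = list(element)
    if not children:
        yield path, (element.text or "").strip()
    for child in children:
        yield from flatten(child, path)


# Hypothetical input using the tag names mentioned in this thread
xml = """<address>
  <name>Example Name</name>
  <niceplace>yes</niceplace>
  <area>
    <niceplace>park</niceplace>
  </area>
</address>"""

fields = list(flatten(ET.fromstring(xml)))
```

The two niceplace elements become the fields address.niceplace and
address.area.niceplace, so each can be indexed and searched separately.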

Markus





RE: Indexing of deep structured XML

2004-01-16 Thread Goulish, Michael

To really preserve the relationships in arbitrarily 
structured XML, you pretty much need to use a database 
that directly supports an XML query language like 
XQuery or XPath.

 Mick .



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Friday, January 16, 2004 8:19 AM
To: [EMAIL PROTECTED]
Subject: Indexing of deep structured XML


Hello all,

it is obviously possible to index the following XML structure in Lucene:


  
  
  
  


by mapping all the XML tags (name, street, postcode and city) directly to the
fields of the document (address). However, is it also possible to map these?


  
  
  

  


Here we have a hierarchy in area (niceplace) which I want to preserve.
Suppose that niceplace inside an area has a different, more closely specified
meaning than the niceplace in the first XML structure. I want to preserve this.

Is there a way to index this with Lucene's means? If not, has anybody attempted
this, or does somebody have ideas on how it could be solved?

Cheers,
Karl




Re: Indexing of deep structured XML

2004-01-16 Thread Thomas Krämer
Hi Karl, ol' fellow,

try the Apache Commons Digester.
There is a nice explanation of how it works, written by Thomas Habing.

Regards,

thomas

[EMAIL PROTECTED] wrote:
Hello all,

it is obviously possible to index the following XML structure in Lucene:


  
  
  
  

by mapping all the XML tags (name, street, postcode and city) directly to the
fields of the document (address). However, is it also possible to map these?

  
  
  

  

Here we have a hierarchy in area (niceplace) which I want to preserve.
Suppose that niceplace inside an area has a different, more closely specified
meaning than the niceplace in the first XML structure. I want to preserve this.

Is there a way to index this with Lucene's means? If not, has anybody attempted
this, or does somebody have ideas on how it could be solved?

Cheers,
Karl




Indexing of deep structured XML

2004-01-16 Thread TheRanger
Hello all,

it is obviously possible to index the following XML structure in Lucene:


  
  
  
  


by mapping all the XML tags (name, street, postcode and city) directly to the
fields of the document (address). However, is it also possible to map these?


  
  
  

  


Here we have a hierarchy in area (niceplace) which I want to preserve.
Suppose that niceplace inside an area has a different, more closely specified
meaning than the niceplace in the first XML structure. I want to preserve this.

Is there a way to index this with Lucene's means? If not, has anybody attempted
this, or does somebody have ideas on how it could be solved?

Cheers,
Karl
