Hi,
At the IBM research lab in Haifa, we have developed a tool that matches
your description! :)
Our package, which we have named Semantic Search upon Lucene, lets the user
index the annotations of a text and run "semantic queries" that combine
constraints over the annotations with free-text constraints inside the span
of those annotations.
Our implementation uses Lucene's payloads, and we have written an efficient
algorithm for retrieving "twig" queries over structure (annotations) and
content (free-text or boolean constraints).
The kind of query you can ask is "free terms in the span of an annotation
that is itself in the span of another annotation, etc.", optionally with
boolean constraints and attribute constraints.
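
To make the nesting concrete, here is a small illustrative sketch in Python.
It is not the package's algorithm or API (the package is built on Lucene
payloads); the Annotation tuple, the token positions and the match_nested
function are all made up for this example, and the matching is a naive
nested loop rather than an efficient twig join.

```python
from typing import NamedTuple

class Annotation(NamedTuple):
    type: str
    start: int  # first token position covered (inclusive)
    end: int    # last token position covered (inclusive)

def contains(outer, inner):
    """True if `inner` lies entirely within the span of `outer`."""
    return outer.start <= inner.start and inner.end <= outer.end

def match_nested(tokens, annotations, outer_type, inner_type, term):
    """Find `term` inside an `inner_type` annotation that is itself
    inside an `outer_type` annotation. Returns (outer, inner, position)."""
    hits = []
    for outer in (a for a in annotations if a.type == outer_type):
        for inner in (a for a in annotations if a.type == inner_type):
            if not contains(outer, inner):
                continue
            for pos in range(inner.start, inner.end + 1):
                if tokens[pos].lower() == term.lower():
                    hits.append((outer, inner, pos))
    return hits

# Hypothetical annotated text: two sentences, each with a Location.
tokens = ["IBM", "opened", "a", "lab", "in", "Haifa", ".",
          "Haifa", "is", "sunny", "."]
annotations = [
    Annotation("Sentence", 0, 6),
    Annotation("Sentence", 7, 10),
    Annotation("Organization", 0, 0),
    Annotation("Location", 5, 5),
    Annotation("Location", 7, 7),
]

hits = match_nested(tokens, annotations, "Sentence", "Location", "haifa")
for outer, inner, pos in hits:
    print(outer, inner, pos)
```

Asking for "ibm" inside a Location inside a Sentence returns nothing, because
the only annotation covering that token is an Organization.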
Two query languages are currently supported: XML Fragments and a subset of
XPath. XML Fragments has operators designed specifically for querying the
intersection of annotations, which can occur with UIMA.
Currently, our package is not available as Open Source.
I will gladly answer any questions about this project!
Best regards,
Benjamin Sznajder
IBM Haifa Research Laboratory
Information Retrieval Group.
From: Thilo Goetz <[EMAIL PROTECTED]>
Date: 08/06/2007 14:59
To: [email protected]
cc:
Please respond to: [EMAIL PROTECTED]or.apache.org
Subject: Re: Lucene and UIMA was Re: Human annotation tool for UIMA
Grant Ingersoll wrote:
[...]
> I'm pretty new to UIMA, but know a thing or two about Lucene. Care to
> share more about what kinds of things you are interested in? Are you
> talking Semantic Web type stuff or things like enhanced NLP search? I
> see from the Javadocs that there is a place to hook in search
> implementations, but haven't dug deeper than that.
>
>
> --------------------------
> Grant Ingersoll
> Center for Natural Language Processing
> http://www.cnlp.org/tech/lucene.asp
>
> Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ
>
The IBM version of UIMA comes with a search engine that lets you search for
UIMA annotations. Suppose you have book description annotations, and inside
those, author, title etc. annotations. With this search engine, you are able
to, for example, search for title and author words that occur within the same
book annotation (and there could be many in a given document). It's like
searching XML documents with XPath and ftcontains. You didn't use to be able
to do this sort of thing with Lucene, but with the new payloads, you could
implement something like it. That's what I would call a non-trivial
integration of UIMA into Lucene.
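
As a rough illustration of the book example, here is a sketch in Python. It
does not use Lucene or UIMA at all: the per-token dict is only a hypothetical
stand-in for what a payload might carry (which book annotation and which
field cover the token), and build_index and books_matching are names invented
for this example.

```python
from collections import defaultdict

# One document holding two book annotations. Each token carries a
# payload-like record naming the book annotation and field that cover it.
tokens = [
    ("Moby",     {"book": 1, "field": "title"}),
    ("Dick",     {"book": 1, "field": "title"}),
    ("Herman",   {"book": 1, "field": "author"}),
    ("Melville", {"book": 1, "field": "author"}),
    ("Emma",     {"book": 2, "field": "title"}),
    ("Jane",     {"book": 2, "field": "author"}),
    ("Austen",   {"book": 2, "field": "author"}),
]

def build_index(tokens):
    """Map (field, term) to the set of book annotation ids containing it."""
    index = defaultdict(set)
    for term, payload in tokens:
        index[(payload["field"], term.lower())].add(payload["book"])
    return index

def books_matching(index, title_word, author_word):
    """Book annotations whose title has title_word and author has author_word."""
    titles = index.get(("title", title_word.lower()), set())
    authors = index.get(("author", author_word.lower()), set())
    return titles & authors

index = build_index(tokens)
print(books_matching(index, "moby", "melville"))
print(books_matching(index, "emma", "melville"))
```

The second query comes back empty even though both words occur in the same
document, because they never fall inside the same book annotation. That is
the distinction a plain document-level term search cannot make.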
--Thilo