Re: UIMA Analysis Engines as Stanbol Enhancement Engines

florent andré Wed, 06 Jun 2012 17:17:03 -0700

Hi

I wrote a UIMA engine implementation a while ago (back when Clerezza and Stanbol had no releases yet).

The idea behind this implementation is to create a service [0] that embeds the UIMA base dependencies and the code for transforming UIMA results into Stanbol enhancements. This "service" is an OSGi bundle.

This bundle doesn't do any UIMA processing itself.

If you want a real new UIMA pipeline, you just have to create a new OSGi bundle that requires:

- an implementation of the API defined in the service [1]
- the UIMA chain definition (aggregate AE) [1.1]
- the UIMA dependencies required for your processing, declared in the pom.

The final goal was to be able to have different UIMA pipelines (each one is a Stanbol engine) that run together or in different processing chains.

Sadly, at the time of coding, UIMA was not totally OSGi-fied, and I encountered some class loading/sharing issues in OSGi when more than one UIMA engine was present in the OSGi environment.

Anyway, this implementation works well with one UIMA Stanbol engine. Maybe the OSGi transformation of UIMA has progressed since then and running multiple UIMA engines is now easy.

The engine that works is uima.gasoil (dico + regex) [2]; the attempts at having 2 UIMA engines are uima.dico and uima.regex.

This code works on an "old" version of Stanbol (before the current engine and processing chain API), but updating it to the current API will not be so hard. The Clerezza and UIMA dependencies are a bit outdated too, but now we have releases for all of them, so that's a good thing.

My chosen approach is more "embedded UIMA in Stanbol" than "call the UIMA REST API"; with this, you don't have to install and configure a UIMA instance.

I have uploaded this code here: https://github.com/florent-andre/stanbol-uima-engine

I don't have much time to work on it now, but I will be happy to help and give a hand to get it running on the released versions if you give this code a chance! :)

++

[0] https://github.com/florent-andre/stanbol-uima-engine/tree/master/uimaservice

[1] https://github.com/florent-andre/stanbol-uima-engine/blob/master/uimaservice/src/main/java/org/apache/stanbol/enhancer/engines/uima/api/StanbolAnalysisEngine.java

[1.1] https://github.com/florent-andre/stanbol-uima-engine/tree/master/uima.gasoil/src/main/resources/configuration

[2] https://github.com/florent-andre/stanbol-uima-engine/tree/master/uima.gasoil


On 06/04/2012 06:45 PM, Mihály Héder wrote:

Hello Everyone,

I'm new to this list. My name is Mihály Héder; I am the lead
developer of the Sztakipedia project:
http://www.youtube.com/watch?v=8VW0TrvXpl4

Most of Sztakipedia's suggestions are based on UIMA Annotation Chains,
which are composed of UIMA Annotation Engines. These are similar to
Enhancer Chains and Enhancement Engines, respectively. If you are curious,
you can play around with one of Sztakipedia's chains:
http://pedia.sztaki.hu:8080/tfidfengsb/?mode=form This is a
tokenizer + sentence boundary detector + lemmatizer + tf-idf calculator
chain (tf-idf is calculated on enwiki in this case).

If you are unfamiliar with it, the main features of UIMA are: 1) you
can find a good number of annotation engines and chains already made,
packaged in PEAR files; 2) the type system and chain building are
quite sophisticated and flexible; 3) you can annotate not only text
but also binaries, images, etc.; 4) we have had very good experiences
with its performance; 5) you can always say "This stuff is behind IBM
Watson" ;). One could also mention the Asynchronous Scaleout
functionality, but our experiences with that have not been so good.

So right now I'm investigating how to integrate UIMA into
Stanbol. After reading some Stanbol docs and writing a Hello World
enhancement engine to get a grip on Stanbol, I think this is how it
should be done:
- An adapter-like interface is needed that glues the two
components together. If you use UIMA, most of the time you just have a
PEAR file from a third party that you can't or don't want to modify. It
will have its own type system, chain definition, etc. Also, hopefully
there will be many more Stanbol users than developers in the long run.
- This means that the real use case is that the future user downloads a
UIMA chain from somewhere, downloads Stanbol, and wants to glue the two
together without coding in either project.
- However, most of the time it will be non-trivial to turn UIMA Feature
Structures into Stanbol Enhancements. In some cases I can imagine that
you can just turn every FS into a triple by a simple rule or something,
but making this flexible enough through configuration files alone seems
rather unrealistic to me.

So what I have in mind for the UIMA->Enhancement conversion is:
- Defining a simple Java interface with one function, e.g.: Triple
convertFStoTriple(org.apache.uima.cas.FeatureStructure fs). By
implementing this one function the user could easily define how feature
structures are to be turned into triples. Most of the time this function
would return null, as there are usually many more UIMA
FeatureStructures generated (e.g. about two for every word) than the
user wants to deal with.
- Creating an Enhancement Engine called UIMAAdapter. This would have a
converterClass service property that could be configured to contain
the name of the class the user just created. The engine would
instantiate the user-written class, provided that it's on the
classpath, and use it to create enhancements.
- For more advanced cases we could provide an interface to map a
List<FeatureStructure> to a List<Triple>. For even more advanced cases
we could provide a convert(List<FeatureStructure>, ContentItem ci)
function with full access to the Stanbol ContentItem.
- Naturally, we could write a default converter that converts every
FeatureStructure that comes out of UIMA to triples in some generic way,
for testing purposes and as a basis for extension.
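
A minimal Java sketch of the converter idea above. It is an illustration
under assumptions, not the eventual Stanbol or UIMA API: the
FeatureStructure and Triple classes here are hypothetical stand-ins (so
the example compiles without UIMA or Clerezza on the classpath), and the
names FsToTripleConverter, LemmaConverter and UimaAdapterSketch are
invented for the example. Only convertFStoTriple and the converterClass
idea come from the proposal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.uima.cas.FeatureStructure.
final class FeatureStructure {
    final String type;
    final String coveredText;
    FeatureStructure(String type, String coveredText) {
        this.type = type;
        this.coveredText = coveredText;
    }
}

// Hypothetical stand-in for the RDF triple type used for enhancements.
final class Triple {
    final String subject;
    final String predicate;
    final String object;
    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

// The one-function interface from the proposal.
interface FsToTripleConverter {
    // Returns null for feature structures the user does not care about.
    Triple convertFStoTriple(FeatureStructure fs);
}

// Example of a user-written converter: keeps only "Lemma" annotations.
class LemmaConverter implements FsToTripleConverter {
    @Override
    public Triple convertFStoTriple(FeatureStructure fs) {
        if (!"Lemma".equals(fs.type)) {
            return null;
        }
        // Subject and predicate URIs are placeholders for illustration.
        return new Triple("urn:example:content-item", "urn:example:hasLemma", fs.coveredText);
    }
}

// Sketch of what a UIMAAdapter engine could do with the converterClass property.
class UimaAdapterSketch {
    // Instantiates the user-written converter from its class name.
    static FsToTripleConverter load(String converterClass) {
        try {
            return Class.forName(converterClass)
                    .asSubclass(FsToTripleConverter.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load converter " + converterClass, e);
        }
    }

    // Runs the converter over all feature structures and drops the nulls.
    static List<Triple> convertAll(FsToTripleConverter converter, List<FeatureStructure> all) {
        List<Triple> triples = new ArrayList<>();
        for (FeatureStructure fs : all) {
            Triple t = converter.convertFStoTriple(fs);
            if (t != null) {
                triples.add(t);
            }
        }
        return triples;
    }
}
```

Inside an OSGi container the plain Class.forName lookup would probably
have to go through the appropriate bundle class loader instead; the
sketch ignores that.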

The other question is how to communicate with the UIMA engine. I think
the ability to access a remotely deployed UIMA engine is a must, and
the REST interface you can try out on the link above (provided by
UIMASimpleServlet) is good for starters. I'm much less sure that
embedding everything that is needed to run a UIMA engine into a Stanbol
Enhancement Engine is such a good idea, but I think it can be done.

What do you think of all the above?

P.S. Do you have a "How to write and deploy a Hello World Enhancement
Engine" tutorial? I have found the description of the functions to
implement, but it still took me a while to figure out how to deploy it
to Felix, etc. If not, I can write one for you based on my notes.

Best,
Mihály
