Hi! It's a nice project, thanks for sharing!

On 7 June 2012 01:43, florent andré <[email protected]> wrote:
> Hi
>
> I did a UIMA engine implementation a while ago (when Clerezza and Stanbol
> did not yet have releases).
>
> The idea behind this implementation is to create a service [0] that embeds
> the UIMA base dependencies and the code for transforming UIMA results into
> Stanbol enhancements. This "service" is an OSGi bundle.
> This one doesn't do any UIMA processing itself.
>
> If you want a real new UIMA pipeline you just have to create a new OSGi
> bundle that requires:
> - an implementation of the API defined in the service [1]
> - the UIMA chain definition aggregateAE [1.1]
> - the required UIMA dependencies in the pom for your processing.
> The final goal was to be able to have different UIMA pipelines (each is a
> Stanbol engine) that run together or in different processing chains.
>
> Sadly, at the time of coding, UIMA was not totally OSGified and I encountered
> some class loading/sharing issues in OSGi when more than one UIMA engine is
> in the OSGi environment.

This is exactly the kind of issue I am afraid of when we talk about
embedding UIMA in Stanbol - UIMA is a huge beast on its own.

> Anyway, this implementation works well with one UIMA Stanbol engine. Maybe
> the OSGi transformation in UIMA has progressed and running multiple UIMA
> engines is now easy.
>
> The engine that works is uima.gasoil (dico + regex) [2]; the attempts to
> have 2 UIMA engines are uima.dico and uima.regex.

I'm looking into these sources. Am I correct that the problems came
when you tried to instantiate UIMAExecutors?

> This code works on an "old" version of Stanbol (before the current engine
> and processing chain API), but updating to the current API will not be so
> hard. The Clerezza and UIMA dependencies are a bit outdated too, but now we
> have releases for all of them, so that is a good thing.
>
> My chosen approach is more "embedded UIMA in Stanbol" than "call the UIMA
> REST API"; with this, you don't have to install and configure a UIMA
> instance.

Embedding undeniably has its advantages, but we might be able to emulate
many of those if we provide a small piece of software that embeds the UIMA
AE with the necessary configuration in its own JVM, while another part of
this software would be an Enhancer. The two parts could communicate through
REST. The part on the Stanbol side could even manage and monitor the
status of the other part (start, stop, restart as necessary), thus easing
the extra work caused by running two services (or more, when there is more
than one UIMA instance).
Also, REST is certainly needed when we have to run UIMA instances on
separate machines, as they consume lots of resources. That is something we
already had to do in Sztakipedia to keep response times relatively low,
and I'm sure distribution will be needed in Stanbol as well.

Now, at this point it occurred to me that OSGi is distributable as far as
I know. Does anyone have experience with this? Can I distribute Engines in
a Stanbol+Felix deployment?

> I had uploaded this code here
> https://github.com/florent-andre/stanbol-uima-engine
>
> I don't have much time to work on it now, but I will be happy to help and
> give a hand to get it running on the released version if you give this
> code a chance! :)

I really appreciate that, thanks!

> ++
>
> [0]
> https://github.com/florent-andre/stanbol-uima-engine/tree/master/uimaservice
>
> [1]
> https://github.com/florent-andre/stanbol-uima-engine/blob/master/uimaservice/src/main/java/org/apache/stanbol/enhancer/engines/uima/api/StanbolAnalysisEngine.java
>
> [1.1]
> https://github.com/florent-andre/stanbol-uima-engine/tree/master/uima.gasoil/src/main/resources/configuration
>
> [2]
> https://github.com/florent-andre/stanbol-uima-engine/tree/master/uima.gasoil
>
>
> On 06/04/2012 06:45 PM, Mihály Héder wrote:
>>
>> Hello Everyone,
>>
>> I'm new to this list, my name is Mihály Héder; I am the lead
>> developer of the Sztakipedia project:
>> http://www.youtube.com/watch?v=8VW0TrvXpl4
>>
>> Most of Sztakipedia's suggestions are based on UIMA Annotation Chains,
>> which are composed of UIMA Annotation Engines. These are similar
>> to Enhancer Chains and Enhancement Engines, respectively. If you are
>> curious, you can play around with one of Sztakipedia's chains:
>> http://pedia.sztaki.hu:8080/tfidfengsb/?mode=form
>> This is a Tokenizer + sentence boundary detector + lemmatizer + tf-idf
>> calculator chain (tf-idf is calculated on enwiki in this case).
>>
>> If you are unfamiliar with it, the main features of UIMA are: 1) you
>> can find a good number of annotation engines and chains already made,
>> packaged in pear files; 2) the type system and chain building are
>> quite sophisticated and flexible; 3) you can annotate not only texts
>> but also binaries, images, etc.; 4) we have very good experiences with
>> its performance; 5) you can always say "This stuff is behind IBM
>> Watson" ;). One could also mention the Asynchronous Scaleout
>> functionality, but our experiences with that are not so good.
>>
>> So right now I'm investigating how to integrate UIMA into
>> Stanbol.
>> After having read some Stanbol docs and written a Hello World
>> enhancement engine to get a grip on Stanbol, I think this is how it
>> should be done:
>> - An adapter-like interface is needed that glues together the two
>> components. If you use UIMA, most of the time you just have a pear
>> file from a third party that you can't or don't want to modify. It will
>> have its own type system, chain definition, etc. Also, hopefully there
>> will be many more Stanbol users than developers in the long run.
>> - This means that the real use case is that the future user downloads a
>> UIMA chain from somewhere, downloads Stanbol, and wants to glue the two
>> together without coding in either project.
>> - However, most of the time it will be non-trivial to turn UIMA Feature
>> Structures into Stanbol Enhancements. In some cases I can imagine that
>> you can just turn every FS into a triple by a simple rule or something,
>> but making this flexible enough through configuration files alone seems
>> rather unrealistic to me.
>>
>> So what I have in mind now for the UIMA->Enhancement conversion is:
>> - defining a simple Java interface with one function, e.g.: Triple
>> convertFStoTriple(org.apache.uima.cas.FeatureStructure fs). By
>> implementing this one function the user could easily define how feature
>> structures are to be turned into Triples. Most of the time this function
>> would return null, as there are usually many more UIMA
>> FeatureStructures generated (e.g. about two for every word) than the
>> user wants to deal with.
>> - creating an Enhancement Engine called UIMAAdapter. This would have a
>> converterClass Service Property that could be configured to contain
>> the name of the class the user just created. The engine would instantiate
>> the user-written class, provided that it's on the classpath, and use it
>> to create enhancements.
>> - for more advanced cases we could provide an interface to map a
>> List<FeatureStructure> to a List<Triple>.
>> For even more advanced cases
>> we could provide a convert(List<FeatureStructure>, ContentItem ci)
>> function with full access to the Stanbol ContentItem.
>> - naturally we could write a default converter that converts every
>> FeatureStructure that comes out of UIMA to triples in some generic way,
>> for testing purposes and as a basis for extension.
>>
>> The other question is how to communicate with the UIMA Engine. I think
>> the ability to access a remotely deployed UIMA engine is a must, and
>> the REST interface you can try out at the link above (provided by
>> UIMASimpleServlet) is good for starters. I'm much less sure that
>> embedding everything that is needed to run a UIMA engine into a Stanbol
>> Enhancement Engine is such a good idea, but I think it can be done.
>>
>> What do you think of all the above?
>>
>> P.S. Do you have a "How to write and deploy a Hello World Enhancement
>> Engine" tutorial? I have found the description of the functions to
>> implement, but it still took me a while to figure out how to deploy it
>> to Felix, etc. If not, I can write one for you based on my notes.
>>
>> Best,
>> Mihály
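
To make the converter proposal from the original message a bit more concrete, here is a minimal Java sketch of the one-function interface and of an adapter that loads the user's class by name. It is only an illustration under stated assumptions: `FeatureStructure` and `Triple` below are simplified stand-ins rather than the real UIMA and Clerezza classes, and `FsConverter`, `LemmaConverter`, `loadConverter`, `convertAll` and the `urn:` identifiers are hypothetical names that exist in neither project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConverterSketch {

    /** Simplified stand-in for org.apache.uima.cas.FeatureStructure (assumption). */
    public static final class FeatureStructure {
        public final String type;   // e.g. "Token" or "Lemma"
        public final String value;  // the value carried by this structure

        public FeatureStructure(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    /** Simplified stand-in for an RDF triple (assumption, not the Clerezza type). */
    public static final class Triple {
        public final String subject;
        public final String predicate;
        public final String object;

        public Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return subject + " " + predicate + " " + object;
        }
    }

    /** The one-function interface from the proposal; null means "ignore this structure". */
    public interface FsConverter {
        Triple convertFStoTriple(FeatureStructure fs);
    }

    /** Example of a user-written converter: keeps lemmas, ignores everything else. */
    public static final class LemmaConverter implements FsConverter {
        public LemmaConverter() { }

        @Override
        public Triple convertFStoTriple(FeatureStructure fs) {
            if (!"Lemma".equals(fs.type)) {
                return null;
            }
            return new Triple("urn:content-item", "urn:example:hasLemma", fs.value);
        }
    }

    /** What an adapter could do with a converterClass property: instantiate the class by name. */
    public static FsConverter loadConverter(String className) {
        try {
            return Class.forName(className)
                    .asSubclass(FsConverter.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load converter " + className, e);
        }
    }

    /** Runs the converter over all feature structures and drops the null results. */
    public static List<Triple> convertAll(List<FeatureStructure> structures, FsConverter converter) {
        List<Triple> triples = new ArrayList<>();
        for (FeatureStructure fs : structures) {
            Triple triple = converter.convertFStoTriple(fs);
            if (triple != null) {
                triples.add(triple);
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        FsConverter converter = loadConverter(LemmaConverter.class.getName());
        List<FeatureStructure> fromUima = Arrays.asList(
                new FeatureStructure("Token", "engines"),
                new FeatureStructure("Lemma", "engine"));
        for (Triple triple : convertAll(fromUima, converter)) {
            System.out.println(triple);
        }
    }
}
```

In a real engine the class name would come from the OSGi service configuration and the triples would be written into the ContentItem's metadata; the sketch only shows the filtering and loading idea.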
