Hi,
I wrote some components that are useful for integrating UIMA components
into the Spring framework.
These components are Spring FactoryBeans that can produce
CasProcessors/Consumers, CollectionReaders and type systems.
A component can be built "totally programmatically", from a descriptor,
or from a PEAR.
I would like to release these components to the community, if that sounds good.
This work builds on code posted by Steven Bethard on this mailing list.
Thanks a lot, Steven!

Here are some usage examples:

<!-- collection reader -->
        <bean name="cr" class="it.celi.uima.bean.CollectionReaderFactoryBean"
                        parent="baseAnnotator">
                <property name="componentClass"
                        value="it.celi.components.collection.RecursiveFileSytemCollectionReader" />
                <property name="configurationParameters">
                        <map>
                                <entry key="application" value="language" />
                                <entry key="language" value="it" />
                        </map>
                </property>
        </bean>

where baseAnnotator is:
        <bean name="baseAnnotator"
                        class="it.celi.uima.bean.AbstractUIMAComponentsFactoryBean"
                        abstract="true">
                <property name="typeSystem" ref="typeSystem" />
        </bean>

        <bean name="typeSystem" class="it.celi.uima.bean.TypeSytemFactoryBean">
                <property name="typeSytemPath"
                        value="file:../dd4-typeSystem/src/main/resources/CeliTypeSystem.xml" />
        </bean>
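
For readers less familiar with Spring FactoryBeans, the idea behind the
beans above can be sketched roughly as follows. This is not the it.celi
code: SimpleFactoryBean, Configurable, EchoComponent and
ComponentFactorySketch are hypothetical stand-ins, written without Spring
or UIMA, that only illustrate how a "componentClass" name and a
"configurationParameters" map could be turned into a configured object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the part of Spring's FactoryBean contract used here.
interface SimpleFactoryBean<T> {
    T getObject();
}

// Hypothetical stand-in for a UIMA component that accepts named parameters.
interface Configurable {
    void configure(Map<String, Object> parameters);
}

// Example component, present only so the factory has something to build.
class EchoComponent implements Configurable {
    final Map<String, Object> settings = new LinkedHashMap<>();

    @Override
    public void configure(Map<String, Object> parameters) {
        settings.putAll(parameters);
    }
}

// Builds a component from a class name plus a parameter map, mirroring the
// shape of the "componentClass" and "configurationParameters" properties.
class ComponentFactorySketch implements SimpleFactoryBean<Configurable> {
    private String componentClass;
    private Map<String, Object> configurationParameters = new LinkedHashMap<>();

    void setComponentClass(String componentClass) {
        this.componentClass = componentClass;
    }

    void setConfigurationParameters(Map<String, Object> parameters) {
        this.configurationParameters = new LinkedHashMap<>(parameters);
    }

    @Override
    public Configurable getObject() {
        try {
            // Instantiate by name through the no-argument constructor.
            Class<?> type = Class.forName(componentClass);
            Configurable component =
                    (Configurable) type.getDeclaredConstructor().newInstance();
            component.configure(configurationParameters);
            return component;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create " + componentClass, e);
        }
    }
}
```

In a Spring context the setters would be called by the container from the
<property> elements, and any bean referencing the factory would receive the
object returned by getObject() rather than the factory itself.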
        

Processors/consumers:

        <bean name="sentenceAnnotator"
                        class="it.celi.uima.bean.CasProcessorFactoryBean"
                        parent="baseAnnotator">
                <property name="componentClass"
                        value="it.celi.annotators.language.SentenceAnnotator" />
                <property name="configurationParameters">
                        <map>
                                <entry key="abbreviationsFiles"
                                        value="abbreviations_*.txt" />
                                <entry key="additionalSeparatorsFiles"
                                        value="sentenceSeparators_*.txt" />
                        </map>
                </property>
        </bean>

        <bean name="xslSerializerCasConsumer"
                        class="it.celi.uima.bean.CasConsumerFactoryBean"
                        parent="baseAnnotator">
                <property name="componentClass"
                        value="it.celi.components.consumer.XslSerializerCasConsumer" />
                <property name="configurationParameters">
                        <map>
                                <entry key="fileExtension" value=".xml" />
                        </map>
                </property>
        </bean>


PEAR files (overriding configuration parameters is not allowed!):

        <bean name="japeAnnotator"
                        class="it.celi.uima.bean.CasProcessorFactoryBean">
                <property name="descriptorPath"
                        value="file:./pears/JapeAnnotator.pear" />
                <property name="redeployPear" value="true"/>

                <property name="configurationParameters">
                        <map>
                        </map>
                </property>
        </bean>

From a descriptor, with parameter overrides:

        <bean name="regExpTokenizer"
                        class="it.celi.uima.bean.CasProcessorFactoryBean">
                <property name="descriptorPath"
                        value="file:./desc/RegExpTokenizer.xml" />
                <property name="configurationParameters">
                        <map>
                                <entry key="commandsFileName"
                                        value="commands_tokenizer_*.xml" />
                        </map>
                </property>
        </bean>
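
As a rough illustration of the two override rules above, here is a minimal
sketch. It rests on two assumptions that the examples do not confirm:
that an override simply replaces the descriptor value with the same key,
and that a PEAR rejects a non-empty override map (the real beans might
ignore it instead). ParameterOverrideSketch and its methods are
hypothetical names, not part of UIMA, Spring, or the it.celi beans.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating the assumed override semantics.
class ParameterOverrideSketch {

    // Returns the descriptor defaults with the injected overrides applied
    // on top; neither input map is modified.
    static Map<String, Object> merge(Map<String, Object> descriptorDefaults,
                                     Map<String, Object> overrides) {
        Map<String, Object> merged = new LinkedHashMap<>(descriptorDefaults);
        merged.putAll(overrides);
        return merged;
    }

    // Assumed PEAR rule: an empty map is accepted, anything else is rejected.
    static void checkPearOverrides(Map<String, Object> overrides) {
        if (!overrides.isEmpty()) {
            throw new IllegalArgumentException(
                    "configuration parameters cannot be overridden for a PEAR");
        }
    }
}
```

Under these assumptions, a descriptor default for commandsFileName would be
replaced by the injected value, while parameters not mentioned in the map
keep their descriptor values.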


A simple use case could be:

Configuration:

        <bean name="cpm" class="org.apache.uima.UIMAFramework"
                        factory-method="newCollectionProcessingManager" />

        <bean name="uimaCPM" class="it.celi.uima.engine.CpmUIMAEngine">
                <property name="cpm" ref="cpm" />
                <property name="listeners">
                </property>
                <property name="readers">

                        <list>
                                <ref bean="rfcr" />
                        </list>
                </property>
                <property name="processors">
                        <list>
                                <ref bean="sentenceAnnotator" />
                                <ref bean="regExpTokenizer" />
                                <ref bean="japeAnnotator" />

                        </list>
                </property>
                <property name="consumers">
                        <list>
                                <ref bean="xslSerializerCasConsumer" />
                        </list>
                </property>
        </bean>


The last bean is a CPM wrapper that internally does the following.

Methods to add consumers and processors to the cpm (the lists are injected
by the configuration above):

        // Register every injected consumer with the cpm; a failure is logged
        // and the remaining consumers are still added.
        private void addAllConsumersToCpm() {
                for (CasConsumer casConsumer : consumers) {
                        String name = casConsumer.getProcessingResourceMetaData().getName();
                        try {
                                logger.info("adding consumer to pipeline::" + name);
                                cpm.addCasConsumer(casConsumer);
                        } catch (ResourceConfigurationException e) {
                                logger.error("unable to add consumer  :: " + name, e);
                        }
                }
        }

        private void addAllProcessorToCpm() {
                for (CasProcessor casProcessor : processors) {
                        String name = casProcessor.getProcessingResourceMetaData().getName();
                        try {
                                logger.info("adding processor to pipeline::" + name);
                                cpm.addCasProcessor(casProcessor);
                        } catch (ResourceConfigurationException e) {
                                logger.error("unable to add processor  :: " + name, e);
                        }
                }
        }

and then, in some method, you can do:

                        cpm.setCollectionReader(reader);
                        cpm.process();


Some advantages:
-only one simple file to configure a cpm
-easy to inject components
-easy to embed a cpm/AE inside existing applications
-can use SpringIDE inside Eclipse
-....anything else?
Disadvantages:
-if you don't use Spring, there's another framework to learn
-you can't use Eclipse's UIMA plugins to edit/manage descriptors
-aggregates are not supported programmatically (via descriptors there's
no problem)
-....anything else?

Is this interesting? Let me know.

Roberto
-- 
Roberto Franchini
http://www.celi.it
http://www.blogmeter.it
http://www.memesphere.it
Tel +39-011-6600814
jabber:[EMAIL PROTECTED] skype:ro.franchini
