Hello,
I am developing a rather large Java application with UIMA and RUTA.
The goal is to detect readability anomalies in a .docx document and mark
them with comments. For the rule part I use RUTA. Having already dealt with
the shortcomings of the .docx APIs and with text parsing, I am now
fine-tuning my application.
My objective: a user adds a Ruta script to the script directory, and its
rules are applied without further configuration. The relevant directory
layout looks like this:
src/main/resources/
    META-INF/org.apache.uima.fit/types.txt (path to type systems)
    ruta-script/Main.ruta
    ruta-script/Nouns.ruta
    type-system/Anomaly.xml
    type-system/BasicTypeSystem.xml
    type-system/DKProCoreTypes.xml
    type-system/InternalTypeSystem.xml
    type-system/MainTypeSystem.xml (where I define my own types)
    type-system/ReadabilityScore.xml
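To make the objective concrete, here is a minimal sketch of the discovery step: listing every *.ruta file in the script directory so that a newly added script is picked up without code changes. It uses only the JDK. The names RutaScriptFinder and findScripts are hypothetical, and the sketch assumes the scripts live in a plain directory on disk (not packed inside a JAR).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of UIMA or Ruta.
public class RutaScriptFinder {

    // Returns all regular files ending in ".ruta" directly inside scriptDir,
    // sorted by path. Subdirectories are not searched.
    public static List<Path> findScripts(Path scriptDir) throws IOException {
        try (Stream<Path> entries = Files.list(scriptDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".ruta"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Each returned path could then be handed to whatever step loads the script and its types.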
So far, I see two ways of achieving my goal, but neither fits perfectly:
1) As the layout shows, types.txt specifies the path to my type systems, so
the type system description does not need to be specified in the
application. I can create the JCas object as follows (without a type system
description):
JCas jCas = analysisEngine.newJCas();
However, for this approach to work I have to edit MainTypeSystem.xml, which
contains my self-declared types, by hand.
I would like the types newly declared in a Ruta script to be added to the
existing MainTypeSystem.xml automatically.
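To illustrate what "the newly declared types" of a script are, here is a deliberately naive sketch that pulls the type names out of DECLARE statements with a regular expression. It is not Ruta's parser: it assumes declarations without feature lists (such as "DECLARE A, B;" or "DECLARE Parent Child;") and does not skip comments. DeclaredTypeScanner and declaredTypes are hypothetical names; anything beyond a quick check should go through Ruta's own parsing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of UIMA or Ruta.
public class DeclaredTypeScanner {

    // Matches "DECLARE ... ;" and captures the text between keyword and semicolon.
    private static final Pattern DECLARE =
            Pattern.compile("\\bDECLARE\\s+([^;]+);");

    // Collects the names of types declared without feature lists.
    public static List<String> declaredTypes(String script) {
        List<String> names = new ArrayList<>();
        Matcher m = DECLARE.matcher(script);
        while (m.find()) {
            for (String item : m.group(1).split(",")) {
                // In "Parent Child" the new type is the last word;
                // a lone "Child" is the new type itself.
                String[] words = item.trim().split("\\s+");
                names.add(words[words.length - 1]);
            }
        }
        return names;
    }
}
```

The resulting names are the ones that would otherwise have to be added to MainTypeSystem.xml by hand.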
2) While looking at the RUTA documentation and the internals of the RUTA
Workbench for Eclipse, I found that you can build the type system
description for a Ruta script as follows:
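    import java.io.IOException;
    import java.net.URISyntaxException;
    import org.antlr.runtime.RecognitionException;
    import org.apache.uima.resource.ResourceInitializationException;
    import org.apache.uima.resource.metadata.TypeSystemDescription;
    import org.apache.uima.ruta.descriptor.RutaBuildOptions;
    import org.apache.uima.ruta.descriptor.RutaDescriptorFactory;
    import org.apache.uima.ruta.descriptor.RutaDescriptorInformation;
    import org.apache.uima.util.InvalidXMLException;

    public static TypeSystemDescription getRutaRuleTypeSystem()
            throws IOException, RecognitionException, InvalidXMLException,
                   ResourceInitializationException, URISyntaxException {
        RutaDescriptorFactory factory = new RutaDescriptorFactory();
        // Parse the script text into descriptor information.
        RutaDescriptorInformation rd = factory.parseDescriptorInformation(
                "DECLARE NOUNS; (CW){-> MARK(PR_TEST)};");
        // Build a type system description from the parsed script.
        TypeSystemDescription typeSystemDescription =
                factory.createTypeSystemDescription("test.xml", rd,
                        new RutaBuildOptions(), null);
        return typeSystemDescription;
    }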
However, this approach seems tedious to me.