Re: Problem writing ruta extensions

2013-12-05 Thread Peter Klügl
Hi,

On 04.12.2013 18:33, Sebastian wrote:
 Hi,

 I'm highly interested in Ruta and its potential in industrial
 applications. Right now I'm trying to create a simple toy condition
 extension that is simply a case-insensitive INLIST condition. It is
 completely based on the InListCondition class; I also declared an
 implementation of the IRutaConditionExtension interface.

 With primitive types everything seems to work great, except when the
 condition is used with a variable:

 STRINGLIST MonthsList = {"january", ...};
 DECLARE Month;
 ANY{INSENSITIVEINLIST(MonthsList) -> MARK(Month)};

 I get a ClassCastException when the condition is being created, because
 MonthsList is a SimpleTypeExpression and I'm expecting a
 StringListExpression.

 Am I doing something wrong? I suppose there is a way to resolve the
 variable to the actual list, but I missed it somehow.


It's hard to say what went wrong. My first guess would be that there is
a problem in your extension. I just verified that INLIST works at all (I
haven't used it myself for a long time).

The example works with INLIST:

STRINGLIST MonthsList = {"january"};
DECLARE Month;
ANY{INLIST(MonthsList) -> MARK(Month)};

Can you post the stack trace of the exception? Or can you send me the
source code of your extension (in case you do not want to post it on a
public mailing list)?

Anyway, using INLIST only makes sense if you want to work on
dynamic dictionaries that may change during rule execution. Have you
taken a look at the MARKFAST or TRIE action?
http://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.language.actions.markfast
They already have options for case-insensitivity and are overall faster
and more powerful.

Best,

Peter

PS: You can, of course, also post a feature request on JIRA for adding
case-insensitivity to the INLIST condition :-)


 Any ideas on how that could be done?

 Regards

 Sebastian




Re: Problem writing ruta extensions

2013-12-05 Thread Peter Klügl
Sorry, I overlooked the mention that it is only a toy extension... so
maybe ignore the advice about the actions ;-)

Can you check whether there is a Type in the type system with the short
name MonthsList?

Best,

Peter






Re: big offsets efficiency, and multiple offsets

2013-12-05 Thread Jens Grivolla
I agree that it might make more sense to model our needs more directly 
instead of trying to squeeze it into the schema we normally use for text 
processing.  But at the same time I would of course like to avoid having 
to reimplement many of the things that are already available when using 
AnnotationBase.


For the cross-view indexing issue I was thinking of creating individual 
views for each modality and then a merged view that just contains a 
subset of annotations of each view, and on which we would do the 
cross-modal reasoning.


I just looked again at the GaleMultiModalExample (not much there, 
unfortunately) and saw that e.g. AudioSpan derives from AnnotationBase 
but still has float values for begin/end.  I would be really interested 
in learning more about what was done in GALE, but it's hard to find any 
relevant information...


Thanks,
Jens

On 04/12/13 20:16, Marshall Schor wrote:

Echoing Richard,

1) It would perhaps make more sense to be more direct about each of the
different types of data.  UIMA has built in only the most popular things, and
Annotation was one of them.

Annotation derives from AnnotationBase, which just defines an associated Sofa /
view.

So it would make more sense to define different kinds of highest-level
abstractions for your project, related to the different kinds of views/sofas.
Audio might entail a begin / end style of offsets;  images might entail a pair
of x-y coordinates, to describe a (square) subset of an image.  Video might do
something like audio, or something more complex...

UIMA's use of AnnotationBase includes ensuring that when you add-to-indexes
(an operation that implicitly takes a view and adds an FS to that view),
if the FS is a subtype of AnnotationBase, then the FS must be indexed in the
view to which that FS belongs; if you try to add-to-index in a view
other than the one the FS was created in, you get this kind of error:

Error - the Annotation {0} is over view {1} and cannot be added to indexes
associated with the different view {2}.

The logic behind this restriction is:  an Annotation (or, more generally, an
object having a supertype of AnnotationBase) is (by definition) associated with
a particular Sofa/View,  and it is more likely than not an error if that
annotation is indexed with a sofa it doesn't belong with.

Of course, Feature Structures which are not Annotations (or more generally, not
derived from AnnotationBase) can be indexed in multiple views.

2) By keeping separate notions for pointers-into-the-Sofa, you can define
algorithmic mappings for these that make the best sense for your project,
including notions of fuzziness, time-shift (imagine the audio is out-of-sync
with the video, like lots of YouTube things seem to be), etc.

-Marshall


On 12/4/2013 9:31 AM, Jens Grivolla wrote:

Hi, we're now starting the EUMSSI project, which deals with integrating
annotation layers coming from audio, video and text analysis.

We're thinking of basing it all on UIMA, having different views with separate
audio, video, transcribed text, etc. sofas.  In order to align the different
views we need to have a common offset specification that allows us to map e.g.
character offsets to the corresponding timestamps.

In order to avoid float timestamps (which would mean we can't derive from
Annotation) I was thinking of using audio/video frames with e.g. 100 or 1000
frames/second.  Annotation has begin and end defined as signed 32-bit ints,
leaving sufficient room for very long documents even at 1000 fps, so I don't
think we're going to run into any limits there.  Is there anything that could
become problematic when working with offsets that are probably quite a bit
larger than what is typically found with character offsets?
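
To put rough numbers on the headroom mentioned above, here is a minimal sketch. It assumes offsets are frame indices stored in a signed 32-bit int; FrameOffsets, toFrame and maxHours are hypothetical names used for illustration only and are not part of the UIMA API.

```java
/** Sketch: timestamps as integer frame offsets (hypothetical helper, not UIMA API). */
final class FrameOffsets {
    private FrameOffsets() {}

    /** Converts seconds to a frame index; throws ArithmeticException if it does not fit in an int. */
    static int toFrame(double seconds, int framesPerSecond) {
        long frame = Math.round(seconds * framesPerSecond);
        return Math.toIntExact(frame);
    }

    /** Longest duration, in whole hours, addressable with a signed 32-bit frame offset. */
    static long maxHours(int framesPerSecond) {
        return Integer.MAX_VALUE / framesPerSecond / 3600L;
    }

    public static void main(String[] args) {
        System.out.println(toFrame(3600.0, 1000)); // one hour at 1000 fps
        System.out.println(maxHours(1000));
        System.out.println(maxHours(100));
    }
}
```

At 1000 frames/second this leaves room for roughly 596 hours of material, and at 100 frames/second for roughly 5965 hours. Math.toIntExact makes an overflow fail loudly instead of wrapping around silently.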

Also, can I have several indexes on the same annotations in order to work with
character offsets for text analysis, but then efficiently query for
overlapping annotations from other views based on frame offsets?

Btw, if you're interested in the project we have a writeup (condensed from the
project proposal) here:
https://dl.dropboxusercontent.com/u/4169273/UIMA_EUMSSI.pdf and there will
hopefully soon be some content on http://eumssi.eu/

Thanks,
Jens










Re: big offsets efficiency, and multiple offsets

2013-12-05 Thread Jens Grivolla
I forgot to say that the text analysis view(s) will necessarily have to 
use character offsets so that we can obtain the coveredText, which means 
that all resulting annotations will also use character offsets.  The 
merged view will need to use time-based offsets which means that we have 
to recreate the annotations there with mapped offsets rather than just 
index the same annotations in a different view.


I think that basically means that we won't do much cross-view querying 
but rather have one component (AE) that reads from all views and creates 
a new one with new independent annotations after mapping the offsets.
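
The offset mapping described above could look roughly like this. It is only a sketch, assuming the transcription step provides anchor points (the character offset where a word starts and the frame at which it is spoken); OffsetAligner and its methods are invented names, not UIMA API.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: maps character offsets in a transcript to frame offsets via anchor points. */
final class OffsetAligner {
    // character offset of a word start -> frame at which that word starts
    private final TreeMap<Integer, Integer> charToFrame = new TreeMap<>();

    void addAnchor(int charOffset, int frame) {
        charToFrame.put(charOffset, frame);
    }

    /** Returns the frame of the closest anchor at or before the given character offset. */
    int frameFor(int charOffset) {
        Map.Entry<Integer, Integer> anchor = charToFrame.floorEntry(charOffset);
        if (anchor == null) {
            throw new IllegalArgumentException("no anchor at or before offset " + charOffset);
        }
        return anchor.getValue();
    }

    public static void main(String[] args) {
        OffsetAligner aligner = new OffsetAligner();
        aligner.addAnchor(0, 0);
        aligner.addAnchor(6, 1200);
        aligner.addAnchor(12, 2500);
        System.out.println(aligner.frameFor(8)); // falls inside the word starting at offset 6
    }
}
```

The merging AE would then create a new annotation in the merged view for each text annotation, with begin and end looked up through such a table.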


-- Jens



Re: Running CasMultiplier inside a JCasIterable

2013-12-05 Thread Richard Eckart de Castilho
No, the issue is still open. 

When I start working on one of the issues that are still recorded on Google 
Code, I open a corresponding issue on the Apache Jira and add a link to each of 
them, pointing to each other. I also set the ASFJira flag on the Google Code 
tracker to true.

-- Richard

On 05.12.2013, at 02:07, Swirl lriwsw...@gmail.com wrote:

 
 Option 2 - let UIMA do the heavy lifting

 An alternative and much simpler approach might be to create an aggregate which
 contains not only the engines but also the reader. Then you don't have to
 worry about the reader at all anymore. Just create a UIMA JCasIterator and
 poll CASes from that until it is empty. Some additional info may be found in
 the legacy issue 89 [1].


 Hi Richard,
 Is the code in issue 89 implemented in uimaFIT 2.0.0?
 It does not work in uimaFIT 1.4.0, which I currently have.


how to dynamically set a required annotation type from within a UIMAfit annotator?

2013-12-05 Thread Renaud Richardet
I find it very convenient to add

@TypeCapability(inputs = { TOKEN, SENTENCE, COOCCURRENCE })
so that I can ensure that dependencies are met. But sometimes, the
dependencies are dynamic (e.g. an input type capability is part of the
config of an annotator, and is loaded dynamically, see code below).

Is there a way to dynamically set a required annotation type from within a
UIMAfit annotator? Something like:

@Override
public void initialize(UimaContext context)
        throws ResourceInitializationException {
    super.initialize(context);
    try {
        // loading annotation class dynamically
        requiredAnnotation = (Class<? extends Annotation>) Class.forName(
                "org.uima.MyRequiredAnnotation");
        // adding it as TypeCapability's input
        context.getMetadata().addCapabilityInput(requiredAnnotation);
    } catch (Exception e) {
        throw new ResourceInitializationException(e);
    }
}
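
Independent of the capability question, the dynamic loading step itself can be done without an unchecked cast. Below is a minimal sketch in plain Java, using java.util types as stand-ins for the annotation classes; DynamicTypeLoader is a hypothetical helper name.

```java
import java.util.List;

/** Sketch: load a class by name and verify its supertype instead of casting blindly. */
final class DynamicTypeLoader {
    private DynamicTypeLoader() {}

    /**
     * Loads the named class and checks that it is a subtype of expectedSuperType.
     * Throws ClassCastException if it is not, IllegalArgumentException if it cannot be found.
     */
    static <T> Class<? extends T> load(String className, Class<T> expectedSuperType) {
        try {
            return Class.forName(className).asSubclass(expectedSuperType);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("unknown class: " + className, e);
        }
    }

    public static void main(String[] args) {
        // java.util.ArrayList stands in for a configured annotation class
        System.out.println(load("java.util.ArrayList", List.class).getName());
    }
}
```

With asSubclass, a misconfigured class name fails at load time with a clear exception rather than later, when the class is first used as an annotation type.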


Thanks, Renaud


Re: how to dynamically set a required annotation type from within a UIMAfit annotator?

2013-12-05 Thread Richard Eckart de Castilho
To my knowledge, the capabilities are part of the descriptor, which must be
available before the AE is initialized. You cannot retroactively change the
descriptor of a component from within its initialize() method.

It would be nice to have something like this, though. But that would also mean
switching any flow controllers which use this information from a static planning
to a dynamic planning approach.

How about filing a feature request against the UIMA framework?

-- Richard



Re: Problem writing ruta extensions

2013-12-05 Thread Sebastian

Hi Peter,
Before giving the code, let me explain why I'm interested in a
case-insensitive INLIST.

As far as I understand the behaviour of MARKFAST, it cannot be used with
conditions more complex than list containment. The problem with TRIE is that
it requires an external resource that is read from the file system,
whereas I'm interested in embedding resources in jars and reading
them using the classloader's getResource capabilities (maybe I missed something
there too).
But you are right, I don't need a dynamic dictionary :)

Anyway, here's how I declared it:

public class CIInListCondition extends TerminalRutaCondition {

    private StringListExpression stringList;

    public CIInListCondition(StringListExpression list) {
        super();
        this.stringList = list;
    }

    @Override
    public EvaluatedCondition eval(AnnotationFS annotation,
            RuleElement element, RutaStream stream, InferenceCrowd crowd) {
        String coveredText = annotation.getCoveredText();
        if (StringUtils.isEmpty(coveredText))
            return new EvaluatedCondition(this, false);

        List<String> sList = stringList.getList(element.getParent(), stream);
        return new EvaluatedCondition(this,
                sList.contains(coveredText.toLowerCase()));
    }

    public StringListExpression getStringList() {
        return stringList;
    }
}


And the associated extension:

public class CIInListConditionExtension implements IRutaConditionExtension {

    private final String[] knownExtensions = new String[] { "INSENSITIVEINLIST" };

    private final Class<?>[] extensions = new Class[] { CIInListCondition.class };

    ...

    @Override
    public AbstractRutaCondition createCondition(String name,
            List<RutaExpression> args) throws RutaParseException {
        if (args != null && args.size() == 1) {
            // prints org.apache.uima.ruta.expression.type.SimpleTypeExpression
            System.out.println(args.get(0).getClass().getName());
            // prints MonthsList
            System.out.println(((SimpleTypeExpression) args.get(0)).getTypeString());
            if (!(args.get(0) instanceof StringListExpression)) {

            }
        } else {
            throw new RutaParseException(
                    "INSENSITIVEINLIST accepts exactly a StringListExpression as arguments");
        }
        // It fails here
        return new CIInListCondition((StringListExpression) args.get(0));
    }
}


And here's the stack trace:

java.lang.ClassCastException: org.apache.uima.ruta.expression.type.SimpleTypeExpression cannot be cast to org.apache.uima.ruta.expression.list.StringListExpression
    at dictanova.genesis.textpreprocessing.ruta.CIInListConditionExtension.createCondition(CIInListConditionExtension.java:68)
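
For what it's worth, the case-insensitive containment check itself can be tried outside of Ruta. Below is a minimal sketch in plain Java, assuming both the list entries and the covered text should be normalized (the condition above lowercases only the covered text, so mixed-case list entries would never match). CaseInsensitiveList is a hypothetical helper, not a Ruta class.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Sketch: case-insensitive membership test over a fixed list of strings. */
final class CaseInsensitiveList {
    private final Set<String> normalized = new HashSet<>();

    CaseInsensitiveList(Collection<String> entries) {
        for (String entry : entries) {
            // Locale.ROOT keeps lowercasing independent of the default locale
            normalized.add(entry.toLowerCase(Locale.ROOT));
        }
    }

    boolean contains(String text) {
        return text != null && normalized.contains(text.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        CaseInsensitiveList months = new CaseInsensitiveList(Arrays.asList("January", "march"));
        System.out.println(months.contains("JANUARY"));
        System.out.println(months.contains("april"));
    }
}
```

Normalizing the entries once, up front, also avoids lowercasing the whole list on every evaluation of the condition.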

Regards,

Sebastian



Re: big offsets efficiency, and multiple offsets

2013-12-05 Thread Eddie Epstein
On 05/12/13 10:04, Jens Grivolla wrote:

 I agree that it might make more sense to model our needs more directly
 instead of trying to squeeze it into the schema we normally use for text
 processing.  But at the same time I would of course like to avoid having
 to reimplement many of the things that are already available when using
 AnnotationBase.

 For the cross-view indexing issue I was thinking of creating individual
 views for each modality and then a merged view that just contains a
 subset of annotations of each view, and on which we would do the
 cross-modal reasoning.

 I just looked again at the GaleMultiModalExample (not much there,
 unfortunately) and saw that e.g. AudioSpan derives from AnnotationBase
 but still has float values for begin/end.  I would be really interested
 in learning more about what was done in GALE, but it's hard to find any
 relevant information...


The readme at
http://svn.apache.org/repos/asf/uima/sandbox/trunk/GaleMultiModalExample/README.txt
points to two papers with more details on the GALE multi-modal application.

A portion of the view model was like this:

   Audio view - sofaref to the audio data, which was passed in parallel to
   multiple ASR annotators. Each ASR annotator put its transcription in
   the view, where annotations contained ASR engine IDs.

   Transcription views - a text sofa with the transcription for an ASR output.
   Annotations for each word referenced the lexeme annotations in the
   audio view. Multiple MT annotators would receive each transcription
   view and add their translations in the view.

   Translation views - a text sofa with one of the translations, based on
   a combination of ASR engine and MT engine. Annotations in a translation
   view referenced the annotations in a transcription view.

There were more views. The points here are that 1) views were designed to
hold a particular SOFA to be processed by analytics appropriate for that
modality, 2) each derived view had cross references to the annotations
in the views it was derived from, and 3) at the end the GUI presenting the
final translation could, for any word(s), show the particular piece of
transcription it came from, and/or play the associated audio segment.

Eddie


Re: Problem writing ruta extensions

2013-12-05 Thread Peter Klügl
On 05.12.2013 14:43, Sebastian wrote:
 Hi Peter,
 Before giving the code, let me explain why I'm interested in a
 case-insensitive INLIST.

 As far as I understand the behaviour of MARKFAST, it cannot be used with
 conditions more complex than list containment. The problem with TRIE is that
 it requires an external resource that is read from the file system,
 whereas I'm interested in embedding resources in jars and reading
 them using the classloader's getResource capabilities (maybe I missed something
 there too).
 But you are right, I don't need a dynamic dictionary :)

I normally just apply the dictionaries, which create some annotations of
the given type(s). Those types are then included in more complex
conditions in other rules. Most rule-based systems that I know have
outsourced dictionary matches since they are inefficient in predicates.

Both actions should also work with dictionaries in the classpath. If
not, I will fix it ASAP :-)


 Anyway, here's how I declared it :

Thanks. I will try to take a look at it. If not today, then tomorrow.

Best,

Peter



Re: Problem writing ruta extensions

2013-12-05 Thread Alexandre Patry

On 2013-12-04 12:33, Sebastian wrote:

Hi,

I'm highly interested in Ruta and its potential in industrial
applications. Right now I'm trying to create a simple toy condition
extension that is simply a case-insensitive INLIST condition. It is
completely based on the InListCondition class; I also declared an
implementation of the IRutaConditionExtension interface.

With primitive types everything seems to work great, except when the
condition is used with a variable:

STRINGLIST MonthsList = {"january", ...};
DECLARE Month;
ANY{INSENSITIVEINLIST(MonthsList) -> MARK(Month)};

I get a ClassCastException when the condition is being created, because
MonthsList is a SimpleTypeExpression and I'm expecting a StringListExpression.

Am I doing something wrong? I suppose there is a way to resolve the
variable to the actual list, but I missed it somehow.

It may not help you to get your toy extension working, but for small
lists I like to use regular expressions, where case insensitivity is free:


W{REGEXP("(?i)january|february|march|...|december") -> MARK(Month)};
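
For reference, this is roughly what the case-insensitive match amounts to on the Java side. It is a sketch in plain java.util.regex, assuming the condition has to match the whole covered text. The inline (?i) flag corresponds to Pattern.CASE_INSENSITIVE; MonthMatcher is a made-up name, and only three months are listed to keep it short.

```java
import java.util.regex.Pattern;

/** Sketch: case-insensitive whole-token match against a small alternation. */
final class MonthMatcher {
    // CASE_INSENSITIVE is the flag that the inline (?i) switches on
    private static final Pattern MONTH =
            Pattern.compile("january|february|march", Pattern.CASE_INSENSITIVE);

    private MonthMatcher() {}

    /** True if the whole token is one of the listed months, ignoring case. */
    static boolean isMonth(String token) {
        return MONTH.matcher(token).matches();
    }

    public static void main(String[] args) {
        System.out.println(isMonth("January"));
        System.out.println(isMonth("april"));
    }
}
```

Because matches() requires the entire token to match, a token such as "marchx" is rejected even though it starts with a listed month.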

Regards,

Alexandre

--
Alexandre Patry, Ph.D
Chercheur / Researcher
http://KeaText.com



Re: how to dynamically set a required annotation type from within a UIMAfit annotator?

2013-12-05 Thread Thomas Ginter
Renaud,

We (the clinical NLP group at the University of Utah) have written a platform that
sits on top of UIMA-AS and allows you to dynamically assign and even
generate types for annotation engines.  We have a whole family of annotators
whose parameters are dynamic using this platform.  We are almost ready to
release this as open source, though it is probably still another month or two
out.  Until then we are open to collaboration opportunities in which we
give you access to the software and teach you how it is used.

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu



