Hi Carsten, please see https://github.com/m09/readability/blob/master/java/uima-corpus-creator/src/main/java/eu/crydee/readability/uima/corpuscreator/DictCreationPipeline.java#L200
for an example pipeline and https://github.com/m09/readability/blob/master/java/uima-corpus-creator/src/main/java/eu/crydee/readability/uima/corpuscreator/ae/RevisionsFilterAE.java for an example filter. This uses uimaFIT, so you'll have to translate it into plain UIMA terms, but it might be a starting point.

Cheers,
Hugo

On 11/21/2014 11:15 AM, Carsten Schnober wrote:
> Hi Sumit,
> Thanks for your suggestion, it seems like the proper way to go for my
> use case. However, I'm not too familiar with the UIMA internals, so
> could you point me to where or how I can set the dropCasOnException option?
> Thanks!
> Carsten
>
>
> On 07.11.2014 at 10:19, Sumit Madan wrote:
>> Hi Carsten,
>>
>> I have also found that a flow controller is not easy to build.
>> But maybe you can use a workaround. You can put a new AE in between
>> (BinaryCasReader and Segmenter). This AE would throw an exception when a
>> (J)Cas doesn't fit your rules. With the UIMA options dropCasOnException
>> and ActionOnMaxError, UIMA can drop those (J)Cases and continue with
>> the wanted ones.
>>
>> Regards
>> Sumit
>>
>> On 07/11/14 09:04, [email protected] wrote:
>>> Hi Carsten,
>>>
>>> I've never used it, but according to the documentation you can do this
>>> with a flow controller. The bad thing is, Richard told me a while ago
>>> that it is not so easy to build your own flow controller.
>>>
>>> Cheers,
>>> Armin
>>>
>>> -----Original Message-----
>>> From: Carsten Schnober [mailto:[email protected]]
>>> Sent: Thursday, 6 November 2014 14:55
>>> To: [email protected]
>>> Subject: Filter Cas from UIMA fit pipeline
>>>
>>> Hi,
>>> I wonder whether there is a recommended way to remove certain (J)Cas'
>>> (i.e. documents) from a pipeline after reading.
>>> The scenario in my case is that I use a standard reader
>>> (BinaryCasReader) which returns many documents. I only want a subset of
>>> these documents to be processed by the following pipeline (comprising a
>>> segmenter, a writer and some other engines), subject to a certain value
>>> in a custom annotation.
>>>
>>> The initial intuition would be to use/implement a reader that only
>>> selects those documents that fulfil the given condition. In my case that
>>> would mean, however, that I'd need to implement a new Reader extending
>>> the BinaryCasReader by the described functionality. From a high-level
>>> view at least, this seems much more complicated than just removing
>>> documents from the pipeline.
>>> Can I avoid that effort somehow without breaking conventions?
>>>
>>> Thanks!
>>> Carsten
>>>
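
For readers skimming the thread, here is a rough, library-free sketch of the "throw an exception to drop the document" idea that Sumit describes above. It deliberately uses no UIMA or uimaFIT classes: Doc, Engine, filter and run are hypothetical stand-ins for a (J)Cas, an analysis engine, the extra filtering AE and the pipeline runner. How dropCasOnException and ActionOnMaxError are actually configured in UIMA is not shown here, so treat this only as a model of the control flow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Library-free model of the drop-on-exception workaround; not UIMA code. */
class DropOnExceptionSketch {

    /** Hypothetical stand-in for a (J)Cas: an id plus the custom annotation value. */
    static final class Doc {
        final String id;
        final String category;

        Doc(String id, String category) {
            this.id = id;
            this.category = category;
        }
    }

    /** Hypothetical stand-in for an analysis engine. */
    interface Engine {
        void process(Doc doc) throws Exception;
    }

    /** Filter stage placed between reader and segmenter: rejects a document by throwing. */
    static Engine filter(String wantedCategory) {
        return doc -> {
            if (!wantedCategory.equals(doc.category)) {
                throw new Exception("filtered out: " + doc.id);
            }
        };
    }

    /** Runs each document through all engines; a document whose processing throws is dropped. */
    static List<String> run(List<Doc> docs, List<Engine> engines) {
        List<String> completed = new ArrayList<>();
        for (Doc doc : docs) {
            try {
                for (Engine engine : engines) {
                    engine.process(doc);
                }
                completed.add(doc.id);
            } catch (Exception e) {
                // Drop this document and carry on with the next one.
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        List<String> segmented = new ArrayList<>();
        Engine segmenter = doc -> segmented.add(doc.id); // placeholder for the real segmenter
        List<Doc> docs = Arrays.asList(
                new Doc("d1", "news"), new Doc("d2", "blog"), new Doc("d3", "news"));
        // Only documents whose category is "news" reach the segmenter.
        System.out.println(run(docs, Arrays.asList(filter("news"), segmenter))); // prints [d1, d3]
    }
}
```

The filter sits first in the engine list, so a rejected document never reaches the later stages. In a real pipeline the rejection exception would come from the AE's process method, and the framework's error handling would decide whether the document is dropped or the run aborts.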
