Hi Greg,

I agree with you that human intervention may often be needed in NLP-related applications. In an editorial system, for example, you may want a review and validation of categories that were assigned automatically. However, I'm not sure this should be done within a UIMA pipeline. The UIMA framework is middleware, not the whole application. It seems difficult to manage a situation where, out of 10 documents entering a pipeline, 9 go through at a normal pace and one gets stuck somewhere for 1 or 2 days waiting for manual intervention. The framework is distributed and relies on timeouts to detect errors, so you would have to do something special to keep that document from failing with an error, and hope that the user does not forget to do the job.

If I go back to the editorial system example, it may require getting the document into the system quickly even if some annotators failed on it. The application can then make decisions depending on the missing parts (ask an editor to complete it, hide the document, ...). One way to handle errors is simply to store them within the CAS. Subsequent annotators can then make decisions based on previous errors. In your example, no entity extraction would be performed because no category is available, and the annotator would log an error such as "unable to extract entities...", which is different from finding no entities. The application receiving the CAS at the end of the workflow would present it to an editor, who would select categories and then resubmit the document to the annotation workflow to get it completed.
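To make that concrete, here is a rough sketch of what I have in mind. The Category and ProcessingError types (and their component/message features) are hypothetical; they would have to be declared in your type system descriptor and generated with JCasGen, and Category is assumed to be an annotation type:

    // Sketch only: record a processing error in the CAS instead of failing.
    import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
    import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
    import org.apache.uima.jcas.JCas;

    public class EntityAnnotator extends JCasAnnotator_ImplBase {

      public void process(JCas jcas) throws AnalysisEngineProcessException {
        // Check whether the upstream classifier produced a category.
        if (jcas.getAnnotationIndex(Category.type).size() == 0) {
          // Do not fail the pipeline: store the problem in the CAS so the
          // application (or an editor) can act on it later.
          ProcessingError err = new ProcessingError(jcas);
          err.setComponent("EntityAnnotator");
          err.setMessage("unable to extract entities: no category available");
          err.addToIndexes();
          return;
        }
        // ... normal entity extraction, selecting the name catalog based on
        // the Category annotation found above ...
      }
    }

A downstream annotator or CAS consumer can then simply look for ProcessingError instances in the CAS and route the document accordingly.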
I think that the whole purpose of the UIMA framework is to glue together various annotation engines and to manage properly the distribution of the work across machines. One workflow can be seen as a meta-annotator that has business meaning to your company or research center, and it can ideally be reused by different applications. So I would try to avoid, as much as possible, having application-specific actions directly encoded within it.

Pascal

Pascal Coupet
Chief Technology Officer & Co-founder
TEMIS INC
1518 Walnut Street, suite 1702, Philadelphia, PA 19102, USA
Tel: +1 215 732 2549 ext 112  Mob: +1 215 609 2514  Fax: +1 215 732 0490
www.temis.com

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Thursday, October 11, 2007 1:18 PM
To: [email protected]; [email protected]
Cc: Pascal Coupet
Subject: RE: Asynchronous UIMA (workflow) ?

Pascal--

I was thinking essentially the same thing: serialize the CAS to a file or database, do your human interaction (possibly including the CAS Editor), then reload it and resume processing.

It would be nice to generalize it, rather than have two explicit analysis engines. So a nice enhancement to UIMA would be the ability to persist not just the CAS but the state of the engine along with it, so that it could be stopped and restarted at any point.

For my purposes, this would be useful if, say, one annotator depended on finding certain data in the CAS from another annotator, but that earlier one failed or didn't produce the right data, and I need a user to produce the data manually. For example, suppose a taxonomy classifier runs first and a named entity extractor runs second, and the entity extractor wants to select a name catalog based on the classification ("if classified biology, use the biology NC; else if classified chemistry, use the chemistry NC"), but the classifier doesn't classify at all, or doesn't classify into the right category (not biology or chemistry). Then I would want the user to classify it manually. So I would persist that document and engine state and notify the user, who would classify it; then I would restart the engine, which would move on to run the entity extractor with an NC based on the user's classification.

Not knowing in advance where in the engine the failure will occur (failure to classify being only one possibility), I can't create two explicit engines. Having a general mechanism to persist the state of the engine would let me handle any failure or missing dependency. NLP being generally an imprecise process, I foresee human intervention in the pipeline as a not-infrequent occurrence. So having a mechanism to deal with that in a general way would be helpful.

This is not a high-priority enhancement for me at the moment, just an idea for us to kick around.

Greg Holmberg
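For the CAS half of this, the persist and reload steps could be as simple as the sketch below; persisting the engine state would still need the enhancement described above. XMI serialization is assumed here, and the same type system is assumed on both sides (XCAS would work the same way):

    // Sketch only: save a CAS to an XMI file for manual review, then load it
    // back into a fresh CAS later.
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import org.apache.uima.cas.CAS;
    import org.apache.uima.cas.impl.XmiCasDeserializer;
    import org.apache.uima.cas.impl.XmiCasSerializer;

    public class CasSnapshot {

      // Called by a CAS consumer at the end of the first workflow.
      public static void save(CAS cas, String path) throws Exception {
        FileOutputStream out = new FileOutputStream(path);
        try {
          XmiCasSerializer.serialize(cas, out);
        } finally {
          out.close();
        }
      }

      // Called by the source of the second workflow, after the user has
      // reviewed and edited the stored document.
      public static void load(String path, CAS cas) throws Exception {
        FileInputStream in = new FileInputStream(path);
        try {
          XmiCasDeserializer.deserialize(in, cas);
        } finally {
          in.close();
        }
      }
    }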
-------------- Original message ----------------------
From: "Pascal Coupet" <[EMAIL PROTECTED]>
>
> Hi Thomas,
>
> I think a way to do it is to split this process across 2 workflows. The
> consumer of the first workflow will get the CAS and store it as XML somewhere
> (file, database, ...). A small application will manage the interaction with
> the user (sending mail, reminders, ...), watch a return-address mailbox,
> update the XCAS and make it available. The source of the second workflow will
> watch for available updated XCAS files and continue from there. You can in
> theory make the consumer of the first workflow send the mail and the source
> of the second watch for incoming emails, but I think it will be more difficult
> to manage the interaction with users properly (reminders to respond,
> statistics, routing configuration, ...).
>
> Just some thoughts,
>
> Pascal
>
> ________________________________
>
> From: Thomas Francart [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, October 11, 2007 7:01 AM
> To: [email protected]
> Subject: Asynchronous UIMA (workflow) ?
>
> Hi all -
>
> I'm wondering whether or not it would be possible to add an asynchronous step
> in a UIMA pipeline, for example an analysis engine that would ask for user
> input or a user review of a CAS, or something like that. My point is that at
> some point in the pipeline I would like a user to review the state of the CAS,
> maybe add some more information, delete some others, and so on; the rest of
> the pipeline would then continue upon user validation. (By "user" here I don't
> mean someone who sits in front of a computer and watches the UIMA processing
> take place, but rather someone receiving an email saying "hey, you should have
> a look and validate this".)
>
> I know this is a generic workflow question, but I was just wondering whether
> other people have had the same question/requirements with a UIMA integration,
> and whether you have ideas on how it could be addressed/solved.
>
> Best,
> Thomas
>
> --
> Thomas Francart
> Mondeca
> 3, cité Nollez 75018 Paris France
> Tel: +33 (0)1 44 92 35 04 - Fax: +33 (0)1 44 92 02 59
> Blog: mondeca.wordpress.com
> Web: www.mondeca.com
> Mail: [EMAIL PROTECTED]
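P.S. To make the two-workflow suggestion quoted above a bit more concrete, here is a rough sketch of what the source of the second workflow could look like. The "reviewed-xcas" folder name is made up, XMI files are assumed, and a real reader would poll or watch the folder instead of reading it once:

    // Sketch only: a collection reader that picks up reviewed XMI files
    // dropped into a folder by the small user-interaction application.
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import org.apache.uima.cas.CAS;
    import org.apache.uima.cas.impl.XmiCasDeserializer;
    import org.apache.uima.collection.CollectionException;
    import org.apache.uima.collection.CollectionReader_ImplBase;
    import org.apache.uima.util.Progress;
    import org.apache.uima.util.ProgressImpl;

    public class ReviewedCasReader extends CollectionReader_ImplBase {

      private File[] files;
      private int next = 0;

      public boolean hasNext() {
        if (files == null) {
          // Take whatever is in the folder; a real reader would keep polling.
          File[] found = new File("reviewed-xcas").listFiles();
          files = (found == null) ? new File[0] : found;
        }
        return next < files.length;
      }

      public void getNext(CAS cas) throws IOException, CollectionException {
        FileInputStream in = new FileInputStream(files[next++]);
        try {
          // Rebuild the CAS exactly as it was stored after the user's review.
          XmiCasDeserializer.deserialize(in, cas);
        } catch (org.xml.sax.SAXException e) {
          throw new CollectionException(e);
        } finally {
          in.close();
        }
      }

      public Progress[] getProgress() {
        int total = (files == null) ? 0 : files.length;
        return new Progress[] { new ProgressImpl(next, total, Progress.ENTITIES) };
      }

      public void close() {
      }
    }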
