Pascal, Greg -

Building 2 separate processes was also the first idea I had in mind; this can work for one user validation in the workflow, but it does not scale well if multiple user validations are required.
Another use case for "asynchronous" analysis engines could be processes that take a very long time (for example, an inference engine that infers against a large knowledge base).

It would be nice if I could define my flow in a single aggregate AE, then declare that one special step is asynchronous (which would involve saving the CAS and the engine state before calling the asynchronous AE). Then, when writing this special AE, I could use some kind of key that the UIMA framework gave me, store that key, and later, when I want to resume processing, return an updated CAS along with this key, which would basically tell the framework which state to reload and at which point to resume.
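
To make the idea concrete, here is a rough sketch of the kind of checkpoint store I have in mind. None of this exists in UIMA today; the suspend/resume names, the key, and the on-disk layout are all invented for illustration, and a serialized CAS is stood in for by a plain string:

```java
import java.nio.file.*;
import java.util.UUID;

// Hypothetical checkpoint store illustrating the suspend/resume contract:
// the framework would persist the CAS (and engine state) under a key,
// hand the key to the asynchronous AE, and later resume from that key.
public class CheckpointStore {
    private final Path dir;

    public CheckpointStore(Path dir) throws Exception {
        this.dir = dir;
        Files.createDirectories(dir);
    }

    // Called when the pipeline reaches the asynchronous step:
    // persist the serialized CAS under a fresh key.
    public String suspend(String serializedCas) throws Exception {
        String key = UUID.randomUUID().toString();
        Files.write(dir.resolve(key + ".xcas"), serializedCas.getBytes("UTF-8"));
        return key; // e.g. embedded in the notification mail sent to the user
    }

    // Called upon user validation: reload the CAS so the framework
    // can resume the flow at the step recorded under this key.
    public String resume(String key) throws Exception {
        return new String(Files.readAllBytes(dir.resolve(key + ".xcas")), "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        CheckpointStore store = new CheckpointStore(Paths.get("checkpoints"));
        String key = store.suspend("<CAS>annotated text</CAS>");
        // ... user reviews the CAS out-of-band, then processing resumes:
        System.out.println(store.resume(key)); // prints <CAS>annotated text</CAS>
    }
}
```

A real implementation would of course also have to capture where in the aggregate's flow the pipeline stopped, which is the part the framework would need to support.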

I think having human intervention somewhere along the workflow is a requirement in many projects, considering that NLP, as Greg said, is "imprecise", not to say "error-prone".

Cheers,
Thomas

--
Thomas Francart
Mondeca
3, cité Nollez 75018 Paris France
Tel: +33 (0)1 44 92 35 04 - Fax: +33 (0)1 44 92 02 59
Blog: mondeca.wordpress.com
Web: www.mondeca.com
Mail: [EMAIL PROTECTED]



[EMAIL PROTECTED] wrote:
Pascal--

I was thinking essentially the same thing: serialize the CAS to a file or database, do your human interaction (possibly including the CAS Editor), then reload it and resume processing.

It would be nice to generalize it, rather than have two explicit analysis engines.  So a nice enhancement to UIMA would be the ability to persist not just the CAS but the state of the engine along with it, so that it could be stopped and restarted at any point.

For my purposes, this would be useful if, say, one annotator depended on finding certain data in the CAS from another annotator, but that earlier one failed or didn't produce the right data, and I needed a user to produce the data manually.

For example, if a taxonomy classifier runs first and a named entity extractor runs second, and the entity extractor wants to select a name catalog to use based on the classification ("if classified biology, use the biology NC; else if classified chemistry, use the chemistry NC"), but the classifier doesn't classify at all, or doesn't classify into the right category (not biology or chemistry), then I would want the user to classify it manually.  So I would persist the document and engine state and notify the user, who would classify it, and then restart the engine, which would then move on to run the entity extractor with an NC based on the user's classification.
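
Sketched as a decision, the dependency check might look like this. The category names, the catalog naming, and the "return null to suspend" convention are all invented here for illustration:

```java
import java.util.*;

// Illustrative only: pick a name catalog from the classification, or
// signal "suspend for manual classification" when it is missing or unknown.
public class CatalogChooser {
    static final Set<String> KNOWN =
        new HashSet<>(Arrays.asList("biology", "chemistry"));

    // Returns the catalog to use, or null to mean: persist the CAS and
    // engine state, notify the user, and wait for a manual classification.
    static String chooseCatalog(String classification) {
        if (classification == null || !KNOWN.contains(classification)) {
            return null;
        }
        return classification + "-name-catalog";
    }

    public static void main(String[] args) {
        System.out.println(chooseCatalog("biology")); // biology-name-catalog
        System.out.println(chooseCatalog("geology")); // null -> ask the user
    }
}
```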

Not knowing in advance where in the engine the failure will occur (failure to classify being only one possibility), I can't create two explicit engines.  Having a general mechanism to persist the state of the engine would let me handle any failure or missing dependency.  NLP being generally an imprecise process, I foresee human intervention in the pipeline as a not-infrequent occurrence.  So having a mechanism to deal with that in a general way would be helpful.

This is not a high priority enhancement for me at the moment, just an idea for us to kick around.


Greg Holmberg


 -------------- Original message ----------------------
From: "Pascal Coupet" <[EMAIL PROTECTED]>
  
Hi Thomas,

 

I think a way to do it is to split the process across 2 workflows. The first
workflow's consumer will get the CAS and possibly store it as XML somewhere (file,
database ...). A small application will manage the interaction with the user
(sending mail, reminders ...), watch a return-address mailbox, update the XCAS and
make it available. The source of the second workflow will watch for available
updated XCAS files and continue from there. In theory you can make the consumer of
the first workflow send the mail and the source of the second watch for incoming
emails, but I think it will be more difficult to manage the interaction with
users properly (reminders to respond, statistics, routing configuration ...).
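The hand-off between the two workflows could be as simple as two directories. This is only a sketch of the pattern, not UIMA code: the directory names are arbitrary, the XCAS is stood in for by a string, and a real collection reader would poll in a loop rather than check once:

```java
import java.nio.file.*;

// Sketch of the two-workflow hand-off: the first workflow's consumer
// drops the XCAS into a "pending" directory, the review application
// moves it to "ready" after user validation, and the second workflow's
// source picks it up from there.
public class XcasHandoff {
    static final Path PENDING = Paths.get("xcas-pending");
    static final Path READY = Paths.get("xcas-ready");

    // First workflow: CAS consumer stores the XCAS for human review.
    static void storeForReview(String docId, String xcas) throws Exception {
        Files.createDirectories(PENDING);
        Files.write(PENDING.resolve(docId + ".xcas"), xcas.getBytes("UTF-8"));
    }

    // Review application: after the user validates, move the (possibly
    // edited) XCAS to the directory the second workflow watches.
    static void markValidated(String docId) throws Exception {
        Files.createDirectories(READY);
        Files.move(PENDING.resolve(docId + ".xcas"), READY.resolve(docId + ".xcas"));
    }

    // Second workflow: its source checks for validated XCAS files.
    static String pollValidated(String docId) throws Exception {
        Path p = READY.resolve(docId + ".xcas");
        return Files.exists(p) ? new String(Files.readAllBytes(p), "UTF-8") : null;
    }

    public static void main(String[] args) throws Exception {
        storeForReview("doc-1", "<CAS>draft annotations</CAS>");
        markValidated("doc-1"); // the user clicked "validate"
        System.out.println(pollValidated("doc-1"));
    }
}
```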

 

Just some thoughts,  

 

Pascal

________________________________

From: Thomas Francart [mailto:[EMAIL PROTECTED]] 
Sent: Thursday, October 11, 2007 7:01 AM
To: [email protected]
Subject: Asynchronous UIMA (workflow) ?

 


Hi all -

I'm wondering whether it would be possible to add an asynchronous step in a UIMA
pipeline — for example, an analysis engine that would ask for user input or a
user review of a CAS, or something like that. My point is that at some point in
the pipeline, I would like a user to review the state of the CAS, maybe add some
more information, delete some other, and so on; the rest of the pipeline would
then continue upon user validation. (By "user" here I don't mean someone who
sits in front of a computer and watches the UIMA processing take place, but
rather someone receiving an email saying "hey, you should have a look and
validate that".)

I know this is a generic workflow question, but I was just wondering whether
other people have had the same question/requirements with a UIMA integration,
and whether you have some ideas on how it could be addressed/solved.

Best,
Thomas

-- 

Thomas Francart 
Mondeca 
3, cité Nollez 75018 Paris France 
Tel: +33 (0)1 44 92 35 04 - Fax: +33 (0)1 44 92 02 59 
Blog: mondeca.wordpress.com 
Web: www.mondeca.com 
Mail: [EMAIL PROTECTED] 
