Re: UIMA AS 2.4.2 -> Listener.onBeforeMessageSend(UimaASProcessStatus status)

2014-01-23 Thread Jaroslaw Cwiklik
No concrete date, but it's a matter of weeks, not months.

Jerry


On Thu, Jan 23, 2014 at 4:41 PM, RYAN C. CORNIA wrote:

> Thanks Jerry.
>
> Yes, I was expecting the status to have a blank CAS or some other
> difference, which it currently does not.
>
> Any idea on when 2.5.0 will be out?
> Ryan
>
>
> On 1/23/14, 2:37 PM, "Jaroslaw Cwiklik"  wrote:
>
> >The UIMA-AS client code calls onBeforeMessageSend() for both Process and
> >CPC requests. It's just a confirmation that the request was delivered to a
> >queue.
> >From what I see while trying to replicate the scenario, on CPC the
> >status object (passed in to onBeforeMessageSend()) contains a reference to
> >the last CAS, which is clearly a bug. I will create a JIRA for this and fix
> >it in 2.5.0.
> >
> >Thanks for bringing this up.
> >
> >Jerry
> >
> >
> >
> >On Thu, Jan 23, 2014 at 3:58 PM, RYAN C. CORNIA
> >wrote:
> >
> >> We've been using UIMA AS 2.4.0, with a listener that counts CASes as
> >> they are sent via the Listener.onBeforeMessageSend(UimaASProcessStatus
> >> status) method.
> >>
> >> We then compare the count with the received count in
> >> collectionProcessComplete(EntityProcessStatus aStatus) to make sure the
> >> listener has received all of the CASes it sent before exiting.
> >> Otherwise, if the listener is slow, a collectionProcessComplete message
> >> can be received before the final entityProcessComplete() method is
> >> called for the last CASes.
> >>
> >> This worked in 2.4.0, but in 2.4.2, I am finding the
> >> onBeforeMessageSend(status) method is called once for every CAS (as it
> >> should be), but then one additional time on the last CAS. So, my count is
> >> off because I send 12 CASes, but the counter registers 13 when
> >> onBeforeMessageSend(status) is called twice on the last document.
> >>
> >> Any ideas why it would be called twice on the last document for a
> >> listener? It is a change in 2.4.2 that was not there in 2.4.0.
> >>
> >> Thanks,
> >> Ryan
> >>
> >>
>
>


Re: UIMA AS 2.4.2 -> Listener.onBeforeMessageSend(UimaASProcessStatus status)

2014-01-23 Thread RYAN C. CORNIA
Thanks Jerry.

Yes, I was expecting the status to have a blank CAS or some other
difference, which it currently does not.

Any idea on when 2.5.0 will be out?
Ryan


On 1/23/14, 2:37 PM, "Jaroslaw Cwiklik"  wrote:

>The UIMA-AS client code calls onBeforeMessageSend() for both Process and
>CPC requests. It's just a confirmation that the request was delivered to a
>queue.
>From what I see while trying to replicate the scenario, on CPC the
>status object (passed in to onBeforeMessageSend()) contains a reference to
>the last CAS, which is clearly a bug. I will create a JIRA for this and fix
>it in 2.5.0.
>
>Thanks for bringing this up.
>
>Jerry
>
>
>
>On Thu, Jan 23, 2014 at 3:58 PM, RYAN C. CORNIA
>wrote:
>
>> We've been using UIMA AS 2.4.0, with a listener that counts CASes as
>> they are sent via the Listener.onBeforeMessageSend(UimaASProcessStatus
>> status) method.
>>
>> We then compare the count with the received count in
>> collectionProcessComplete(EntityProcessStatus aStatus) to make sure the
>> listener has received all of the CASes it sent before exiting.
>> Otherwise, if the listener is slow, a collectionProcessComplete message
>> can be received before the final entityProcessComplete() method is
>> called for the last CASes.
>>
>> This worked in 2.4.0, but in 2.4.2, I am finding the
>> onBeforeMessageSend(status) method is called once for every CAS (as it
>> should be), but then one additional time on the last CAS. So, my count is
>> off because I send 12 CASes, but the counter registers 13 when
>> onBeforeMessageSend(status) is called twice on the last document.
>>
>> Any ideas why it would be called twice on the last document for a
>> listener? It is a change in 2.4.2 that was not there in 2.4.0.
>>
>> Thanks,
>> Ryan
>>
>>



Re: UIMA AS 2.4.2 -> Listener.onBeforeMessageSend(UimaASProcessStatus status)

2014-01-23 Thread Jaroslaw Cwiklik
The UIMA-AS client code calls onBeforeMessageSend() for both Process and
CPC requests. It's just a confirmation that the request was delivered to a
queue. From what I see while trying to replicate the scenario, on CPC the
status object (passed in to onBeforeMessageSend()) contains a reference to
the last CAS, which is clearly a bug. I will create a JIRA for this and fix
it in 2.5.0.
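
Until a fix is available, one possible stopgap is to count distinct CASes
rather than callback invocations. The sketch below is plain Java with no
UIMA dependency. SentCasTally and recordSend are hypothetical names, and the
sketch rests on two assumptions that are not confirmed in this thread: that
the listener can read a CAS reference ID from the status object, and that
the extra CPC notification reports the same ID as the last Process request.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Counts each CAS once, keyed by its reference ID (hypothetical helper). */
final class SentCasTally {
    // Thread-safe set, since client callbacks may arrive on different threads.
    private final Set<String> sentIds = ConcurrentHashMap.newKeySet();

    /**
     * Intended to be called from onBeforeMessageSend() with the ID of the CAS
     * in the status object (assumption: the ID is available there).
     * Returns true only the first time a given ID is reported.
     */
    boolean recordSend(String casReferenceId) {
        return sentIds.add(casReferenceId);
    }

    /** Number of distinct CASes reported so far. */
    int sentCount() {
        return sentIds.size();
    }

    public static void main(String[] args) {
        SentCasTally tally = new SentCasTally();
        for (int i = 1; i <= 12; i++) {
            tally.recordSend("cas-" + i);   // one notification per Process request
        }
        tally.recordSend("cas-12");         // simulated extra CPC notification
        System.out.println(tally.sentCount()); // prints 12
    }
}
```

If the CPC notification turns out to carry a different ID, this does not
help and the listener would need another way to tell the two apart.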

Thanks for bringing this up.

Jerry



On Thu, Jan 23, 2014 at 3:58 PM, RYAN C. CORNIA wrote:

> We’ve been using UIMA AS 2.4.0, with a listener that counts CASes as they
> are sent via the Listener.onBeforeMessageSend(UimaASProcessStatus status)
> method.
>
> We then compare the count with the received count in
> collectionProcessComplete(EntityProcessStatus aStatus) to make sure the
> listener has received all of the CASes it sent before exiting. Otherwise,
> if the listener is slow, a collectionProcessComplete message can be
> received before the final entityProcessComplete() method is called for the
> last CASes.
>
> This worked in 2.4.0, but in 2.4.2, I am finding the
> onBeforeMessageSend(status) method is called once for every CAS (as it
> should be), but then one additional time on the last CAS. So, my count is
> off because I send 12 CASes, but the counter registers 13 when
> onBeforeMessageSend(status) is called twice on the last document.
>
> Any ideas why it would be called twice on the last document for a
> listener? It is a change in 2.4.2 that was not there in 2.4.0.
>
> Thanks,
> Ryan
>
>


UIMA AS 2.4.2 -> Listener.onBeforeMessageSend(UimaASProcessStatus status)

2014-01-23 Thread RYAN C. CORNIA
We’ve been using UIMA AS 2.4.0, with a listener that counts CASes as they are 
sent via the Listener.onBeforeMessageSend(UimaASProcessStatus status) method.

We then compare the count with the received count in 
collectionProcessComplete(EntityProcessStatus aStatus) to make sure the 
listener has received all of the CASes it sent before exiting. Otherwise, if 
the listener is slow, a collectionProcessComplete message can be received 
before the final entityProcessComplete() method is called for the last CASes.
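
A minimal sketch of this counting scheme in plain Java, with no UIMA
dependency: CompletionGate and its method names are hypothetical. In a real
listener, onSent() would be called from onBeforeMessageSend() and
onReceived() from entityProcessComplete(); that wiring is not shown here.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Lets the client wait until every CAS it sent has come back (hypothetical helper). */
final class CompletionGate {
    private final AtomicInteger sent = new AtomicInteger();
    private final AtomicInteger received = new AtomicInteger();

    /** Record that one CAS was handed to the queue. */
    void onSent() {
        sent.incrementAndGet();
    }

    /** Record that one reply arrived, and wake up any waiting thread. */
    synchronized void onReceived() {
        received.incrementAndGet();
        notifyAll();
    }

    /**
     * Blocks until the received count has caught up with the sent count.
     * Returns false if the timeout expires or the thread is interrupted first.
     */
    synchronized boolean awaitAllReceived(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (received.get() < sent.get()) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CompletionGate gate = new CompletionGate();
        for (int i = 0; i < 3; i++) {
            gate.onSent();
        }
        // Simulate replies arriving on another thread after sending has finished.
        new Thread(() -> {
            for (int i = 0; i < 3; i++) {
                gate.onReceived();
            }
        }).start();
        // Call this from collectionProcessComplete() before shutting down.
        System.out.println(gate.awaitAllReceived(5000));
    }
}
```

This only works if the sent count is exact, which is why an extra
onBeforeMessageSend() call matters.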

This worked in 2.4.0, but in 2.4.2, I am finding the 
onBeforeMessageSend(status) method is called once for every CAS (as it should 
be), but then one additional time on the last CAS. So, my count is off because 
I send 12 CASes, but the counter registers 13 when onBeforeMessageSend(status) 
is called twice on the last document.

Any ideas why it would be called twice on the last document for a listener? It 
is a change in 2.4.2 that was not there in 2.4.0.

Thanks,
Ryan



Re: uima-fit and uima annotators (in my case Whitespace annotator)

2014-01-23 Thread Richard Eckart de Castilho
Thanks. Here are some more specific tips:

You can specify all engines in the call to runPipeline - no need for the 
AggregateBuilder
unless you need to do sofa mappings.

SimplePipeline.runPipeline(reader, preparationEngine, whitespaceEngine, 
casConsumer);

Parameter constants typically begin with "PARAM_" instead of ending in 
"_PARAM". That makes a difference if you ever plan to use the 
uimafit-maven-plugin to automatically generate descriptors from your AEs, 
because it uses prefixes to detect parameter name constants.
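
To illustrate why the prefix matters, here is a rough sketch of what
prefix-based detection could look like. This is not the plugin's actual
code, only a plain-Java illustration using reflection; ParamPrefixScan,
GoodComponent and LegacyComponent are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Illustration only: finds String constants whose names start with "PARAM_". */
final class ParamPrefixScan {

    /** Hypothetical component that follows the PARAM_ prefix convention. */
    static final class GoodComponent {
        public static final String PARAM_OUTPUT_DIR = "outputDir";
    }

    /** Hypothetical component that uses a _PARAM suffix instead. */
    static final class LegacyComponent {
        public static final String OUTPUT_DIR_PARAM = "outputDir";
    }

    /** Returns the names of static final String fields that begin with "PARAM_". */
    static List<String> parameterConstants(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            boolean isConstant = Modifier.isStatic(mods) && Modifier.isFinal(mods);
            if (isConstant
                    && field.getType() == String.class
                    && field.getName().startsWith("PARAM_")) {
                names.add(field.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(parameterConstants(GoodComponent.class));   // prints [PARAM_OUTPUT_DIR]
        System.out.println(parameterConstants(LegacyComponent.class)); // prints []
    }
}
```

A suffix-named constant such as OUTPUT_DIR_PARAM is simply invisible to a
scan like this one.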

uimaFIT should be able to automatically coerce single values into multi-valued 
parameters. So it should be possible to write this:

AnalysisEngineFactory.createEngineDescription(WhitespaceTokenizer.class,
"SofaNames", SimpleParserAE.SOFA_NAME_TEXT_ONLY);

Cheers,

-- Richard

On 23.01.2014, at 14:45, Luca Foppiano  wrote:

> On Thu, Jan 23, 2014 at 3:13 PM, Richard Eckart de Castilho
> wrote:
> 
>> Hi,
> 
> Hi Richard,
> 
> 
>> can you provide the full code for your sample pipeline? I think that would
>> make it easier to help.
>> 
> 
> Sure, it is located here: https://github.com/lfoppiano/uima-fit-sample-pipeline
> 
> 
>> With the present information, I can only give some general advice.
>> 
>> [...]
> 
>> 
>> I would recommend using the CAS/CasUtil only if you want to implement a
>> generic component that can be configured to work with different types. If
>> your component is fixed to a certain type system, then using the
>> JCas/JCasUtil is much more convenient.
>> 
> 
> Thanks a lot for your input, in fact it shed some light around type
> systems.
> 
> Regards
> -- 
> Luca Foppiano
> 
> Software Engineer
> +31615253280
> l...@foppiano.org
> www.foppiano.org



Re: uima-fit and uima annotators (in my case Whitespace annotator)

2014-01-23 Thread Luca Foppiano
On Thu, Jan 23, 2014 at 3:13 PM, Richard Eckart de Castilho
wrote:

> Hi,

Hi Richard,


> can you provide the full code for your sample pipeline? I think that would
> make it easier to help.
>

Sure, it is located here: https://github.com/lfoppiano/uima-fit-sample-pipeline


> With the present information, I can only give some general advice.
>
> [...]

>
> I would recommend using the CAS/CasUtil only if you want to implement a
> generic component that can be configured to work with different types. If
> your component is fixed to a certain type system, then using the
> JCas/JCasUtil is much more convenient.
>

Thanks a lot for your input, in fact it shed some light around type
systems.

Regards
-- 
Luca Foppiano

Software Engineer
+31615253280
l...@foppiano.org
www.foppiano.org


Re: uima-as 2.3.1 - java.io.IOException: Frame size of 147 MB larger than max allowed 100 MB

2014-01-23 Thread Thomas Ginter
1.  Your annotators can remove as well as add annotations.  Perhaps if there is 
a large number of annotations that you don’t really need you could have a clean 
up annotator that removes the extra stuff, or else just don’t generate it in 
the first place, whatever works best for your algorithm.
2.  Remote services in your pipeline are serialized the same way as the 
serialization with the client.  In fact the framework essentially creates a 
client interface for sending and receiving CAS objects and then passing them 
to/from your pipeline.  It is likely then that your expansion is happening 
after the remote service is called or else is not yet big enough to be over the 
100MB limit.

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




On Jan 23, 2014, at 12:53 AM, Mihaela M  wrote:

> 1. I will upgrade uima-as and review the annotations gathered in the CAS, but 
> is there a way to have the CAS reset before sending it to the client? In my case 
> I only want to get the status of the processing, not all the annotations 
> found, because they were handled by the consumers configured in the pipeline 
> anyway.
> 
> 2. Do you know whether the aggregates communicate with the clients the same 
> way as with the remote CAS consumers? I wonder why it did not complain while 
> sending the exploded CAS to the remote consumer, but it did when 
> communicating with the client.
> 
> Thank you!
> Mihaela
> 
> 
> 
> On Wednesday, January 22, 2014 7:07 PM, Thomas Ginter 
>  wrote:
> 
> Mihaela,
> 
> There are two things that you should probably do in order to get started with 
> these issues.
> 
> 1.  Upgrade to UIMA-AS 2.4.2 which uses a newer version of ActiveMQ and 
> contains numerous bug fixes for UIMA-AS related to how the JMS queues are 
> handled.
> 2.  The UIMA-AS framework adds very little as far as overhead space for the 
> CAS objects which means the vast majority of the size expansion from 48KB to 
> 147MB is coming from annotations/metadata being added by your service.  
> Increasing the frame size in ActiveMQ may allow your CAS objects to be 
> transferred in JMS but it is more important to find out what is causing this 
> dramatic expansion and whether or not the service can be written differently 
> so that the expansion is much smaller.
> 
> Thanks,
> 
> Thomas Ginter
> 801-448-7676
> thomas.gin...@utah.edu
> 
> 
> 
> 
> 
> On Jan 22, 2014, at 9:44 AM, Mihaela M  wrote:
> 
>> Hello,
>> 
>> I have a UIMA pipeline that uses uima-as 2.3.1 which has one aggregate with 
>> one local annotator, one remote consumer and one remote annotator. It 
>> actually has more components but I will get into the exact configuration 
>> only if needed.
>> I have also developed a UIMA client for it using the UimaAsynchronousEngine 
>> class, its sendCAS method (asynchronous, as far as I understood) and a 
>> callback listener that waits for the processing to complete.
>> 
>> 1. I have noticed that the CAS returned is, in general, quite big. Is there 
>> a way to send, at least to the client, a CAS that does not contain all the 
>> types that the various annotators added? When could I remove those things 
>> from the CAS?
>> 2. I send a text message for processing which has 48 KB - it gets processed 
>> successfully by the pipeline, but the pipeline fails to send a reply to the 
>> client. The exception that I get is:
>> 
>> 01/21/2014 07:36:02.978 [ActiveMQ Transport: tcp://localhost/127.0.0.1:61616] [DEBUG] org.apache.activemq.ActiveMQConnection - Async exception with no exception listener: java.io.IOException: Frame size of 147 MB larger than max allowed 100 MB
>> java.io.IOException: Frame size of 147 MB larger than max allowed 100 MB
>>  at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:277) ~[activemq-core-5.6.0.jar:5.6.0]
>>  at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:229) ~[activemq-core-5.6.0.jar:5.6.0]
>>  at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:221) ~[activemq-core-5.6.0.jar:5.6.0]
>>  at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:204) ~[activemq-core-5.6.0.jar:5.6.0]
>>  at java.lang.Thread.run(Thread.java:662) [na:1.6.0_30]
>> 01/21/2014 07:36:03.093 [ActiveMQ Connection Executor: tcp://localhost/127.0.0.1:61616] [DEBUG] org.apache.activemq.transport.tcp.TcpTransport - Stopping transport tcp://localhost/127.0.0.1:61616
>> 
>> As far as I understood, the client connects via JMS to the UIMA pipeline and 
>> a temporary reply queue gets created, to which the reply from the pipeline 
>> should be sent and from which the client consumes it. After the above 
>> exception is thrown, the connection to the pipeline gets closed and the temp 
>> queue is automatically deleted, so the client never receives the reply.
>> 
>> I am wondering why the error I was mentioning is not thrown

Re: uima-fit and uima annotators (in my case Whitespace annotator)

2014-01-23 Thread Richard Eckart de Castilho
Hi,

can you provide the full code for your sample pipeline? I think that would make 
it easier to help.

With the present information, I can only give some general advice.

- it is not mandatory to have the type system java classes (JCas wrappers) 
present in a project if none of your components (Readers, AEs, CCs) use them.

- it is possible to manually load a type system description (TSD) and pass it 
to the components. In that case, the TSD is the second argument to the 
createXXXDescription call, e.g.

  createEngineDescription(SimpleCC.class, tsd,
      SimpleCC.PARAM_OUTPUT_DIR, "…");

- the type systems of all components in a pipeline are automatically merged when 
a pipeline is run (e.g. using SimplePipeline.runPipeline). Thus, it would also 
work to pass a TSD with all types used in the pipeline only to the reader, and 
not to any of the subsequent components.

- alternatively, it is possible to have uimaFIT automatically detect your types 
[1]. If you do that, there is no need at all to pass the TSD to the component - 
it happens automatically.

  createEngineDescription(SimpleCC.class,
      SimpleCC.PARAM_OUTPUT_DIR, "…");

- if you want to retrieve annotations from the CAS without using the JCas 
wrappers, you can have a look at the CasUtil class. E.g.

  CasUtil.select(cas, CasUtil.getType(cas, "my.package.name.MyType"))

Mind, this call works only if "MyType" inherits from the built-in "Annotation" 
type. Otherwise, you would use "selectFS" instead of "select".

I would recommend using the CAS/CasUtil only if you want to implement a generic 
component that can be configured to work with different types. If your 
component is fixed to a certain type system, then using the JCas/JCasUtil is 
much more convenient.

-- Richard

[1] 
http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.typesystem


On 23.01.2014, at 06:21, Luca Foppiano  wrote:

> Hi Everybody,
> I'm starting to play with uima-fit and I'm trying to integrate the
> whitespace annotator into my simple pipeline, which is composed of a
> collection reader and a simple AE (it plays with the text, doesn't
> annotate). I want to add a whitespace annotator to be applied to the text.
> 
> I've downloaded the trunk version of the Whitespace annotator on GitHub,
> extracted the type system definition from the descriptor XML and referenced
> it from uimaFIT. The pipeline worked without crashing.
> 
> Now I want to add an AE that takes the annotations and does something with
> them (prints them, for example).
> 
> I could not find a way to work around the fact that the type system Java
> classes were not present in the project. Is this a mandatory requirement?
> 
> What I've tried is to do something like:
> 
> // Get the types from the autogenerated type system (SentenceAnnotation, TokenAnnotation)
> TypeDescription[] types = tsd.getTypes();
> 
> [...]
> //..and try to pass them to my annotator
>AnalysisEngineDescription casConsumer =
> AnalysisEngineFactory.createEngineDescription(SimpleCC.class,
>SimpleCC.OUTPUT_DIR_PARAM,
>"/home/lf84914/development/epo/apl/data/out",
> *types, null*);
> 
> but then, in the AE's code, I have no idea how to use them.
> 
> Any suggestions?
> 
> Thanks, everybody, in advance.
> -- 
> Luca Foppiano
> 
> Software Engineer
> +31615253280
> l...@foppiano.org
> www.foppiano.org



uima-fit and uima annotators (in my case Whitespace annotator)

2014-01-23 Thread Luca Foppiano
Hi Everybody,
I'm starting to play with uima-fit and I'm trying to integrate the
whitespace annotator into my simple pipeline, which is composed of a
collection reader and a simple AE (it plays with the text, doesn't
annotate). I want to add a whitespace annotator to be applied to the text.

I've downloaded the trunk version of the Whitespace annotator on GitHub,
extracted the type system definition from the descriptor XML and referenced
it from uimaFIT. The pipeline worked without crashing.

Now I want to add an AE that takes the annotations and does something with
them (prints them, for example).

I could not find a way to work around the fact that the type system Java
classes were not present in the project. Is this a mandatory requirement?

What I've tried is to do something like:

// Get the types from the autogenerated type system (SentenceAnnotation, TokenAnnotation)
TypeDescription[] types = tsd.getTypes();

[...]
//..and try to pass them to my annotator
AnalysisEngineDescription casConsumer =
AnalysisEngineFactory.createEngineDescription(SimpleCC.class,
SimpleCC.OUTPUT_DIR_PARAM,
"/home/lf84914/development/epo/apl/data/out",
*types, null*);

but then, in the AE's code, I have no idea how to use them.

Any suggestions?

Thanks, everybody, in advance.
-- 
Luca Foppiano

Software Engineer
+31615253280
l...@foppiano.org
www.foppiano.org