Hi Eric,

The picture is beginning to sink in :)

On 9/25/07, Eric Vachon <[EMAIL PROTECTED]> wrote:
>
> Hi Eddie,
>
> The problem here is that one file does not mean one document.
>
> Imagine that each pointer points to a zip containing 2000 documents.
> How do you populate not one but 2000 CASes from each pointer? If you are
> constrained by a CAS Pool, then it's ok; otherwise you will create too
> many new CASes and you will exceed the allowed memory.
>
> We need a component that has one CAS in input and that can output any
> number of CAS (the CASMultiplier) but with a limitation so that at one
> time the number of CAS is not above a certain threshold.


This is exactly what the CAS Multiplier is designed to do: one CAS in, any
number of new CASes returned before processing of the input CAS is done. See
the CM example
$UIMA_HOME/examples/src/org/apache/uima/examples/casMultiplier/SimpleTextSegmenter.java.
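
To make the one-in, many-out idea concrete, here is a minimal plain-Java
sketch. It is not the UIMA API and not the code of the example above: the
class name SegmenterSketch, the fixed-size splitting rule, and the use of
Strings in place of CASes are all assumptions made for illustration. The
point it shows is that outputs are produced one at a time, on demand, rather
than all at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical model of "one input in, any number of outputs out".
// Strings stand in for CASes; this is not UIMA code.
class SegmenterSketch {
    private final int segmentSize;
    private String text = "";
    private int pos = 0;

    SegmenterSketch(int segmentSize) {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        this.segmentSize = segmentSize;
    }

    // Accept one input document; no output is created yet.
    void process(String document) {
        text = document;
        pos = 0;
    }

    // True while the current input still has unproduced segments.
    boolean hasNext() {
        return pos < text.length();
    }

    // Produce exactly one output; the caller decides when to ask for more.
    String next() {
        if (!hasNext()) {
            throw new NoSuchElementException("input fully consumed");
        }
        int end = Math.min(pos + segmentSize, text.length());
        String segment = text.substring(pos, end);
        pos = end;
        return segment;
    }

    public static void main(String[] args) {
        SegmenterSketch s = new SegmenterSketch(4);
        s.process("abcdefghij");
        List<String> out = new ArrayList<>();
        while (s.hasNext()) {
            out.add(s.next());
        }
        System.out.println(out); // prints [abcd, efgh, ij]
    }
}
```

Because each output is only created when next() is called, whoever drives the
loop controls how many outputs exist at once.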


The CPE CAS pool does not determine how many CASes can be created by a
collection reader; rather, it sets the upper limit on how many CASes can be
active at the same time. The situation is similar for CAS multipliers, but
without the collection processing manager, what enables multiple CASes to be
processed at the same time? That is the idea behind the UIMA JMS layer
discussed in uima-dev.
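
The difference between "how many are created" and "how many are active at
once" can be sketched in plain Java. This is only a model of the idea, not
the CPE implementation: the class BoundedPoolSketch, its method names, and
the pool size of 3 are assumptions for illustration. A pool of 3 serves 2000
documents while never having more than 3 checked out.

```java
import java.util.concurrent.Semaphore;

// Hypothetical bounded pool: caps concurrent checkouts, not the total count.
// A stand-in for the idea of a CAS pool; this is not UIMA code.
class BoundedPoolSketch {
    private final Semaphore permits;
    private int totalHandedOut = 0; // items ever checked out
    private int active = 0;         // items checked out right now
    private int peakActive = 0;     // highest value 'active' ever reached

    BoundedPoolSketch(int size) {
        permits = new Semaphore(size);
    }

    // Blocking checkout: waits while the pool is exhausted.
    void checkOut() {
        permits.acquireUninterruptibly();
        record();
    }

    // Non-blocking checkout: false means the pool is exhausted right now.
    boolean tryCheckOut() {
        if (!permits.tryAcquire()) {
            return false;
        }
        record();
        return true;
    }

    // Return one item so that a waiting or later checkout can proceed.
    void checkIn() {
        synchronized (this) {
            if (active == 0) {
                throw new IllegalStateException("nothing is checked out");
            }
            active--;
        }
        permits.release();
    }

    private synchronized void record() {
        totalHandedOut++;
        active++;
        peakActive = Math.max(peakActive, active);
    }

    synchronized int totalHandedOut() { return totalHandedOut; }
    synchronized int peakActive() { return peakActive; }

    public static void main(String[] args) {
        BoundedPoolSketch pool = new BoundedPoolSketch(3);
        for (int doc = 0; doc < 2000; doc++) {
            if (!pool.tryCheckOut()) {
                pool.checkIn();  // finish one document to free a slot
                pool.checkOut(); // now guaranteed to succeed in this thread
            }
        }
        // prints total=2000 peak=3
        System.out.println("total=" + pool.totalHandedOut()
                + " peak=" + pool.peakActive());
    }
}
```

In this single-threaded demo the caller frees a slot itself; with several
pipelines, the blocking checkOut() is where a thread would wait for another
one to call checkIn().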

> My other concern about the CAS Pool is that if I have multiple CAS
> Multipliers in a CPE that is launched over 5 pipelines it will become
> difficult to have a reasonable CAS Pool size that prevents locks caused
> by a lack of CASes and that does not use too much memory.


Unfortunately the collection processing manager does not recognize the CM
component. A CM can be used within an aggregate AE, but the new CASes
created must be processed within the aggregate and cannot be returned.

Please keep the discussion going!
Eddie
