Marshall Schor wrote:
Can you verify that the instPear.getComponentPearDescPath() doesn't also
include the collection reader? (You can include Collection Readers
inside an Aggregate).
instPear.getComponentPearDescPath() does include the
Collection Reader descriptor XML file. Is that an issue?
In my case it would be the right thing to include the Collection
Reader in the Aggregate. The reason I didn't do it is that
I thought it was not possible.
The Analysis Engine interface does not provide a hasNext method.
How can I drive the processing pipeline without knowing when
to stop?
The 2nd thing I see is in the stack trace, where it looks like the code
that does the call to process the CAS is running descended from some
thread pooling, concurrent execution stuff. Here's the top of the stack
trace up to the point where the process call is:
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:218)
at dk.infopaq.trainserver.TrainingEngine$TrainingJob.run(TrainingEngine.java:229)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Is it possible there are multiple threads running, and if so is the
TrainingJob code set up to be thread safe?
Yes, the code is usually executed concurrently. There are multiple
PEARs, and each PEAR can have exactly one thread of its own running it.
The code makes sure that two threads never try to process the same
PEAR, because I think it would not be safe to install the same PEAR
concurrently into the same directory.
In the test I ran, I do not think the cause is a concurrency
problem, because it also fails when I have only one thread, and it
always fails the second time I run it.
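
For reference, the "one thread per PEAR at a time" rule described above can be sketched roughly as follows. This is only an illustration using plain java.util.concurrent, not the actual TrainingEngine code; the class name PearJobGuard and the use of a string PEAR id as the key are assumptions made for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical helper (not part of UIMA or of the code discussed here):
 * jobs for the same PEAR id run one at a time, while jobs for different
 * PEAR ids may run in parallel.
 */
final class PearJobGuard {
    // One lock per PEAR id, created lazily. Entries are never removed in this sketch.
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs the job while holding the lock that belongs to pearId. */
    void runExclusively(String pearId, Runnable job) {
        ReentrantLock lock = locks.computeIfAbsent(pearId, id -> new ReentrantLock());
        lock.lock();
        try {
            job.run(); // e.g. install the PEAR and process documents with it
        } finally {
            lock.unlock(); // released even if the job throws
        }
    }

    /** Returns true if some thread currently holds the lock for pearId. */
    boolean isBusy(String pearId) {
        ReentrantLock lock = locks.get(pearId);
        return lock != null && lock.isLocked();
    }
}
```

A worker thread would wrap its install-and-process step in runExclusively(pearId, ...), so a second job for the same PEAR waits instead of installing into the same directory at the same time.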
Here's another observation.
The code is separately instantiating a collection reader and an
Aggregate. The Aggregate is created (produceAnalysisEngine) with a
resource manager that has no extra class paths specified.
The collection reader is instantiated with a resource manager that has
extra class paths instantiated:
rsrcMgr = UIMAFramework.newDefaultResourceManager();
rsrcMgr.setExtensionClassPath(instPear.getComponentPearDescPath() + ":"
+ instPear.buildComponentClassPath(), false);
It seems to me that maybe the same resource manager should be used for
both? or is there a reason for doing it this way?
You are right, the same resource manager should be used for both. I changed my code.
Another observation: One thing UIMA does with the type system from all
the components is to merge the type system specifications. There are
APIs to do this manually, if you are loading different descriptors. I
see this isn't being done here though. You do send the Aggregate's type
system to the collection reader component, though - so this would work
unless some special stuff was going on with JCas cover types (but maybe
your collection reader isn't using JCas).
No, I do not use JCas. That issue no longer exists when
I include the Collection Reader in the Aggregate, right?
Last observation: the code that sends the type system to the collection
reader gets a new CAS (ae.newCAS() ), but doesn't use that CAS for the
subsequent call to process, but instead gets another newCAS().
Thanks, I also changed that. Now I have only one CAS object, which is used
all the time for processing.
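
The shape of the loop with a single reused CAS is roughly the following. To keep the sketch runnable without the UIMA jars it uses hypothetical stand-in types (DocReader, DocEngine, and a StringBuilder playing the CAS); they are not the UIMA classes. In the real code the corresponding calls would be CollectionReader.hasNext(), CollectionReader.getNext(CAS), AnalysisEngine.process(CAS) and CAS.reset().

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins, NOT the UIMA types. They only mirror the shape of
// CollectionReader.hasNext()/getNext(CAS) and AnalysisEngine.process(CAS).
interface DocReader {
    boolean hasNext();
    void getNext(StringBuilder cas);
}

interface DocEngine {
    void process(StringBuilder cas);
}

final class SingleCasLoop {
    /** Drives the engine from the reader, reusing one "CAS" for every document. */
    static int run(DocReader reader, DocEngine engine, StringBuilder cas) {
        int processed = 0;
        while (reader.hasNext()) {   // the reader, not the engine, knows when to stop
            reader.getNext(cas);     // fill the shared CAS with the next document
            engine.process(cas);     // analyze it
            cas.setLength(0);        // plays the role of cas.reset() before reuse
            processed++;
        }
        return processed;
    }

    /** Toy reader over an in-memory list, for illustration only. */
    static DocReader readerOver(List<String> docs) {
        final Iterator<String> it = docs.iterator();
        return new DocReader() {
            public boolean hasNext() { return it.hasNext(); }
            public void getNext(StringBuilder cas) { cas.append(it.next()); }
        };
    }
}
```

The point of the sketch is only that the CAS is created once, outside the loop, and cleared after each document instead of being requested anew for every call to process.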
Thanks,
Jörn