date:20170420

Re: Synchronizing Batches AE and StatusCallbackListener

2017-04-20 Thread Erik Fäßler

Hi Jerry,

thanks a lot for your answer! I’m sorry I didn’t make myself clearer; I will 
try again! :-)
A lot of text follows, sorry for that. The post has two parts: the first 
explains my issue, the second responds to your pointer to UIMA-AS.

First: Yes, I use a CPE. I process text documents. Tens of millions of them.
My issue involves the following components, all running in the CPE:

1. A CAS-Consumer (internally just an AnalysisEngine, of course). This consumer 
is responsible for serialising the document CAS into XMI and sending the XMI to a 
database, i.e. it is an XMI-to-database consumer. For performance reasons, the 
XMI of multiple CASes is buffered and then sent as a batch of, let's say, 50 CAS 
XMIs at a time.
2. A CPE StatusCallbackListener which also writes to the same database, but into 
another table. It records in the database which documents have been successfully 
processed by the CPE. It also works on a batch basis.

The goal: the CallbackListener should only mark those documents as successfully 
processed (i.e. as “finished”) for which the CAS-Consumer has actually sent the 
XMI data to the database.

Reason: I don’t want documents marked as “finished” whose XMI data is not yet 
in the database but still in the consumer's buffer. If the pipeline crashes at 
that point, the buffered XMI data never gets sent to the database, and the 
processing state becomes inconsistent: documents that have not been written 
into the database are marked as successfully processed, but their data is missing.

Also, not every document's XMI is stored: a condition in the consumer decides 
whether the XMI is stored or not. Thus, I cannot “create consistency” afterwards 
by checking which XMI made it into the database.
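One possible way to get this agreement inside a single CPE JVM is a small object shared by the consumer and the listener: the consumer reports a document as "settled" only after its batch insert has gone through (or immediately, if its XMI is deliberately not stored), and the listener only marks documents as "finished" once they are both completed and settled. This is a minimal sketch under assumptions: `FlushCoordinator`, `XmiBatchBuffer` and the `batchWriter` callback are hypothetical names, not UIMA API; it assumes both components run in the same JVM and can reach the same coordinator instance (e.g. through a static field), and the real database calls are left out.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical coordinator shared by the XMI consumer and the
// StatusCallbackListener. NOT part of the UIMA API.
final class FlushCoordinator {
    // IDs whose XMI is safely in the database, or that were deliberately not stored.
    private final Set<String> settled = new LinkedHashSet<>();
    // IDs the CPE reported as completed, still waiting for their XMI to be settled.
    private final Set<String> completed = new LinkedHashSet<>();

    // Called by the consumer once a batch insert has succeeded.
    synchronized void markSettled(Collection<String> docIds) {
        settled.addAll(docIds);
    }

    // Called by the listener for every successfully processed document.
    synchronized void markCompleted(String docId) {
        completed.add(docId);
    }

    // Returns (and forgets) the IDs that are both completed and settled;
    // only these may be written to the "finished" table.
    synchronized List<String> drainFinished() {
        List<String> ready = new ArrayList<>();
        for (String id : completed) {
            if (settled.contains(id)) {
                ready.add(id);
            }
        }
        completed.removeAll(ready);
        settled.removeAll(ready);
        return ready;
    }
}

// Hypothetical stand-in for the buffering logic of the XMI-to-database consumer.
final class XmiBatchBuffer {
    private final int batchSize;
    private final FlushCoordinator coordinator;
    private final Consumer<List<String>> batchWriter; // e.g. a JDBC batch insert
    private final List<String> ids = new ArrayList<>();
    private final List<String> xmis = new ArrayList<>();

    XmiBatchBuffer(int batchSize, FlushCoordinator coordinator,
                   Consumer<List<String>> batchWriter) {
        this.batchSize = batchSize;
        this.coordinator = coordinator;
        this.batchWriter = batchWriter;
    }

    void add(String docId, String xmi, boolean store) {
        if (!store) {
            // Nothing will ever be written, so the document is settled right away.
            coordinator.markSettled(List.of(docId));
            return;
        }
        ids.add(docId);
        xmis.add(xmi);
        if (ids.size() >= batchSize) {
            flush();
        }
    }

    // Writes the buffered XMI first and settles the IDs only afterwards.
    // If the writer throws, the IDs stay unsettled (and buffered for a retry).
    void flush() {
        if (ids.isEmpty()) {
            return;
        }
        batchWriter.accept(new ArrayList<>(xmis));
        coordinator.markSettled(ids);
        ids.clear();
        xmis.clear();
    }
}
```

In a CPE, the consumer would presumably call `flush()` from `batchProcessComplete()` and `collectionProcessComplete()`, and the listener would call `markCompleted()` from `entityProcessComplete()` and persist the IDs returned by `drainFinished()` in its own batches. Documents still sitting in the buffer when the pipeline crashes are never marked finished, so they are simply reprocessed on restart. IDs of documents that fail in the CPE remain in `settled`; a real implementation would need to clean those up.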

Is this easier to understand?

Regarding UIMA-AS:

I tried it out a few years back when it was rather new, UIMA 2.3.1 or 
something. Back then, the procedure was roughly as follows:
1. Install a broker (ActiveMQ, I think it was called).
2. Configure it.
3. Start it.
4. For each AE you want to use, deploy the AE on some server in your cluster 
(multiple AEs can be bundled into an AAE).
5. Start a reader process that will then fill the broker queue.
6. Wait until processing is finished.
7. Stop all the AE services deployed to the cluster, if you want to save the 
resources.
8. Stop the broker.

This was quite a while back, so perhaps this is not exactly how it was. But it 
seemed overly complex to me. I had to log in to each server where I wanted 
work to be done, and we have something like 20 nodes. Perhaps I could write a 
script for that, but then I would have to keep track of which servers are free 
to use at any given time, because I am not the only one using the cluster.
And afterwards I have to stop all the AE “services”. Until then, they keep 
using memory, because they just sit idle when there is nothing more to do.

In contrast, in my case CPEs are self-contained projects which I can easily 
distribute through our job system (SLURM).

I thought all the setup for UIMA-AS would pay off in better performance. But in 
my - admittedly limited - tests there was not much of a performance difference. 
CPEs seemed to be a bit faster due to the lack of CAS serialization between 
reader and AEs.

Of course, this was years in the past. Is the process a bit simpler today? Or 
perhaps I got it wrong to begin with, that’s possible. But I read the 
documentation back then and couldn’t see how to do things much simpler.

BUT if CPEs can’t solve my issue and UIMA-AS can, then perhaps I will try it 
again.

Another question: You said “CPE was replaced by UIMA-AS”. Does that mean that 
CPEs will eventually be removed from UIMA? Are they still a part of UIMA 3?

Sorry for all the text!

Best regards and thanks!

Erik


Re: Synchronizing Batches AE and StatusCallbackListener

2017-04-20 Thread Jaroslaw Cwiklik

Hi Erik, sorry for the delay in responding to your question. This seems like a
CPE question, is that right? I am not quite following what issue you
are running into. Could you explain this better? With a clearer problem
description perhaps others will jump in with an answer  :)

Just FYI, the CPE was replaced by UIMA-AS quite a long time ago.
Perhaps UIMA-AS can work better for you. You can read about it here:
https://uima.apache.org/d/uima-as-2.9.0/uima_async_scaleout.html

Jerry
UIMA Team

On Tue, Apr 18, 2017 at 5:56 AM, Erik Fäßler 
wrote:

> Hi all,
>
> I have a use case where a consumer of mine sends CAS XMI data into a
> database in batchProcessComplete(). I also use a StatusCallbackListener
> that records in the database whether a document has completed
> processing; this is also done batch-wise.
> Now the issue is: if the pipeline crashes for any reason, I must start
> over, because the “completion” flag from the CallbackListener and the data
> actually sent by the XMI consumer are not synchronised, i.e. I don’t know if
> the data has actually been sent for a document that has completed
> processing, because everything is done batch-wise and not immediately for
> performance reasons. I also cannot just look into the database to see which
> XMI data is there, because it only gets sent when a certain condition is met.
>
> I would like the consumer and the CallbackListener to communicate somehow,
> so that they send their data for the same documents in agreement. Is
> there anything I can do to achieve this?
>
> Best,
>
> Erik

Re: CAS visual debugger works in eclipse but not in the binary

2017-04-20 Thread Marshall Schor

Hi Benedict,

Although it's hard to spot, this too looks like an out-of-memory problem. Can
you try adding the -Xmx parameter to however you're launching this to give Java
more memory to work with?

-Marshall

On 4/18/2017 12:54 PM, Benedict Holland wrote:
> Hello All,
>
> I am attempting to integrate the OpenNLP application with the UIMA
> framework. I created the PEAR file successfully.
>
> When I run the UIMA Pear Installer from eclipse, it works. When I attempt
> to run the runPearInstaller.bat file, it fails.
>
> When I run the CAS Visual Debugger from cvd.bat, I get the error below.
> When I run it from eclipse, everything works.
>
> Any idea about how to proceed?
>
> Thanks,
> ~Ben
>
> Error:
>
> 12:54:05.375 - 16:
> org.apache.uima.util.SimpleResourceFactory.produceResource: CONFIG: trying
> Resource class
> org.apache.uima.analysis_engine.impl.PearAnalysisEngineWrapper
> 12:54:05.410 - 16:
> org.apache.uima.analysis_engine.impl.PearAnalysisEngineWrapper.createRM:
> CONFIG: UIMA pear runtime set classpath to
> "F:/nlp/installed_pear/opennlp.uima.OpenNlpTextAnalyzer/lib/jwnl.jar;F:/nlp/installed_pear/opennlp.uima.OpenNlpTextAnalyzer/lib/opennlp-maxent.jar;F:/nlp/installed_pear/opennlp.uima.OpenNlpTextAnalyzer/bin;F:/nlp/installed_pear/opennlp.uima.OpenNlpTextAnalyzer/lib/opennlp-tools.jar;F:/nlp/installed_pear/opennlp.uima.OpenNlpTextAnalyzer/lib/opennlp-uima.jar"
> for UIMA component opennlp.uima.OpenNlpTextAnalyzer.
> 12:54:05.410 - 16:
> org.apache.uima.analysis_engine.impl.PearAnalysisEngineWrapper.createRM:
> CONFIG: UIMA pear runtime set datapath to
> "F:/nlp/installed_pear/opennlp.uima.OpenNlpTextAnalyzer/models" for UIMA
> component opennlp.uima.OpenNlpTextAnalyzer.
> 12:54:05.411 - 16:
> org.apache.uima.analysis_engine.impl.PearAnalysisEngineWrapper.createRMmap:
> CONFIG: UIMA pear runtime: creating a Map from class
> "F:/nlp/installed_pear/opennlp.uima.OpenNlpTextAnalyzer/lib/jwnl.jar;F:/nlp/installed_pear/opennlp.uima.OpenNlpTextAnalyzer/lib/opennlp-maxent.jar;F:/nlp/installed_pear/opennlp.uima.OpenNlpTextAnalyzer/bin;F:/nlp/installed_pear/opennlp.uima.OpenNlpTextAnalyzer/lib/opennlp-tools.jar;F:/nlp/installed_pear/opennlp.uima.OpenNlpTextAnalyzer/lib/opennlp-uima.jar"
> and data path
> "F:/nlp/installed_pear/opennlp.uima.OpenNlpTextAnalyzer/models" to its
> resource manager instance.
> 12:54:05.549 - 16:
> org.apache.uima.util.SimpleResourceFactory.produceResource: CONFIG: trying
> Resource class org.apache.uima.resource.impl.DataResource_impl
> 12:54:05.578 - 16:
> org.apache.uima.util.SimpleResourceFactory.produceResource: CONFIG: trying
> Resource class org.apache.uima.resource.impl.DataResource_impl
> 12:54:05.625 - 16:
> org.apache.uima.util.SimpleResourceFactory.produceResource: CONFIG: trying
> Resource class org.apache.uima.resource.impl.DataResource_impl
> 12:54:06.339 - 16:
> org.apache.uima.util.SimpleResourceFactory.produceResource: CONFIG: trying
> Resource class org.apache.uima.resource.impl.DataResource_impl
> 12:54:07.02 - 16:
> org.apache.uima.util.SimpleResourceFactory.produceResource: CONFIG: trying
> Resource class org.apache.uima.resource.impl.DataResource_impl
> 12:54:07.391 - 16:
> org.apache.uima.util.SimpleResourceFactory.produceResource: CONFIG: trying
> Resource class org.apache.uima.resource.impl.DataResource_impl
> 12:54:07.860 - 16:
> org.apache.uima.util.SimpleResourceFactory.produceResource: CONFIG: trying
> Resource class org.apache.uima.resource.impl.DataResource_impl
> 12:54:09.99 - 16:
> org.apache.uima.util.SimpleResourceFactory.produceResource: CONFIG: trying
> Resource class org.apache.uima.resource.impl.DataResource_impl
> 12:54:09.544 - 16:
> org.apache.uima.util.SimpleResourceFactory.produceResource: CONFIG: trying
> Resource class org.apache.uima.resource.impl.DataResource_impl
> 12:54:09.875 - 16:
> org.apache.uima.util.SimpleResourceFactory.produceResource: CONFIG: trying
> Resource class org.apache.uima.resource.impl.DataResource_impl
> 12:54:10.365 - 16:
> org.apache.uima.util.SimpleResourceFactory.produceResource: CONFIG: trying
> Resource class org.apache.uima.resource.impl.DataResource_impl
> 12:54:11.992 - 16:
> org.apache.uima.util.SimpleResourceFactory.produceResource: CONFIG: trying
> Resource class org.apache.uima.resource.impl.DataResource_impl
> 12:54:37.527 - 16:
> org.apache.uima.tools.cvd.MainFrame.handleException(526): SEVERE: Error
> initializing
> "org.apache.uima.analysis_engine.impl.PearAnalysisEngineWrapper" from
> descriptor
> file:/F:/nlp/installed_pear/opennlp.uima.OpenNlpTextAnalyzer/opennlp.uima.OpenNlpTextAnalyzer_pear.xml.
> org.apache.uima.resource.ResourceInitializationException: Error
> initializing
> "org.apache.uima.analysis_engine.impl.PearAnalysisEngineWrapper" from
> descriptor
> file:/F:/nlp/installed_pear/opennlp.uima.OpenNlpTextAnalyzer/opennlp.uima.OpenNlpTextAnalyzer_pear.xml.
> at
> org.apache.uima.util.SimpleResourceFactory.produceResource(SimpleResourceFactory.java:144)
> at
> org.apache.uima.impl.C

Re: Error running PEAR Installer

2017-04-20 Thread Marshall Schor

Please try running with more memory by using the java command line parameter 
-Xmx

See, for example, the documentation for this launch parameter on this page:

https://docs.oracle.com/javase/8/docs/technotes/tools/windows/java.html
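To confirm that a larger `-Xmx` value actually reaches the JVM (for example after changing the command line used by the launch script), one can print `Runtime.getRuntime().maxMemory()`. A minimal sketch; the class name `HeapCheck` is hypothetical, and the reported value is typically close to, but often slightly below, the `-Xmx` setting.

```java
// Hypothetical helper class: prints the maximum heap size the JVM will attempt to use.
// Compile it and run it with the same JVM options as the failing tool, e.g.:
//   java -Xmx2g HeapCheck
final class HeapCheck {
    // Maximum heap in MiB as reported by the running JVM
    // (a huge number if the JVM reports no limit, i.e. Long.MAX_VALUE).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```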

-Marshall


On 4/17/2017 5:20 PM, Benedict Holland wrote:
> Hello all,
>
> I get this error when I run the pear installer using the built results from
> the OpenNLP application. Is there anything I can do?
>
> Thanks,
> ~Ben
>
> Verification of opennlp.uima.OpenNlpTextAnalyzer failed =>
>  java.lang.OutOfMemoryError: GC overhead limit exceeded
> at java.io.DataInputStream.readUTF(Unknown Source)
> at java.io.DataInputStream.readUTF(Unknown Source)
> at
> opennlp.tools.ml.model.BinaryFileDataReader.readUTF(BinaryFileDataReader.java:59)
> at
> opennlp.tools.ml.model.AbstractModelReader.readUTF(AbstractModelReader.java:80)
> at
> opennlp.tools.ml.model.AbstractModelReader.getPredicates(AbstractModelReader.java:117)
> at
> opennlp.tools.ml.maxent.io.GISModelReader.constructModel(GISModelReader.java:77)
> at
> opennlp.tools.ml.model.GenericModelReader.constructModel(GenericModelReader.java:62)
> at
> opennlp.tools.ml.model.AbstractModelReader.getModel(AbstractModelReader.java:85)
> at
> opennlp.tools.util.model.GenericModelSerializer.create(GenericModelSerializer.java:32)
> at
> opennlp.tools.util.model.GenericModelSerializer.create(GenericModelSerializer.java:29)
> at
> opennlp.tools.util.model.BaseModel.finishLoadingArtifacts(BaseModel.java:309)
> at opennlp.tools.util.model.BaseModel.loadModel(BaseModel.java:239)
> at opennlp.tools.util.model.BaseModel.<init>(BaseModel.java:173)
> at opennlp.tools.parser.ParserModel.<init>(ParserModel.java:177)
> at
> opennlp.uima.parser.ParserModelResourceImpl.loadModel(ParserModelResourceImpl.java:35)
> at
> opennlp.uima.parser.ParserModelResourceImpl.loadModel(ParserModelResourceImpl.java:26)
> at
> opennlp.uima.util.AbstractModelResource.load(AbstractModelResource.java:35)
> at
> org.apache.uima.resource.impl.ResourceManager_impl.registerResource(ResourceManager_impl.java:750)
> at
> org.apache.uima.resource.impl.ResourceManager_impl.initializeExternalResources(ResourceManager_impl.java:594)
> at
> org.apache.uima.resource.Resource_ImplBase.initialize(Resource_ImplBase.java:210)
> at
> org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:157)
> at
> org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:128)
> at
> org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
> at
> org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
> at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
> at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
> at
> org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
> at
> org.apache.uima.pear.tools.InstallationTester.testAnalysisEngine(InstallationTester.java:218)
> at
> org.apache.uima.pear.tools.InstallationTester.doTest(InstallationTester.java:113)
> at
> org.apache.uima.pear.tools.InstallationController.verifyComponentInstallation(InstallationController.java:1110)
> at
> org.apache.uima.pear.tools.InstallationController.verifyComponent(InstallationController.java:1993)
> at
> org.apache.uima.tools.pear.install.InstallPear.installPear(InstallPear.java:389)
>

Re: DUCC Java API job info

2017-04-20 Thread Daniel Baumartz


Thanks for the JSON URL. An addition to the DuccJobMonitor would be great!

-Daniel

On 19.04.2017 20:22, Lou DeGenaro wrote:

There is no CLI query that gives the desired results.  One could write a
Python script to visit
http://uima-ducc-demo.apache.org:42133/ducc-servlet/json-format-aaData-jobs
and parse the resulting JSON.
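The same approach can be sketched in Java (Java 11+ `java.net.http`): fetch the servlet URL and hand the raw JSON to a parser. The class name `DuccJobsFetcher` is hypothetical; the URL is the demo server's from the message above (substitute the host and port of your own DUCC web server, and note the demo host may no longer be reachable); parsing the `aaData` rows is left to a JSON library, since their exact layout is not described here.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: downloads the DUCC jobs data as raw JSON text.
final class DuccJobsFetcher {
    // Endpoint from the demo server; adjust host and port for your own installation.
    static final String DEFAULT_URL =
        "http://uima-ducc-demo.apache.org:42133/ducc-servlet/json-format-aaData-jobs";

    // Builds a plain GET request for the given URL.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url)).GET().build();
    }

    // Sends the request and returns the response body, failing on non-200 status codes.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
            client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String url = args.length > 0 ? args[0] : DEFAULT_URL;
        // Parsing is left to a JSON library of your choice.
        System.out.println(fetch(url));
    }
}
```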

Should there be a Jira to provide a CLI query capability equivalent to the
Jobs page? Either a new query could be added or the DuccJobMonitor could be enhanced.

Lou.



On Wed, Apr 19, 2017 at 10:26 AM, Daniel Baumartz <
bauma...@stud.uni-frankfurt.de> wrote:


Hi,

I am trying to get the job info (start, duration and time until
completion, basically the data from the web server jobs page) with the DUCC
Java API. I was able to get some of the data (total, done...) using
DuccJobMonitor, but I can't figure out how to get the others. Is there a
way to access these?

Thanks,
Daniel
