I'd also like to point out that the best UIMA-AS documentation is not where
one might first go looking (the docs, HTML, or PDF files) but rather the
README file at the top level of the UIMA-AS distribution. That's where to
find the good stuff. :)
On 4/27/2012 3:47 PM, Thomas Ginter wrote:
UIMA-AS was created to handle the message passing, job distribution, etc. Try
going through the UIMA-AS documentation first. We have had pretty good success
using it here.
Thanks,
Thomas Ginter
801-448-7676
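For readers new to the idea, here is a minimal plain-Python analogy of queue-based job distribution. A client puts work items on a shared queue, and several workers pull from it independently, so adding workers adds throughput without changing the client. This is only an illustration of the pattern: UIMA-AS does this with JMS queues on an ActiveMQ broker, and the names below (`annotate`, `run_workers`) are hypothetical, not UIMA-AS APIs.

```python
import queue
import threading

def annotate(doc: str) -> int:
    """Hypothetical stand-in for an analysis engine: counts whitespace tokens."""
    return len(doc.split())

def run_workers(docs, n_workers=3):
    """Distribute docs over n_workers threads that pull from one shared queue."""
    work = queue.Queue()
    results = {}
    lock = threading.Lock()

    # Fill the queue before starting workers, so an empty queue means "done".
    for i, doc in enumerate(docs):
        work.put((i, doc))

    def worker():
        while True:
            try:
                i, doc = work.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker exits
            out = annotate(doc)
            with lock:
                results[i] = out

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Return results in the original document order.
    return [results[i] for i in range(len(docs))]

if __name__ == "__main__":
    print(run_workers(["a b c", "d e", "f"]))  # [3, 2, 1]
```

The point of the pull model is that fast workers simply take more items; nothing has to pre-assign documents to workers.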
On Apr 27, 2012, at 1:35 PM, John David Osborne wrote:
Hello,
Is there any best-practice documentation out there for running
UIMA/UIMA-AS on a cluster? I have only run single-machine instances of
UIMA (mostly through Eclipse) and have not investigated its ability to
perform multiple simultaneous analyses in order to process large document
collections.
It's not clear to me how UIMA would operate in a cluster environment. Do
people really do message passing using JMS? I'm guessing this is the case,
since I'm not seeing references to MPICH, SGE, or other things I'm more used to.
I've looked through some of the documentation (including all of the Overview
& SDK Setup guide) but am not finding anything helpful. I've also tried
googling, but I'm not getting much except this:
http://comments.gmane.org/gmane.comp.apache.uima.general/2131, which makes
me think it is possible.
Given my current level of confusion, I think it may be best to run
multiple instances of UIMA, submit jobs that each process a discrete
document set to our SGE cluster, and ignore whatever scaling features UIMA
actually provides, since the document processing I plan to do is data
parallel.
-John
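John's data-parallel fallback can be sketched as follows, assuming the work is submitted as an SGE array job (for example `qsub -t 1-4 ...`, tasks numbered 1..N with step 1), in which case SGE sets `SGE_TASK_ID` and `SGE_TASK_LAST` for each task. Each task then processes its own discrete slice of the collection with an ordinary single-machine UIMA pipeline. The helper `shard_for_task`, the round-robin split, and reading documents from the working directory are illustrative assumptions, not part of UIMA or SGE.

```python
import os

def shard_for_task(docs, task_id, n_tasks):
    """Return the documents assigned to one array task (task_id is 1-based, as in SGE)."""
    if not 1 <= task_id <= n_tasks:
        raise ValueError(f"task_id must be in 1..{n_tasks}, got {task_id}")
    # Round-robin: task k takes documents k-1, k-1+n_tasks, k-1+2*n_tasks, ...
    return docs[task_id - 1::n_tasks]

def env_int(name, default=1):
    """Read an integer environment variable, falling back if unset or non-numeric."""
    raw = os.environ.get(name, "")
    return int(raw) if raw.isdigit() else default

if __name__ == "__main__":
    # Hypothetical: the document collection lives in the job's working directory.
    docs = sorted(os.listdir("."))
    task_id = env_int("SGE_TASK_ID")
    n_tasks = env_int("SGE_TASK_LAST")  # assumes -t 1-N with step size 1
    for path in shard_for_task(docs, task_id, n_tasks):
        print(path)  # hand each path to the single-machine UIMA pipeline here
```

Sorting the listing gives every task the same ordering, so the slices are disjoint and together cover the whole collection.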
--
Eric Riebling Senior Systems Programmer
http://ericriebling.com CMU Language Technologies Institute