Still pretty new to this, but our pipeline will likely include at least one annotator written in C++ using the UIMA C++ API. My understanding (from https://uima.apache.org/doc-uimacpp-huh.html) was that invoking a C++ annotator from a JVM process via JNI has known issues, particularly "2. Runtime problems in the C++ code can crash the entire JVM process." We were hoping to avoid that with UIMA-AS, but my understanding of both UIMA-AS and Hadoop is limited, so your question may very well be a good one.
David

On 9/15/17, 3:15 PM, "Richard Eckart de Castilho" <[email protected]> wrote:

>If you have a Hadoop/Spark/YARN cluster, why would you use UIMA-AS?
>
>Afaik UIMA-AS is usually used to run UIMA components as statically
>deployed services that communicate with each other via a message queue.
>
>I suppose in a Hadoop/Spark/YARN cluster you'd care more about dynamic
>deployment and instead of a message queue I suppose you'd use RDDs, no?
>
>Cheers,
>
>-- Richard
>
>On 15.09.2017, at 20:54, Fox, David <[email protected]> wrote:
>>
>> We're looking to transition a large NLP application processing
>> ~30TB/month from a custom NLP framework to UIMA-AS, and from parallel
>> processing on a dedicated cluster with custom Python scripts that call
>> GNU parallel to something with better support for managing resources
>> on a shared cluster.
>>
>> Both our internal IT/engineering group and our cluster vendor
>> (Hortonworks) use and support Hadoop/Spark/YARN on a new shared cluster.
>> DUCC's capabilities seem to overlap with these more general-purpose
>> tools. Although it may be more closely aligned with UIMA for a dedicated
>> cluster, I think the big question for us would be how/whether it would
>> play nicely with other Hadoop/Spark/YARN jobs on the shared cluster.
>> We're also likely to move at least some of our workload to a cloud
>> computing host, and it seems like Hadoop/Spark are much more likely to
>> be supported there.
