If you have a Hadoop/Spark/YARN cluster, why would you use UIMA-AS? AFAIK, UIMA-AS is usually used to run UIMA components as statically deployed services that communicate with each other via a message queue.
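For concreteness, a rough client-side sketch of that pattern (mine, not from this thread): the UIMA-AS service is deployed separately behind an ActiveMQ broker, and a client pushes CASes to the service's input queue and receives the analysis results back. The broker URL and queue name are placeholders, and the exact initialization keys should be double-checked against the UIMA-AS documentation.

import java.util.HashMap;
import java.util.Map;

import org.apache.uima.aae.client.UimaAsynchronousEngine;
import org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngine_impl;
import org.apache.uima.cas.CAS;

public class UimaAsClientSketch {
    public static void main(String[] args) throws Exception {
        // Client-side handle to a UIMA-AS service that was deployed separately
        // (the service itself is described by a deployment descriptor and sits
        // behind an ActiveMQ broker).
        UimaAsynchronousEngine engine = new BaseUIMAAsynchronousEngine_impl();

        Map<String, Object> appCtx = new HashMap<>();
        appCtx.put(UimaAsynchronousEngine.ServerUri, "tcp://broker-host:61616"); // broker URL (placeholder)
        appCtx.put(UimaAsynchronousEngine.ENDPOINT, "MyAnalysisQueue");           // service input queue (placeholder)
        appCtx.put(UimaAsynchronousEngine.CasPoolSize, 2);
        engine.initialize(appCtx);

        // Send one document through the remote service and wait for the reply.
        CAS cas = engine.getCAS();
        cas.setDocumentText("Some input document text.");
        engine.sendAndReceiveCAS(cas);
        // ... read annotations from the returned CAS here ...
        cas.release();

        engine.collectionProcessingComplete();
        engine.stop();
    }
}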
I suppose that in a Hadoop/Spark/YARN cluster you'd care more about dynamic deployment, and that instead of a message queue you'd use RDDs, no? (A rough sketch of that idea follows below the quoted mail.)

Cheers,

-- Richard

On 15.09.2017, at 20:54, Fox, David <[email protected]> wrote:
>
> We're looking to transition a large NLP application processing ~30TB/month
> from a custom NLP framework to UIMA-AS, and from parallel processing on a
> dedicated cluster with custom Python scripts which call GNU parallel, to
> something with better support for managing resources on a shared cluster.
>
> Both our internal IT/engineering group and our cluster vendor
> (HortonWorks) use and support Hadoop/Spark/YARN on a new shared cluster.
> DUCC's capabilities seem to overlap with these more general-purpose tools.
> Although it may be more closely aligned with UIMA for a dedicated
> cluster, I think the big question for us would be how/whether it would
> play nicely with other Hadoop/Spark/YARN jobs on the shared cluster.
> We're also likely to move at least some of our workload to a cloud
> computing host, and it seems like Hadoop/Spark are much more likely to be
> supported there.
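To make the RDD idea above a bit more concrete, here is a minimal sketch of one way it could look (my assumption, not something proposed in this thread): instantiate the analysis engine once per partition with uimaFIT and stream that partition's documents through it. The application name, the input path and the trivial no-op annotator are placeholders for a real pipeline.

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.jcas.JCas;

public class SparkUimaSketch {

    // Trivial stand-in for a real annotator (e.g. a tokenizer or NER component).
    public static class NoOpAnnotator extends JCasAnnotator_ImplBase {
        @Override
        public void process(JCas jcas) {
            // a real component would add annotations to the CAS here
        }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("uima-on-spark");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // One document per line; the path is a placeholder.
            JavaRDD<String> docs = sc.textFile("hdfs:///data/docs.txt");

            // Build the (non-serializable) analysis engine once per partition
            // and reuse a single JCas for all documents in that partition.
            JavaRDD<Integer> annotationCounts = docs.mapPartitions(texts -> {
                AnalysisEngine ae = AnalysisEngineFactory.createEngine(NoOpAnnotator.class);
                JCas jcas = ae.newJCas();
                List<Integer> results = new ArrayList<>();
                while (texts.hasNext()) {
                    jcas.reset();
                    jcas.setDocumentText(texts.next());
                    ae.process(jcas);
                    results.add(jcas.getAnnotationIndex().size());
                }
                return results.iterator();
            });

            System.out.println("total annotations: " + annotationCounts.reduce(Integer::sum));
        }
    }
}

The per-partition instantiation is the important bit: AnalysisEngine instances are not serializable, so they have to be created on the executors rather than shipped from the driver.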
