The cTAKES Default Processing Pipeline requires a minimum of about 3 GB of RAM because of the size of the embedded HSQLDBs (which are the default). However, providing a fair bit of headroom beyond that is generally a good idea.
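If it helps, here is a minimal sketch of a pre-flight check you could run on the new VM to confirm the JVM actually has that much heap available. It is not part of cTAKES: the class name HeapCheck, the method hasEnoughHeap, and the exact 3 GB threshold are my own assumptions for illustration. It only uses the standard Runtime.maxMemory() call, which reflects whatever -Xmx value the JVM was started with.

```java
// Hypothetical pre-flight check; HeapCheck and hasEnoughHeap are illustrative
// names and are not part of the cTAKES code base.
public class HeapCheck {
    // The roughly 3 GB floor mentioned above, in bytes (assumed exact value).
    public static final long MIN_HEAP_BYTES = 3L * 1024 * 1024 * 1024;

    // Returns true when the given maximum heap size meets the floor.
    public static boolean hasEnoughHeap(long maxHeapBytes) {
        return maxHeapBytes >= MIN_HEAP_BYTES;
    }

    public static void main(String[] args) {
        // maxMemory() reports the most heap the JVM will try to use. Depending
        // on the garbage collector it can come out slightly below -Xmx, which
        // is one more reason to leave some headroom.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %d MB%n", maxHeap / (1024 * 1024));
        if (!hasEnoughHeap(maxHeap)) {
            System.err.println("Heap is below about 3 GB; consider a larger -Xmx, e.g. -Xmx4g.");
        }
    }
}
```

Running it with something like `java -Xmx4g HeapCheck` should report a heap above the floor; the 4g figure is only an example of leaving headroom, not a tested recommendation.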
As for multi-threading, I have been using the ThreadSafeLvg class. Per the Component Use Guide: "cTAKES was not originally designed to be thread safe. If you would like to experiment with making it thread safe, see class ThreadSafeLvg in ctakes-lvg."

https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+Component+Use+Guide

Mike

On Thu, Jan 31, 2019 at 8:44 AM Baas,Leah <leah.b...@sanfordhealth.org> wrote:

> Hi everyone,
>
> I’m setting up a new VM to process ~13,500 files using the cTAKES default
> clinical pipeline. Looking for a good heuristic for RAM/CPU allocation. Can
> cTAKES take advantage of multiple CPUs? Or is it designed only to utilize a
> single thread?
>
> Thanks for the help.
>
> Leah

--
Mike Trepanier | Senior Big Data Engineer | MetiStream, Inc. | www.metistream.com