You could also consider UIMA-AS. If you have a multi-annotator pipeline, it can run each annotator in a separate thread, so multiple CASes can be active in the same pipeline at once. Individual annotators can also be scaled out for further speedup. You can specify an aggregate that starts with a collection reader and ends with a CAS consumer, then adjust the deployment descriptor to set the appropriate CAS pool size and scaleout values.
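As a rough sketch, a deployment descriptor along these lines is where the CAS pool size and per-annotator scaleout would be set. The queue name, broker URL, aggregate descriptor file name, delegate key, and all numbers are placeholders I made up for illustration; check the element names against the UIMA-AS documentation for your version before relying on this.

```xml
<analysisEngineDeploymentDescription
    xmlns="http://uima.apache.org/resourceSpecifier">
  <name>Example pipeline deployment</name>
  <deployment protocol="jms" provider="activemq">
    <!-- Number of CASes that can be in flight in the pipeline at once
         (placeholder value) -->
    <casPool numberOfCASes="5"/>
    <service>
      <!-- Hypothetical queue name and broker URL -->
      <inputQueue endpoint="MyPipelineQueue"
                  brokerURL="tcp://localhost:61616"/>
      <topDescriptor>
        <!-- Hypothetical aggregate: collection reader first,
             CAS consumer last -->
        <import location="MyAggregate.xml"/>
      </topDescriptor>
      <!-- async="true" asks for each delegate to run in its own thread -->
      <analysisEngine async="true">
        <delegates>
          <!-- The key must match a delegate key in the aggregate;
               "SlowAnnotator" is a made-up name -->
          <analysisEngine key="SlowAnnotator">
            <scaleout numberOfInstances="3"/>
          </analysisEngine>
        </delegates>
      </analysisEngine>
    </service>
  </deployment>
</analysisEngineDeploymentDescription>
```

A reasonable starting point is to scale out only the slowest annotator and make the CAS pool large enough to keep all of its instances busy, then tune from there.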
Burn.
