Space/Time tradeoffs in the CAS
-------------------------------
Key: UIMA-1089
URL: https://issues.apache.org/jira/browse/UIMA-1089
Project: UIMA
Issue Type: Improvement
Components: Core Java Framework
Affects Versions: 2.2.2
Reporter: Marshall Schor
Priority: Minor
Investigate and implement optimizations that trade time (spent running the
optimizations, under user control) for space. One such optimization is string
sharing. Detecting the sharing opportunities requires additional computation
and (temporary) storage, but the result is a space saving. For instance, a
common annotation might have a "part-of-speech" feature set to short strings
such as "noun". When processing a large document, there may be a large number
of these string-valued features, each picked from a small pool of allowable
values. The CAS's string storage could be optimized to share the string
references in this case, at the cost of temporarily building a hash table of
the unique strings and using it to identify sharing opportunities. A new API
call to perform this optimization would confine its time and space overhead to
just those users, and those times, where it makes sense.
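A minimal sketch of the sharing step, assuming the string feature values can be
viewed as a plain String[]. The class StringSharing and the method
shareStrings are hypothetical illustrations, not part of the UIMA CAS API. A
temporary HashMap plays the role of the "hash table of the unique strings":
each value is replaced by the first equal instance seen, so duplicates become
unreachable and can be garbage collected.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not a UIMA API): makes equal string values share
 * a single String object, trading a pass over the data for space.
 */
final class StringSharing {

    private StringSharing() {}

    /**
     * Replaces each non-null element of values with a canonical instance.
     * The pool is temporary storage that is discarded when the method returns.
     *
     * @return the number of distinct non-null strings found
     */
    static int shareStrings(String[] values) {
        Map<String, String> pool = new HashMap<>();
        for (int i = 0; i < values.length; i++) {
            String v = values[i];
            if (v == null) {
                continue;
            }
            // putIfAbsent returns the existing canonical instance, or null if v is new
            String canonical = pool.putIfAbsent(v, v);
            if (canonical != null) {
                values[i] = canonical; // drop the reference to the duplicate
            }
        }
        return pool.size();
    }

    public static void main(String[] args) {
        // new String(...) forces distinct objects with equal contents
        String[] pos = {new String("noun"), new String("verb"), new String("noun")};
        int unique = shareStrings(pos);
        System.out.println(unique);           // 2
        System.out.println(pos[0] == pos[2]); // true: both now reference one object
    }
}
```

Scoping the pool to one call keeps the extra storage temporary, as the proposal
describes. String.intern() would also deduplicate, but it shares through the
JVM-wide string pool rather than a table that exists only during the
optimization.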
An alternative would be to detect these opportunities automatically for some
selected kinds of optimizations, but I'm not sure that could be done without
negatively impacting finely-tuned systems.
--
This message is automatically generated by JIRA.