Space/Time tradeoffs in the CAS
-------------------------------

                 Key: UIMA-1089
                 URL: https://issues.apache.org/jira/browse/UIMA-1089
             Project: UIMA
          Issue Type: Improvement
          Components: Core Java Framework
    Affects Versions: 2.2.2
            Reporter: Marshall Schor
            Priority: Minor


Investigate and implement optimizations that let the user spend extra time 
(running the optimization) to save space.  One such optimization is sharing 
strings.  Detecting the sharing opportunities requires additional computation 
and temporary storage, but the result is a space saving.  For instance, a 
common annotation might assign short strings like "noun" to a 
"part-of-speech" feature.  A large document may contain a large number of 
these string-valued features, all drawn from a small pool of allowable 
values.  In that case the CAS's string storage could share the string 
references, at the cost of temporarily building a hash table of the unique 
strings and using it to identify sharing possibilities.  A new API call for 
this optimization would confine its time and space overhead to just those 
users and times where it makes sense.
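
To make the idea concrete, here is a minimal sketch of the sharing pass in 
plain Java.  It is not UIMA code: the class StringSharingSketch, the method 
shareStrings, and the String[] standing in for the CAS's string storage are 
all hypothetical names chosen for illustration.  The real CAS internals may 
look quite different.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in UIMA.
public class StringSharingSketch {

    /**
     * Redirects equal strings in the array to one shared instance.
     * The array is an assumed stand-in for the CAS's string storage.
     *
     * @return the number of slots redirected to an already-seen string
     */
    public static int shareStrings(String[] values) {
        // Temporary table of unique strings; garbage once this method returns.
        Map<String, String> unique = new HashMap<>();
        int shared = 0;
        for (int i = 0; i < values.length; i++) {
            String s = values[i];
            if (s == null) {
                continue; // unset feature value, nothing to share
            }
            // putIfAbsent returns the previously stored equal string, or null
            // if this is the first time the value has been seen.
            String canonical = unique.putIfAbsent(s, s);
            if (canonical != null && canonical != s) {
                values[i] = canonical; // drop the duplicate, keep one copy
                shared++;
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        // new String(...) forces distinct objects, as separately created
        // feature values would be.
        String[] pos = {
            new String("noun"), new String("verb"),
            new String("noun"), new String("noun")
        };
        int shared = shareStrings(pos);
        System.out.println(shared + " slots now share a string"); // prints "2 slots now share a string"
        System.out.println(pos[0] == pos[2]);                     // prints "true"
    }
}
```

The hash table exists only for the duration of the call, which matches the 
"temporary storage" cost described above.  Because the pass is an explicit 
method call, users who do not need it pay nothing.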

An alternative would be to detect and apply selected kinds of optimizations 
automatically, but I'm not sure that could be done without negatively 
impacting finely-tuned systems.
