Marshall Schor (JIRA) wrote:
Space/Time tradeoffs in the CAS
-------------------------------

                 Key: UIMA-1089
                 URL: https://issues.apache.org/jira/browse/UIMA-1089
             Project: UIMA
          Issue Type: Improvement
          Components: Core Java Framework
    Affects Versions: 2.2.2
            Reporter: Marshall Schor
            Priority: Minor


Investigate / implement optimizations that trade time for space, where the user controls when the 
optimization runs.  One such optimization could be sharing strings.  Sharing requires additional 
computation and (temporary) storage to detect the sharing opportunities, but results in space 
savings.  For instance, a common annotation might assign short strings like "noun" to a 
"part-of-speech" feature.  If you are processing a large document, there may be a large 
number of these string-valued features, with values picked from a small pool of allowable values.  The 
CAS's string storage could be optimized to share the string references in this case, at 
the cost of temporarily creating a hash table of the unique strings and using it to identify sharing 
possibilities.  A new API call for this optimization would confine its time/space overhead 
to just those users and times where it makes sense.
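
As a rough illustration of the string-sharing idea described above, here is a minimal Java sketch.  It 
is not UIMA code: the class name StringSharingSketch, the method shareStrings, and the assumption 
that the string storage is a plain String[] are all hypothetical, chosen only to show how a temporary 
hash table can redirect equal strings to a single shared instance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the UIMA API.
final class StringSharingSketch {

    /**
     * Redirects equal strings in the array to one shared instance.
     * Assumes (for illustration) that string storage is a plain String[].
     * Returns the number of slots that were redirected.
     */
    static int shareStrings(String[] stringHeap) {
        // Temporary table of unique strings; eligible for GC once we return.
        Map<String, String> unique = new HashMap<>();
        int redirected = 0;
        for (int i = 0; i < stringHeap.length; i++) {
            String s = stringHeap[i];
            if (s == null) {
                continue; // unset feature value, nothing to share
            }
            // putIfAbsent returns the earlier equal string, or null if s is new.
            String canonical = unique.putIfAbsent(s, s);
            if (canonical != null && canonical != s) {
                stringHeap[i] = canonical; // drop the duplicate instance
                redirected++;
            }
        }
        return redirected;
    }

    public static void main(String[] args) {
        String[] heap = {
            new String("noun"), new String("verb"),
            new String("noun"), new String("noun")
        };
        int n = shareStrings(heap);
        System.out.println("redirected " + n + " slots");
    }
}
```

The hash table exists only for the duration of the call, which matches the "temporary storage" cost 
mentioned above; callers who never invoke the method pay nothing.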

An alternative would be to automatically figure this out for some selected 
kinds of optimizations, but I'm not sure that could be done without impacting 
finely-tuned systems negatively.


Marshall,

I'm not sure what you're doing here.  Why don't you just
start discussion threads on the mailing list?  Why do these
things need to be in Jira?

--Thilo
