Bhavani Iyer wrote:
We're planning to start implementing support for Delta CAS as
described here:
http://cwiki.apache.org/UIMA/reducing-overhead-for-remote-service-calls.html

The current thinking on the design is described below and we would like some
feedback.
*CAS Activity Journal*
In order to be able to export as XMI only the updates to the CAS, we need to
maintain a journal associated with the CAS to track these update activities. The journal will contain the following information:

1) To identify new FSs, the max FS id at the start of processing is saved.
Newly added FSs will have ids above this high water mark.
2) To identify which pre-existing FSs were modified, a list of ids of
pre-existing FSs that have been changed.
3) To track updates to the Views, a list of FS ids added, removed and
reindexed in the index repository of pre-existing Views.
I think of the View here as referring to the indexes. The remote AE may not have the full set of indexes the client is using. So no optimization can be reliably done based on analyzing changes to determine if they affect an index.
The CAS APIs that set feature values, as well as the APIs that add to or
remove from the index repository, will be modified to update the CAS
activity journal. The overhead is expected to be minimal since each call
will simply add an FS id to the appropriate list as mentioned above.
There will be both time and space overhead for this, of course; the space overhead might not be minimal. Note that there are a lot of variations of the APIs to the CAS including so-called "low-level" ones, all of which need to be found and modified.
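
For discussion, the three pieces of journal state listed above could be sketched roughly as follows. This is only an illustration in Python, not the UIMA API: the class names, method names and the dictionary keyed by sofa id are all assumptions made for the sketch, as is the choice to skip duplicate ids in the modified list.

```python
from dataclasses import dataclass, field


@dataclass
class ViewChanges:
    """FS ids added to, removed from, or reindexed in one pre-existing View."""
    added: list = field(default_factory=list)
    removed: list = field(default_factory=list)
    reindexed: list = field(default_factory=list)


@dataclass
class CasActivityJournal:
    """Hypothetical journal, for illustration only (not a UIMA class)."""
    high_water_mark: int                          # max FS id at start of processing
    modified: list = field(default_factory=list)  # ids of changed pre-existing FSs
    views: dict = field(default_factory=dict)     # sofa id -> ViewChanges

    def is_new(self, fs_id):
        # FSs created during processing get ids above the high water mark.
        return fs_id > self.high_water_mark

    def record_feature_set(self, fs_id):
        # New FSs are identified by id alone, so only pre-existing FSs are
        # journaled. The duplicate check is linear in the list length.
        if not self.is_new(fs_id) and fs_id not in self.modified:
            self.modified.append(fs_id)

    def _view(self, sofa):
        return self.views.setdefault(sofa, ViewChanges())

    def record_index_add(self, sofa, fs_id):
        self._view(sofa).added.append(fs_id)

    def record_index_remove(self, sofa, fs_id):
        self._view(sofa).removed.append(fs_id)

    def record_reindex(self, sofa, fs_id):
        self._view(sofa).reindexed.append(fs_id)
```

Every feature setter and index operation, including the low-level variants, would have to call one of the record methods, which is where both the time cost and the space cost come from.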
*Delta CAS XMI serialization*
A more compact representation of the delta CAS data than the proposed
XMI:Difference format would be preferred for transmitting CAS data when
making calls to a remote service. Instead, we propose that CAS updates be
serialized in the same format as the XMI CAS with one modification.
Additional attributes will be defined in the View element to contain the
list of ids of FSs that were added, deleted and reindexed:

Example of current cas:View
<cas:View sofa="1" members="8 13 20 26 42"/>

Example of proposed cas:View to support delta CAS
<cas:View sofa="1" members_added="32"  members_deleted="13 20"
members_reindexed="26 42" />
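
As a rough sketch of how a serializer might emit the proposed element, the Python function below builds the cas:View string from three id lists. The function name is hypothetical, and leaving out an attribute when its list is empty is an assumption; the proposal does not say whether empty attributes are written.

```python
from xml.sax.saxutils import quoteattr


def delta_view_element(sofa, added=(), deleted=(), reindexed=()):
    """Build a delta cas:View element (hypothetical helper, not UIMA code)."""
    attrs = ["sofa=" + quoteattr(str(sofa))]
    for name, ids in (("members_added", added),
                      ("members_deleted", deleted),
                      ("members_reindexed", reindexed)):
        if ids:  # assumption: attributes with no ids are omitted
            attrs.append(name + "=" + quoteattr(" ".join(str(i) for i in ids)))
    return "<cas:View " + " ".join(attrs) + "/>"


print(delta_view_element(1, added=[32], deleted=[13, 20], reindexed=[26, 42]))
```

With the ids from the example above, this prints the same element on one line.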

XMI deserialization of the delta CAS will update the CAS as follows:
1) create new FSs for those elements where xmi:id is above the high water
mark.
2) update the FS feature values for those elements where xmi:id is at or
below the high water mark. Note, features missing in the XMI will be set to
null.
Not sure what this note means. I guess it implies that the server must make sure to send back all features (including those *not* defined in its type system) for an FS, unless the feature value is null. Is this also true for other default values, like 0 for numbers?

-Marshall
3) Process the View element to add, remove or reindex FSs in the specified
View's index repository.
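
The three deserialization steps might look roughly like this, using plain Python dictionaries and sets in place of the real CAS and index repository. All names and the data layout are assumptions for illustration. The sketch takes the note in step 2 literally: a feature that an existing FS had but that is absent from the delta is set to None.

```python
def apply_delta(cas, high_water_mark, elements, view_changes):
    """Apply a delta to a toy CAS (illustrative model, not UIMA code).

    cas:          {"fs": {id: {feature: value}}, "index": {sofa: set of ids}}
    elements:     {xmi_id: {feature: value}} as read from the delta XMI
    view_changes: {sofa: {"added": [...], "deleted": [...], "reindexed": [...]}}
    """
    for fs_id, features in elements.items():
        if fs_id > high_water_mark:
            # Step 1: ids above the mark are new FSs.
            cas["fs"][fs_id] = dict(features)
        else:
            # Step 2: pre-existing FS; features missing in the delta become None.
            existing = cas["fs"][fs_id]
            updated = {name: features.get(name) for name in existing}
            updated.update(features)
            cas["fs"][fs_id] = updated

    # Step 3: apply the View attributes to the index repository.
    for sofa, changes in view_changes.items():
        index = cas["index"].setdefault(sofa, set())
        index.update(changes.get("added", ()))
        index.difference_update(changes.get("deleted", ()))
        for fs_id in changes.get("reindexed", ()):
            # A sorted index would reposition the FS here; for a set,
            # remove-then-add leaves membership unchanged.
            if fs_id in index:
                index.remove(fs_id)
                index.add(fs_id)
```

Under this reading, a service that returns only the features it changed would wipe the others, which is why the question about sending back all non-null features matters.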

Please note that this proposed XMI representation itself does not identify
or mark the CAS as a delta CAS. In the context of UIMA AS services,
additional properties in the request and reply messages will specify that
the XMI contains a delta CAS. Applications should not use the API to export
a delta CAS to a file for later processing without taking additional steps
to retain the format information.


*Delta CAS for debugging*
To support debugging UIMA aggregates as described here
 http://cwiki.apache.org/UIMA/improving-uima-debug-capabilities.html
the delta CAS implementation will be extended as follows:
1) the CAS activity journal will be maintained for each component that is
called during aggregate processing.
2) an API to enable/disable this extended journaling by component.
3) define the XMI representation of the CAS activity journal. Details on
this will be posted shortly.

We would appreciate comments and suggestions on the proposed changes.

Thanks,

Bhavani

