I'm exploring two options for high-performance communication of CAS data.
1. I can't use RMI on a CAS, so I'm looking at the lower-level utilities that RMI is built on, such as DataOutputStream. The idea would be to walk the CAS the way XmiCasSerializer does (iterating over the indexes in the FSIndexRepository), but instead of calling SAX methods, call DataOutputStream to marshal the FeatureStructures. This looks like a lot of work: XmiCasSerializer is about 1400 lines, and I'd have to define a binary format.

2. I discovered that there *is* a binary XML (see http://www.w3.org/XML/EXI). It's very small and fast. In their test cases, it is about 1/5 to 1/10 the size of XML, parses 7X faster, and is generated 2.4X faster. It would also be trivial to implement in UIMA: EXI provides a SAX ContentHandler that could be passed to XmiCasSerializer.serialize(). There are two implementations, both in Java: an open-source one from Siemens and a commercial one from AgileDelta. Unfortunately, the open-source one uses the GPL license, which my company doesn't allow me to use. If only they used the Apache license...

This licensing issue may force me toward the DataOutputStream option. :-(

Does anyone have any thoughts on these options, or any other options?

Thanks,
Greg Holmberg
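To give a rough sense of what option 1 involves, here is a minimal sketch of a hand-rolled binary format built on DataOutputStream/DataInputStream. It is an illustration under stated assumptions, not UIMA code: `Fs` is a hypothetical, simplified stand-in for a FeatureStructure (id, type name, begin/end offsets, and one optional reference to another FS by id), and the magic number and layout are made up. Type names go into a string table once, so each FS costs a fixed 20 bytes. A real serializer would instead walk the CAS indexes as XmiCasSerializer does, and would also have to handle string/float/array features and FSs reachable only through references.

```java
import java.io.*;
import java.util.*;

/** Hypothetical sketch of a compact binary format for feature structures (not the UIMA API). */
public class BinaryCasSketch {

    /** Simplified stand-in for a FeatureStructure; refId == -1 means "no reference". */
    public record Fs(int id, String type, int begin, int end, int refId) {}

    private static final int MAGIC = 0x43415331; // "CAS1": arbitrary format marker (assumption)

    /** Layout: magic, type-name table, FS count, then 5 ints per FS. */
    public static void write(List<Fs> fss, DataOutputStream out) throws IOException {
        // String table: each distinct type name is written once and referenced by index.
        Map<String, Integer> typeIndex = new LinkedHashMap<>();
        for (Fs fs : fss) typeIndex.putIfAbsent(fs.type(), typeIndex.size());

        out.writeInt(MAGIC);
        out.writeInt(typeIndex.size());
        for (String t : typeIndex.keySet()) out.writeUTF(t);

        out.writeInt(fss.size());
        for (Fs fs : fss) {
            out.writeInt(fs.id());
            out.writeInt(typeIndex.get(fs.type()));
            out.writeInt(fs.begin());
            out.writeInt(fs.end());
            out.writeInt(fs.refId());
        }
        out.flush();
    }

    /** Reads back exactly what write() produced, in the same order. */
    public static List<Fs> read(DataInputStream in) throws IOException {
        if (in.readInt() != MAGIC) throw new IOException("unexpected format marker");
        String[] types = new String[in.readInt()];
        for (int i = 0; i < types.length; i++) types[i] = in.readUTF();

        int n = in.readInt();
        List<Fs> result = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            int id = in.readInt();
            String type = types[in.readInt()];
            int begin = in.readInt();
            int end = in.readInt();
            int refId = in.readInt();
            result.add(new Fs(id, type, begin, end, refId));
        }
        return result;
    }

    /** Convenience wrapper: serialize to a byte array (e.g. for a socket or queue). */
    public static byte[] encode(List<Fs> fss) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(fss, new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience wrapper: deserialize from a byte array produced by encode(). */
    public static List<Fs> decode(byte[] data) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Fs> fss = List.of(
                new Fs(1, "example.Token", 0, 5, -1),
                new Fs(2, "example.Token", 6, 11, -1),
                new Fs(3, "example.Sentence", 0, 11, 1));
        byte[] data = encode(fss);
        System.out.println(decode(data).equals(fss) + " " + data.length + " bytes");
    }
}
```

Running `main` should print `true 105 bytes`. The string table is the main design choice here: it plays roughly the role that namespace prefixes and repeated element names play in XMI, but each type name is paid for only once.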
