I'm exploring two options for high-performance communication of CAS data. 





1. I can't use RMI on a CAS, so I'm looking at the lower-level utilities on 
which RMI is built, such as DataOutputStream. 



The idea would be to walk the CAS the way XmiCasSerializer does (iterate over 
the FSIndexRepository's indexes) and, instead of calling SAX methods, call 
DataOutputStream methods to marshal the FeatureStructures. 



This looks like a lot of work.  XmiCasSerializer is about 1400 lines.  I'd have 
to define a binary format. 
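
To give a rough idea of what such a format might look like, here is a minimal 
sketch using only the JDK. It does not touch the UIMA API: the Ann class is a 
hypothetical stand-in for a FeatureStructure (a type name plus begin/end 
offsets), and the layout (a count followed by one record per annotation) is 
just an assumption for illustration. A real serializer would also have to 
handle the type system, references between FeatureStructures, arrays, and 
multiple views. 

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class BinaryAnnotationSketch {

    // Hypothetical stand-in for a FeatureStructure; not a UIMA class.
    public static final class Ann {
        public final String type;
        public final int begin;
        public final int end;

        public Ann(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // Assumed layout: int count, then per record: UTF type name, int begin, int end.
    public static byte[] write(Ann[] anns) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(anns.length);
            for (Ann a : anns) {
                out.writeUTF(a.type);
                out.writeInt(a.begin);
                out.writeInt(a.end);
            }
        }
        return bytes.toByteArray();
    }

    // Reads the records back in the same order they were written.
    public static Ann[] read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            Ann[] anns = new Ann[in.readInt()];
            for (int i = 0; i < anns.length; i++) {
                String type = in.readUTF();
                int begin = in.readInt();
                int end = in.readInt();
                anns[i] = new Ann(type, begin, end);
            }
            return anns;
        }
    }
}
```

Even in this toy form, the reader and writer have to agree on field order by 
hand, which is a large part of why the full version looks like a lot of work. 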





2. I discovered that there *is* a binary XML.  See http://www.w3.org/XML/EXI . 



It's very small and fast.  In their test cases, the output is about 1/5 to 
1/10 the size of the equivalent XML, is parsed about 7x faster, and is 
generated about 2.4x faster. 



It would be trivial to implement in UIMA too.  EXI provides a SAX 
ContentHandler that could be used in XmiCasSerializer.serialize(). 
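
The reason this looks easy is that code written against SAX doesn't care which 
ContentHandler it is driving. The sketch below shows that with JDK classes 
only: emit() is a hypothetical stand-in for the serializer, and the handler 
passed to it here is a plain text-XML TransformerHandler. I have not checked 
the class names of either EXI implementation, so no EXI handler appears; the 
assumption is only that one could be passed in the same way. 

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class HandlerSwapSketch {

    // Hypothetical stand-in for a serializer: fires SAX events at whatever
    // ContentHandler it is given, without knowing the output encoding.
    public static void emit(ContentHandler handler) throws SAXException {
        handler.startDocument();
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "begin", "begin", "CDATA", "0");
        atts.addAttribute("", "end", "end", "CDATA", "5");
        handler.startElement("", "Token", "Token", atts);
        handler.endElement("", "Token", "Token");
        handler.endDocument();
    }

    // Drives emit() with a handler that writes ordinary text XML.
    // A binary handler would be substituted at the emit(...) call.
    public static String toTextXml() throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler textHandler = factory.newTransformerHandler();
        textHandler.getTransformer()
                .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        textHandler.setResult(new StreamResult(writer));
        emit(textHandler);
        return writer.toString();
    }
}
```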



There are two implementations, both in Java--an open-source one from Siemens, 
and a commercial one from AgileDelta. 



Unfortunately, the open-source one uses the GPL license, which my company 
doesn't allow me to use.  If only they used the Apache license... 



This licensing issue may force me toward the DataOutputStream option. :-( 





Does anyone have any thoughts on these options or any other options? 



Thanks, 





Greg Holmberg 
