There's a binary serialization format built into UIMA already, in the
org.apache.uima.cas.impl.Serialization class:
    public static void serializeCAS(CAS cas, OutputStream ostream)
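A minimal round-trip sketch, assuming the UIMA SDK jars are on the classpath and a recent UIMA release. Besides serializeCAS it uses the companion Serialization.deserializeCAS(CAS, InputStream) and CasCreationUtils.createCas; the class name BinaryCasRoundTrip and the helpers newCas and roundTripText are hypothetical, for illustration only. Both CASes are created from the same (empty) TypeSystemDescription so that the receiving side has exactly the same type system as the sender.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.apache.uima.UIMAFramework;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.impl.Serialization;
import org.apache.uima.resource.metadata.FsIndexDescription;
import org.apache.uima.resource.metadata.TypePriorities;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.apache.uima.util.CasCreationUtils;

// Hypothetical demo class: round-trips a CAS through UIMA's binary format.
public class BinaryCasRoundTrip {

  // Hypothetical helper: creates a CAS with only UIMA's built-in types and indexes.
  static CAS newCas(TypeSystemDescription tsd) throws Exception {
    return CasCreationUtils.createCas(tsd, (TypePriorities) null, (FsIndexDescription[]) null);
  }

  // Hypothetical helper: sender CAS -> bytes -> receiver CAS, returns received text.
  static String roundTripText(String text) throws Exception {
    TypeSystemDescription tsd =
        UIMAFramework.getResourceSpecifierFactory().createTypeSystemDescription();

    CAS sender = newCas(tsd);
    sender.setDocumentText(text);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Serialization.serializeCAS(sender, out); // binary, no XML involved

    // Built from the same description, so the receiver's type system matches
    // the sender's, as the binary format requires.
    CAS receiver = newCas(tsd);
    Serialization.deserializeCAS(receiver, new ByteArrayInputStream(out.toByteArray()));
    return receiver.getDocumentText();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(roundTripText("Binary CAS serialization example."));
  }
}
```

In a real client/service setup the ByteArrayOutputStream would be replaced by the socket or message stream, and the receiver would be a CAS created from the service's own copy of the shared type system.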
This method is typically several times faster than XMI serialization,
depending on CAS content. The binary format eliminates XML parsing,
but still has to extract the arbitrary object model contents of a CAS
starting from the indexes, and reconstruct the CAS indexes on deserialization.
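To see the index reconstruction in action, this sketch (same classpath assumption; BinaryCasIndexDemo and its methods are hypothetical names) adds one built-in uima.tcas.Annotation to the sender's indexes, ships the CAS through the binary format, and finds the annotation again by iterating the receiver's annotation index. The check on the exact type name skips subtypes such as the DocumentAnnotation, which also lives in that index.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.uima.UIMAFramework;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.impl.Serialization;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.resource.metadata.FsIndexDescription;
import org.apache.uima.resource.metadata.TypePriorities;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.apache.uima.util.CasCreationUtils;

// Hypothetical demo class: indexed FSs reappear in the receiver's indexes.
public class BinaryCasIndexDemo {

  static CAS newCas(TypeSystemDescription tsd) throws Exception {
    return CasCreationUtils.createCas(tsd, (TypePriorities) null, (FsIndexDescription[]) null);
  }

  // Returns the covered text of each plain uima.tcas.Annotation found in the
  // receiver's annotation index after a binary round trip.
  static List<String> coveredTextsAfterRoundTrip() throws Exception {
    TypeSystemDescription tsd =
        UIMAFramework.getResourceSpecifierFactory().createTypeSystemDescription();

    CAS sender = newCas(tsd);
    sender.setDocumentText("Hello binary CAS");
    // Built-in Annotation type, so no custom types are needed.
    AnnotationFS hello = sender.createAnnotation(sender.getAnnotationType(), 0, 5);
    sender.addFsToIndexes(hello);

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Serialization.serializeCAS(sender, out);

    CAS receiver = newCas(tsd);
    Serialization.deserializeCAS(receiver, new ByteArrayInputStream(out.toByteArray()));

    List<String> texts = new ArrayList<>();
    for (AnnotationFS a : receiver.getAnnotationIndex()) {
      // Skip subtypes such as uima.tcas.DocumentAnnotation.
      if (CAS.TYPE_NAME_ANNOTATION.equals(a.getType().getName())) {
        texts.add(a.getCoveredText());
      }
    }
    return texts;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(coveredTextsAfterRoundTrip());
  }
}
```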
One drawback of binary serialization is that the client and service sides
must have exactly the same type system, as binary type and feature
codes are used. Like XMI serialization, the binary format also supports
delta CAS replies.
Eddie
On Wed, Jan 27, 2010 at 3:00 AM, Greg Holmberg wrote:
> I'm exploring two options for high-performance communication of CAS data.
>
> 1. I can't use RMI on a CAS, so I'm looking at lower-level utilities on
> which RMI is built, such as DataOutputStream.
>
> The idea would be to walk the CAS like the XmiCasSerializer does (iterate the
> FSIndexRepository's), and instead of calling SAX methods, call
> DataOutputStream to marshal the FeatureStructures.
>
> This looks like a lot of work. XmiCasSerializer is about 1400 lines. I'd
> have to define a binary format.
>
> 2. I discovered that there *is* a binary XML. See http://www.w3.org/XML/EXI .
>
> It's very small and fast. In their test cases, it is about 1/5 to 1/10 the
> size of XML, parsed 7X faster, generated 2.4X faster.
>
> It would be trivial to implement in UIMA too. EXI provides a SAX
> ContentHandler that could be used in XmiCasSerializer.serialize().
>
> There are two implementations, both in Java--an open-source one from Siemens,
> and a commercial one from AgileDelta.
>
> Unfortunately, the open-source one uses the GPL license, which my company
> doesn't allow me to use. If only they used the Apache license...
>
> This licensing issue may force me toward the DataOutputStream option. :-(
>
> Does anyone have any thoughts on these options or any other options?
>
> Thanks,
> Greg Holmberg