On Wed, Jan 27, 2010 at 9:38 PM, Greg Holmberg <[email protected]> wrote:
> I did try the binary XML standard, EXI. I tried the open-source
> implementation, EXIficient, from Siemens. See
> http://exificient.sourceforge.net
>
> This plugged into XmiCasSerializer pretty easily, and after fixing a few
> null-pointer exceptions in EXIficient, I got some output. This turned out
> to be 30% the size of the XML file for my small test (1029 bytes vs. 3396).
> I haven't measured performance yet.
>
> Here's what I did to plug EXIficient in:
>
>     EXIFactory exiFactory = DefaultEXIFactory.newInstance();
>     exiFactory.setCodingMode(CodingMode.COMPRESSION);
>     EXIResult exiResult = new EXIResult(outputStream, exiFactory);
>     ContentHandler handler = exiResult.getHandler();
>     XmiCasSerializer serializer = new XmiCasSerializer(jcas.getTypeSystem());
>     serializer.serialize(jcas.getCas(), handler);
>
> I haven't tried to actually read the file, so I don't know that the data is
> correct yet.
>
> I've submitted a patch to the EXIficient project for the null pointer
> exceptions.
>
> More testing is required, but it looks pretty good so far. If it doesn't
> work, I would have to try to do something similar with
> java.io.DataOutputStream, which seems like a lot of work--basically
> implementing something similar to EXI.
>
> Any thoughts on going in this direction (EXI)? Can you think of any
> alternatives (where the recipient is Java, but not running UIMA)?
>

You might be interested to know that this is basically the same approach
used by UIMA's Vinci library. It implements its own binary XML format, and
plugs into the XmiCasSerializer by providing a ContentHandler. And we have
used it in the past to connect to services that did not run the UIMA
framework.

Not that I'm suggesting you use Vinci instead. If Vinci can do CAS
serialization I don't see why EXI couldn't, and it makes a lot of sense to
use the standard.

-Adam
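
P.S. To make the "provide a ContentHandler" idea concrete, here is a rough sketch of a SAX handler that writes a toy binary encoding with java.io.DataOutputStream. This is not Vinci's format and not EXI; the class name BinarySaxSketch, the record tags START/TEXT/END, and the byte layout are all made up for illustration. It uses only the JDK, and it is driven by a plain SAX parser here; in the UIMA case I would expect the same handler to be passed to serializer.serialize(cas, handler) instead, though I have not tried that with this code.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical toy encoder: one tagged record per SAX event.
final class BinarySaxSketch {
    // Made-up record tags, not part of any standard.
    static final int START = 1, TEXT = 2, END = 3;

    // DefaultHandler implements org.xml.sax.ContentHandler.
    static final class BinaryHandler extends DefaultHandler {
        private final DataOutputStream out;

        BinaryHandler(DataOutputStream out) {
            this.out = out;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            try {
                out.writeByte(START);
                out.writeUTF(qName);              // note: writeUTF caps at 65535 encoded bytes
                out.writeShort(atts.getLength()); // attribute count
                for (int i = 0; i < atts.getLength(); i++) {
                    out.writeUTF(atts.getQName(i));
                    out.writeUTF(atts.getValue(i));
                }
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            try {
                out.writeByte(TEXT);
                out.writeUTF(new String(ch, start, length));
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            try {
                out.writeByte(END); // no name needed; elements nest properly
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }
    }

    // Parses an XML string and returns the toy binary encoding.
    static byte[] encode(String xml) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new BinaryHandler(out));
        reader.parse(new InputSource(new StringReader(xml)));
        out.flush();
        return bytes.toByteArray();
    }
}
```

A real format would also need a string table for repeated names, namespace handling, and a matching reader on the receiving side, which is roughly the work EXI already does for you.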
