I'm having a problem with CAS serializeWithCompression. I am processing
a few million text documents on an IBM POWER8 with 16 physical SMT-8 CPUs, 200 GB
RAM, Ubuntu 14.04.3 LTS, and IBM Java 1.8.
I run 55 UIMA pipelines concurrently. I'm using UIMA 2.6.0.
I use serializeWithCompression to save the final state of the processing of
each document to a file for later processing.
However, the size of the serialized CAS just keeps growing. The in-memory size of
the CAS is stable, but the serialized CASes just keep getting bigger. I even
went as far as creating a new CAS for each document instead of reusing one via
cas.reset(). I have also tried writing the serialized CAS to a byte array output
stream first and then to a file, but it is serializeWithCompression that causes
the size problem, not writing the file.
Here's what the code looks like. Flushing or not flushing does not make a
difference. Closing or not closing the file output stream does not make a
difference (other than leaking memory). I've also tried doing
serializeWithCompression with type filtering. I wanted to try using a Marker,
but cannot see how to do that (my best guess is sketched after the code below).
The problem exists regardless of doing 1 or 55 pipelines concurrently.
File fout = new File(documentPath);
FileOutputStream fos = new FileOutputStream(fout);
org.apache.uima.cas.impl.Serialization.serializeWithCompression(cas, fos);
fos.flush();
fos.close();
logger.info("serializedCas size " + cas.size() + " ToFile " + documentPath);
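And here is my guess at the Marker variant, assuming cas.createMarker() and the
three-argument serializeWithCompression overload (delta serialization of feature
structures added after the mark) are the right pieces; I haven't verified this:

org.apache.uima.cas.Marker marker = cas.createMarker();  // mark before new FSs are added
// ... run the rest of the pipeline on this CAS ...
FileOutputStream fos = new FileOutputStream(documentPath);
org.apache.uima.cas.impl.Serialization.serializeWithCompression(cas, fos, marker);
fos.close();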
Suggestions will be appreciated.
Thanks / Dan