Thanks, Massimo. Great idea indeed. However, in my current scenario I cannot use a database, so for now I will stick with the idea of storing them in zipped format :)
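
A minimal sketch of the zipped approach, assuming each JCas has already been serialized to an XMI string (for example with the snippet quoted further down in this thread). Everything here is plain java.util.zip. The class name ZippedCasStore, the methods writeAll and readAll, and the entry names are hypothetical and only for illustration; none of this is UIMA API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper (not part of UIMA): packs many XMI strings into one zip file.
class ZippedCasStore {

    // Writes each (entry name -> XMI string) pair as one entry of a single archive.
    static void writeAll(Path zipFile, Map<String, String> xmiByName) throws IOException {
        try (OutputStream fileOut = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(fileOut, StandardCharsets.UTF_8)) {
            for (Map.Entry<String, String> e : xmiByName.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
    }

    // Reads every entry back, in archive order, decoding the bytes as UTF-8 text.
    static Map<String, String> readAll(Path zipFile) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (InputStream fileIn = Files.newInputStream(zipFile);
             ZipInputStream zip = new ZipInputStream(fileIn, StandardCharsets.UTF_8)) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    bytes.write(buffer, 0, n);
                }
                result.put(entry.getName(),
                        new String(bytes.toByteArray(), StandardCharsets.UTF_8));
            }
        }
        return result;
    }
}
```

In a real run the map would probably be replaced by a loop that serializes one page at a time and writes it straight into the open ZipOutputStream, so that a million XMI strings never have to sit in memory together.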

*Samudra Banerjee*
First Year Graduate Student
Department of Computer Science
State University of New York
Stony Brook, NY 11790
631-496-6939

On 2/2/2014 1:48 AM, Massimo Nicosia wrote:
An alternative could be serializing the XML content into a String and
saving it in a database or a fast key-value store.

I have used code like this:

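    import java.io.ByteArrayOutputStream;
    import org.apache.uima.cas.impl.XmiCasSerializer;
    import org.apache.uima.util.XMLSerializer;

    // "cas" is the JCas to serialize; the XMI output is collected in memory.
    ByteArrayOutputStream out = new ByteArrayOutputStream(1024);
    XmiCasSerializer ser = new XmiCasSerializer(cas.getTypeSystem());
    ser.serialize(cas.getCas(), (new XMLSerializer(out, false)).getContentHandler());
    out.close();
    // Decode explicitly rather than with the platform default charset
    // (assumes the serializer writes UTF-8).
    String xmlContent = out.toString("UTF-8");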

Best,
Massimo



On Sun, Feb 2, 2014 at 4:22 AM, Samudra Banerjee <[email protected]> wrote:

Hi Experts,

I have a scenario where processing a Wikipedia XML dump generates a huge
number of JCas objects (~1 million), one per page. I want to serialize
these JCas objects for later use, but generating 1 million different files
will take a toll on the system. So I was wondering if there was a way to
serialize multiple JCas objects to a single file for later retrieval. Any
idea if this can be achieved?

Thanks and Regards,
Samudra


