Thanks, Massimo. Great idea indeed. However, in my current scenario I
cannot use a database, so for now I think I should stick with the idea
of storing them in zipped format :)
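
For the record, here is a rough sketch of what I have in mind. It assumes each JCas has already been turned into an XMI string (for example with the serializer snippet quoted below) and simply packs those strings as entries of one zip file, so there is a single file on disk instead of a million. The class name CasZipArchive, its two methods, and the entry names are hypothetical helpers made up for illustration; they are not part of UIMA.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper (not a UIMA class): stores many XMI strings in one zip file.
class CasZipArchive {

    // Writes each (entry name -> XMI text) pair as one compressed zip entry.
    static void writeAll(File zipFile, Map<String, String> xmiByName) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream(zipFile))) {
            for (Map.Entry<String, String> doc : xmiByName.entrySet()) {
                zip.putNextEntry(new ZipEntry(doc.getKey()));
                zip.write(doc.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
    }

    // Reads every entry back, in archive order, decoding the bytes as UTF-8 text.
    static Map<String, String> readAll(File zipFile) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new FileInputStream(zipFile))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = zip.read(buffer)) != -1) {
                    bytes.write(buffer, 0, n);
                }
                result.put(entry.getName(),
                        new String(bytes.toByteArray(), StandardCharsets.UTF_8));
            }
        }
        return result;
    }
}
```

In practice I would probably write the entries one at a time while the pages are being processed, rather than collecting a million strings in a map first; the map just keeps the sketch short.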
*Samudra Banerjee*
First Year Graduate Student
Department of Computer Science
State University of New York
Stony Brook, NY 11790
631-496-6939
On 2/2/2014 1:48 AM, Massimo Nicosia wrote:
An alternative could be serializing the XML content into a String and
saving it in a database or a fast key-value store.
I have used code like this:
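// Serialize the CAS to XMI in memory (cas is a JCas here).
ByteArrayOutputStream out = new ByteArrayOutputStream(1024);
XmiCasSerializer ser = new XmiCasSerializer(cas.getTypeSystem());
ser.serialize(cas.getCas(),
    new XMLSerializer(out, false).getContentHandler());
out.close();
// Decode as UTF-8 explicitly instead of relying on the platform default charset.
String xmlContent = out.toString("UTF-8");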
Best,
Massimo
On Sun, Feb 2, 2014 at 4:22 AM, Samudra Banerjee <[email protected]> wrote:
Hi Experts,
I have a scenario where processing a Wikipedia XML dump generates a huge
number of JCas objects (~1 million), one per page. I want to serialize
these JCas objects for later use, but generating 1 million different files
will take a toll on the system. So I was wondering if there was a way to
serialize multiple JCas objects to a single file for later retrieval. Any
idea if this can be achieved?
Thanks and Regards,
Samudra