Don't you have a Hadoop cluster you can use? Hadoop would handle the
file splitting for you, and if your UIMA analysis is well-behaved, you
can deploy it as an M/R job, one record at a time.
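If a cluster isn't available, here is a minimal sketch of the same record-at-a-time idea. It assumes the log is newline-delimited (one record per line). The file is streamed, and whole lines are grouped into chunks of bounded size, so each chunk could become its own UIMA document instead of one 9 GB CAS. This is roughly what Hadoop's line-oriented input handling does for you. The function name `iter_record_chunks` and the `max_chars` limit are hypothetical and are not part of UIMA or Hadoop.

```python
import io
from typing import Iterator, List, TextIO


def iter_record_chunks(stream: TextIO, max_chars: int = 1_000_000) -> Iterator[str]:
    """Hypothetical helper: group whole lines (log records) into chunks.

    Each yielded chunk holds at most max_chars characters, unless a single
    line is longer than that, in which case that line is yielded alone.
    Lines are never split, so no record straddles two chunks.
    """
    buf: List[str] = []
    size = 0
    for line in stream:  # iterates lazily, so the whole file is never in memory
        if buf and size + len(line) > max_chars:
            yield "".join(buf)
            buf, size = [], 0
        buf.append(line)
        size += len(line)
    if buf:
        yield "".join(buf)


if __name__ == "__main__":
    # Stand-in for open("huge.log", encoding="utf-8"); the path is hypothetical.
    log = io.StringIO("a\nbb\nccc\n")
    for chunk in iter_record_chunks(log, max_chars=5):
        print(repr(chunk))
```

In a real pipeline, each chunk would be handed to a fresh CAS (for example, from a collection reader). That wiring is not shown here and depends on your setup.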
--Thilo
On 10/18/2013 12:25 PM, [email protected] wrote:
Hi Jens,
It's a log file.
Cheers,
Armin
-----Original Message-----
From: Jens Grivolla [mailto:[email protected]]
Sent: Friday, October 18, 2013 11:05
To: [email protected]
Subject: Re: Working with very large text documents
On 10/18/2013 10:06 AM, Armin Wegner wrote:
What are you doing with very large text documents in a UIMA pipeline, for
example, documents 9 GB in size?
Just out of curiosity, how can you possibly have 9 GB of text that represents one
document? From a quick look at Project Gutenberg, a full book with HTML markup
seems to be about 500 kB to 1 MB, so 9 GB is roughly 9,000 to 18,000 books:
a complete public library.
Bye,
Jens