Scanning/Splitting Large input files

Kevin Jackson Mon, 25 Jan 2010 02:23:43 -0800

Hi,

I have a problem that I assume is a fairly common use case. I
have to process a 700+ MB input text file (not XML), and I want to
generate events/messages from this file by splitting it into
individual records.


Having done some digging through the archives, it seems that there are
a couple of solutions:
# use the claim-check EIP
We do not want to use a database at this point in our processing.
With a file datastore we would run into 'too many files in one
directory', and if I start implementing a partitioning scheme, the code
that handles splitting the original data becomes much more complex than
the data processing itself. This seems like a hack-upon-hack approach,
and one I wish to avoid.
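For illustration only, a partitioning scheme of the kind mentioned above might hash each record's id into two levels of subdirectories so that no single directory grows without bound. The class name, the method name and the idea of a per-record id are all hypothetical; this is a sketch of the extra bookkeeping involved, not a recommended design.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PartitionedClaimCheck {

    // Hypothetical helper: maps a record id to root/xx/yy/<id>, where xx and yy
    // are the first four hex digits of the id's hash code. Each of the two
    // directory levels therefore has at most 256 entries (00..ff).
    public static Path pathFor(Path root, String recordId) {
        // %08x always yields 8 hex digits; negative hash codes are printed
        // as their unsigned two's-complement value.
        String hex = String.format("%08x", recordId.hashCode());
        return root.resolve(hex.substring(0, 2))
                   .resolve(hex.substring(2, 4))
                   .resolve(recordId);
    }

    public static void main(String[] args) {
        System.out.println(pathFor(Paths.get("/data/claims"), "record-42"));
    }
}
```

Every consumer then has to recompute the same path to retrieve the claimed record, which is part of the complexity I would rather avoid.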

# some kind of custom scanner
This post[1] seems to imply that it's possible to implement a
custom splitting strategy based on a record delimiter, which is
what I would prefer to do. Is there any further documentation on this
aspect of Camel apart from the 'splitter' page[2]? That page seems to
assume processing the split message in one pass, whereas I need
to generate not a List<MyMessage> but individual MyMessage instances.

Having just looked at the source, I can see that SplitterPojoTest does
indeed process the entire message in one pass.
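The delimiter-based approach asked about above can be sketched in plain Java, assuming records are separated by a delimiter that a regular expression can describe. The sketch wraps java.util.Scanner in an Iterator so that records are read lazily, one at a time, instead of being collected into a List<MyMessage> up front. MyMessage here is a hypothetical stand-in class, and RecordSplitter/split are names made up for this example; none of it is Camel API.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Scanner;

public class RecordSplitter {

    // Hypothetical stand-in for the MyMessage type mentioned in the post.
    public static final class MyMessage {
        public final String body;
        public MyMessage(String body) { this.body = body; }
        @Override public String toString() { return "MyMessage[" + body + "]"; }
    }

    // Returns a lazy iterator: each next() reads only up to the following
    // delimiter, so the whole input is never held in memory as a List.
    public static Iterator<MyMessage> split(Reader input, String delimiterRegex) {
        Scanner scanner = new Scanner(input).useDelimiter(delimiterRegex);
        return new Iterator<MyMessage>() {
            private boolean done = false;

            @Override public boolean hasNext() {
                if (done) return false;
                if (scanner.hasNext()) return true;
                scanner.close();  // release the underlying reader at end of input
                done = true;
                return false;
            }

            @Override public MyMessage next() {
                if (!hasNext()) throw new NoSuchElementException();
                return new MyMessage(scanner.next());
            }
        };
    }

    public static void main(String[] args) {
        Iterator<MyMessage> it = split(new StringReader("rec1;rec2;rec3"), ";");
        while (it.hasNext()) {
            System.out.println(it.next());  // one message per record
        }
    }
}
```

If the Camel version in use lets the splitter consume an Iterator returned by an expression or bean, with streaming enabled, something like this could produce one exchange per record. That wiring is an assumption I have not checked against the splitter documentation.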

Thanks,
Kev


[1] http://osdir.com/ml/users-camel-apache/2009-10/msg00289.html
[2] http://camel.apache.org/splitter.html
