Hi Kevin

I think we have debated this before here on the Camel user forum so
you may be able to dig out some of those topics.

The JDK offers a java.util.Scanner which allows you to split a stream
on-the-fly. Camel leverages this scanner under the covers as well.
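
To make the Scanner idea concrete, here is a minimal plain-JDK sketch
of splitting a stream record by record. The class name
ScannerSplitSketch and the method splitByLine are made up for this
illustration; this is not Camel's own code, only the same JDK technique.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;
import java.util.function.Consumer;

// Hypothetical helper class, for illustration only.
public class ScannerSplitSketch {

    // Pulls one record at a time off the stream and hands it to the handler.
    // Only the current record is held as a String, not the whole input.
    public static int splitByLine(InputStream in, Consumer<String> handler) {
        int count = 0;
        try (Scanner scanner = new Scanner(in, "UTF-8")) {
            scanner.useDelimiter("\n"); // record delimiter
            while (scanner.hasNext()) {
                handler.accept(scanner.next());
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Small in-memory stand-in for the big file; a FileInputStream
        // would be used the same way.
        InputStream in = new ByteArrayInputStream(
                "rec1\nrec2\nrec3\n".getBytes(StandardCharsets.UTF_8));
        int n = splitByLine(in, record -> System.out.println("record: " + record));
        System.out.println(n + " records");
    }
}
```

Each record is handled as soon as it is read, so memory use depends on
the size of a single record rather than the size of the file.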

For example, suppose you want to split a 700 MB file per line. Then
you can use the Camel splitter and have it tokenize on \n, which
should leverage that Scanner under the covers. You can also enable the
streaming mode of the Splitter, which should prevent reading the whole
700 MB into memory.

So enabling streaming and having the big message split by the Scanner
should allow you to do this with low memory usage.


It's the createIterator method on ObjectHelper that the Camel splitter
will use if you use body().tokenize("\n") as the split expression.
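
A route using that split expression with streaming enabled could look
roughly like the sketch below. It needs camel-core on the classpath.
The class name and both endpoint URIs (file:data/inbox and
seda:records) are placeholders I made up for the example, so adjust
them to your setup.

```java
import org.apache.camel.builder.RouteBuilder;

// Hypothetical route class; the endpoint URIs are assumptions.
public class BigFileSplitRoute extends RouteBuilder {

    @Override
    public void configure() {
        from("file:data/inbox")               // assumed input directory
            .split(body().tokenize("\n"))     // one message per line
            .streaming()                      // split on the fly, not all up front
            .to("seda:records");              // assumed target for each record
    }
}
```

Each line then arrives at the target endpoint as its own message, so
you process a single record at a time rather than a list of them.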



On Mon, Jan 25, 2010 at 11:23 AM, Kevin Jackson <[email protected]> wrote:
> Hi,
>
> I have a problem which I assume is a relatively normal use case.  I
> have to process 700+Mb input text file (not xml) and I want to
> generate events/messages from this file based on splitting the file
> into individual records.
>
> Having done some digging through the archives, it seems that there are
> a couple of solutions:
> # use the claim-check EIP
> We do not want to use a database at this point in our processing and
> using a file datastore we would run into 'too many files in one
> directory' and if I start implementing a partition scheme, the code to
> handle the splitting of the origin data becomes much more complex than
> the data processing - this seems to be a hack upon hack approach and
> one I wish to avoid.
>
> # some kind of custom scanner
> This post[1] seems to imply that it's possible to implement some kind
> of custom splitting strategy based on a record delimiter - this is
> what I would prefer to do.  Is there any further documentation on this
> aspect of camel apart from the 'splitter' page[2] which seems to
> assume processing the 'splitted' message in one pass where as I need
> to generate not a List<MyMessage> but simply MyMessage
>
> Having just looked at the src, the SplitterPojoTest is indeed
> processing the entire message in one pass
>
> Thanks,
> Kev
>
>
> [1] http://osdir.com/ml/users-camel-apache/2009-10/msg00289.html
> [2] http://camel.apache.org/splitter.html
>



-- 
Claus Ibsen
Apache Camel Committer

Author of Camel in Action: http://www.manning.com/ibsen/
Open Source Integration: http://fusesource.com
Blog: http://davsclaus.blogspot.com/
Twitter: http://twitter.com/davsclaus
