Hi,

> from("file:../../../data/
> clients?noop=true").convertBodyTo(Client.class).to("jpa:au.com.interlated.server.domain.Client");
>
> I uploaded 300k records - 90mb in total. The java image said it had 2.7Gb of
> RAM allocated before it bombed out due to heap space.

I had a similar problem; I resolved it with a streaming split, along
these lines in the XML DSL:

<route>
  <from uri="file://"/>
  <split streaming="true">
    <tokenize token="\n"/>  <!-- or whatever string delimits one record -->
    <to uri="..."/>
  </split>
</route>
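
In the Java DSL that your original route uses, the same thing would look
roughly like this (the tokenizer on '\n' is my assumption about how your
records are delimited, and the class name is just for illustration):

import org.apache.camel.builder.RouteBuilder;

import au.com.interlated.server.domain.Client;

public class ClientImportRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("file:../../../data/clients?noop=true")
            // stream the file and split it on newlines so that only
            // one record is held in memory at a time
            .split(body().tokenize("\n")).streaming()
                // assumes your existing String -> Client type
                // converter can handle a single line
                .convertBodyTo(Client.class)
                .to("jpa:au.com.interlated.server.domain.Client")
            .end();
    }
}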

Essentially, without the split + streaming, the entire file is retained
in memory, and any objects created from that file hold a strong
reference to the input data, so they cannot be gc'd.  By streaming the
file (I am guessing each line of input represents a 'record'), you only
keep one 'record' in RAM at a time; once the object has been created
and persisted, it becomes eligible for gc because the underlying byte
stream it was attached to no longer has any references to it.

Another advantage of this approach is that downstream processing can
take place before the entire file has been read.  This is how SAX
works (as opposed to DOM, which must read the entire file before being
useful).

The caveat is that each record in the input must be self-contained, so
that it makes sense to split the file on a record boundary.  If the
file has a header section containing lookup data for the records,
followed by all the records, then a simple split won't work: you would
have to pre-process the file to extract the lookup data, then process
it again (splitting and streaming) while ignoring the header records.
(You can tell this is fresh in my mind, as I have had exactly the same
problem.)
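
Assuming the header lines can be recognised by some prefix (I have used
"HDR" purely as an example) and that the lookup data gets stashed in a
bean the second pass can read from ("lookupCache" below is
hypothetical), the two passes might look roughly like this:

import org.apache.camel.builder.PredicateBuilder;
import org.apache.camel.builder.RouteBuilder;

import au.com.interlated.server.domain.Client;

public class TwoPassImportRoute extends RouteBuilder {
    @Override
    public void configure() {
        // pass 1: stream the file, keep only the header lines and
        // hand them to a bean that caches the lookup data
        from("file:../../../data/clients?noop=true")
            .split(body().tokenize("\n")).streaming()
                .filter(simple("${body} regex '^HDR.*'"))
                    .to("bean:lookupCache?method=add")
                .end()
            .end();

        // pass 2: stream the same file again, skip the header lines
        // and persist one record at a time (you would still need to
        // make sure this route only runs once the lookup data is in
        // place)
        from("file:../../../data/clients?noop=true")
            .split(body().tokenize("\n")).streaming()
                .filter(PredicateBuilder.not(simple("${body} regex '^HDR.*'")))
                    .convertBodyTo(Client.class)
                    .to("jpa:au.com.interlated.server.domain.Client")
                .end()
            .end();
    }
}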

Thanks,
Kev
