Hi

Whatever you do, the aggregator holds the aggregated data in memory, so you need enough heap to store it. For very large data you could use a persistent repository together with the Claim Check EIP pattern, and only store "tokens" that point to the actual data, which is stored offline (i.e. not in memory).
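To make the claim check idea concrete, here is a minimal plain-Java sketch. It is not a Camel API: `ClaimCheckStore`, `checkIn` and `claim` are hypothetical names chosen for illustration, and the choice of one file per payload is an assumption. The point is only that the aggregator would keep the short token in memory while the payload itself sits on disk.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical helper (not part of Camel): payloads live on disk, callers keep small tokens. */
final class ClaimCheckStore {
    private final Path dir;

    ClaimCheckStore(Path dir) throws IOException {
        // Create the storage directory if it does not exist yet.
        this.dir = Files.createDirectories(dir);
    }

    /** Writes the payload to its own file and returns a token that identifies it. */
    String checkIn(String payload) throws IOException {
        String token = UUID.randomUUID().toString();
        Files.write(dir.resolve(token), payload.getBytes(StandardCharsets.UTF_8));
        return token;
    }

    /** Reads the payload for a token back from disk and deletes the stored file. */
    String claim(String token) throws IOException {
        Path file = dir.resolve(token);
        String payload = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        Files.delete(file);
        return payload;
    }
}
```

With something like this, an aggregation strategy could append tokens instead of message bodies, and a later step could call `claim` for each token while writing the output file, so only one payload needs to be in memory at a time.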
That said, yesterday I worked on adding a group option to the splitter + tokenizer, so you can instruct it to split the data into groups of N tokens (e.g. lines, XML pairs, etc.).
https://issues.apache.org/jira/browse/CAMEL-5236

I have the regular tokenizer case implemented, but haven't gotten around to tokenizeXML yet. I also want to add an option so you can tell it to include the parent tag as well, e.g.

  <orders>
    <order> ... </order>
    ... a zillion of <order>
  </orders>

Then you can group that by 1000, so you get 1000 <order> elements in an <orders> tag.

On Fri, Apr 27, 2012 at 5:25 PM, ebinsingh <ebenezer.si...@verizonwireless.com> wrote:
> Hi All,
>
> I am trying to aggregate a large number of xml files into files of 50000
> records.
> I am getting java.lang.OutOfMemoryError - Java heap space error.
>
> I am trying to see if there are any leaks but to my eyes I do not see any.
>
> Appreciate your thoughts on this.
>
> Aggregation logic:
>
> public class GlobalAggrStratergy implements AggregationStrategy {
>     private static Logger log = Logger.getLogger(GlobalAggrStratergy.class);
>     int counter = 0;
>
>     @Override
>     public Exchange aggregate(Exchange exchange1, Exchange exchange2) {
>         try {
>             StringBuilder builder;
>             if (exchange1 == null || null == exchange1.getIn().getBody()) {
>                 builder = new StringBuilder();
>                 exchange1 = new DefaultExchange(new DefaultCamelContext());
>                 exchange1.getIn().setBody(builder);
>             }
>             builder = exchange1.getIn().getBody(StringBuilder.class);
>             builder.append(exchange2.getIn().getBody()+"\n");
>             exchange1.getIn().setBody(builder);
>             exchange1.getIn().setHeader(Exchange.FILE_NAME_ONLY,
>                 exchange2.getProperty(Exchange.FILE_NAME_ONLY));
>             counter++;
>         } catch (Exception ex) {
>             log.error("Error aggregating", ex);
>         }
>         exchange1.setProperty(Exchange.BATCH_SIZE, counter);
>         if (counter >= 50000)
>             counter = 0;
>         return exchange1;
>     }
>
>
> Route configuration:
>
> public void configure() throws Exception
> {
>     from("direct:producerQueue").log("File name: ${in.header.fileName}")
>         .setProperty(Exchange.FILE_NAME_ONLY, simple("${file:onlyname.noext}"))
>         .split().tokenizeXML("IPDR").streaming()
>         .aggregate(header("messageId"), new GlobalAggrStratergy()).completionSize(50000).completionTimeout(20000)
>         .process(new IPDRHeaderFooterProcessor())
>         .to(IPDRUtil.getInstance().getProperty("IPDROutputDir"));
> }
>
> Thanks & regards,
> Ebe
>
> --
> View this message in context:
> http://camel.465427.n5.nabble.com/Java-heap-space-issue-with-Aggregation-tp5670608p5670608.html
> Sent from the Camel - Users mailing list archive at Nabble.com.

--
Claus Ibsen
-----------------
CamelOne 2012 Conference, May 15-16, 2012: http://camelone.com
FuseSource
Email: cib...@fusesource.com
Web: http://fusesource.com
Twitter: davsclaus, fusenews
Blog: http://davsclaus.blogspot.com/
Author of Camel in Action: http://www.manning.com/ibsen/
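
For readers who want to see what "split the data into groups of N tokens" amounts to, the following is a rough plain-Java sketch of the idea for line tokens. It is not the CAMEL-5236 implementation and uses no Camel classes; `GroupingSplitter` and `splitInGroups` are hypothetical names. It reads the input as a stream and hands each full group to a callback, so at most one group is held in memory at a time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical illustration (not Camel code): stream lines and emit them in groups of N. */
final class GroupingSplitter {

    /** Reads lines one at a time and passes them to the consumer in groups of at most groupSize. */
    static void splitInGroups(Reader source, int groupSize, Consumer<List<String>> consumer)
            throws IOException {
        if (groupSize < 1) {
            throw new IllegalArgumentException("groupSize must be >= 1");
        }
        BufferedReader reader = new BufferedReader(source);
        List<String> group = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            group.add(line);
            if (group.size() == groupSize) {
                // Hand over the full group and start a fresh list for the next one.
                consumer.accept(group);
                group = new ArrayList<>();
            }
        }
        // Emit the final, possibly smaller, group.
        if (!group.isEmpty()) {
            consumer.accept(group);
        }
    }
}
```

Calling `splitInGroups(reader, 1000, batch -> ...)` would invoke the callback once per 1000 lines, plus once more for any remainder. Wrapping each group in a parent tag such as <orders> would be an additional step inside the callback.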