Hi

The thing about the aggregator is that it holds the aggregated data in
memory, so you need enough heap to store it. For very large data you
could use a persistent repository, or use the Claim Check EIP pattern
and only keep "tokens" that reference the actual data, which is stored
offline (eg not in memory).
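To picture the claim check idea, here is a plain-Java sketch (a hypothetical ClaimCheckSketch helper backed by temp files, just to illustrate the pattern; not a Camel API):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Sketch of the Claim Check EIP idea: persist the large payload offline
// and carry only a small token through the route.
public class ClaimCheckSketch {
    private final Path storeDir;

    public ClaimCheckSketch(Path storeDir) {
        this.storeDir = storeDir;
    }

    // Check in: store the payload on disk, return a token for the message.
    public String checkIn(byte[] payload) throws Exception {
        String token = UUID.randomUUID().toString();
        Files.write(storeDir.resolve(token), payload);
        return token;
    }

    // Check out: redeem the token for the original payload.
    public byte[] checkOut(String token) throws Exception {
        return Files.readAllBytes(storeDir.resolve(token));
    }
}
```

The message flowing through the route then only holds the token, so the aggregator's memory footprint stays tiny regardless of payload size.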

That said, yesterday I worked on adding a group option to the splitter
+ tokenizer, so you can instruct it to split the data into groups of N
tokens (eg lines, XML pairs, etc.):
https://issues.apache.org/jira/browse/CAMEL-5236
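To illustrate what the group option does conceptually (a plain-Java sketch of the idea, not the actual Camel implementation), grouping by lines:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a body by newline tokens, then emit groups of N tokens
// joined back together, so the splitter produces fewer, larger chunks.
public class GroupTokens {
    public static List<String> groupByLines(String body, int group) {
        String[] lines = body.split("\n");
        List<String> out = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        int count = 0;
        for (String line : lines) {
            if (count > 0) {
                sb.append("\n");
            }
            sb.append(line);
            if (++count == group) {
                out.add(sb.toString());
                sb.setLength(0);
                count = 0;
            }
        }
        if (count > 0) {
            out.add(sb.toString()); // leftover partial group
        }
        return out;
    }
}
```

So a 5-line body grouped by 2 yields three chunks: two of 2 lines and a final one of 1 line.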

I got the regular tokenizer implemented, but haven't gotten around to
tokenizeXML yet. I also want to add an option so you can tell it to
include the parent tag as well.

eg
<orders>
  <order>
  ...
  </order>
  ...
  a zillion of <order>
</orders>

And then you can group that by 1000, so you get 1000 <order> elements in an <orders> tag.
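The parent-tag wrapping could look roughly like this (a plain-Java, regex-based sketch for illustration only; not the real tokenizer implementation, and a naive regex like this assumes no nested or attributed tags):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: collect <child>...</child> elements, then re-wrap groups of N
// of them in the <parent> tag so each chunk is a well-formed document.
public class GroupXml {
    public static List<String> group(String xml, String child, String parent, int n) {
        Matcher m = Pattern
                .compile("<" + child + ">.*?</" + child + ">", Pattern.DOTALL)
                .matcher(xml);
        List<String> children = new ArrayList<>();
        while (m.find()) {
            children.add(m.group());
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < children.size(); i += n) {
            StringBuilder sb = new StringBuilder("<" + parent + ">");
            for (String c : children.subList(i, Math.min(i + n, children.size()))) {
                sb.append(c);
            }
            sb.append("</" + parent + ">");
            out.add(sb.toString());
        }
        return out;
    }
}
```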

On Fri, Apr 27, 2012 at 5:25 PM, ebinsingh
<ebenezer.si...@verizonwireless.com> wrote:
> Hi All,
>
> I am trying to aggregate large number of xml files into files of 50000
> records.
> I am getting java.lang.OutOfMemoryError - Java heap space error.
>
> I am trying to see if there are any leaks but to my eyes i do not see any.
>
> Appreciate your thoughts on this.
>
> Aggregation logic:
>
> public class GlobalAggrStratergy implements AggregationStrategy {
>     private static Logger log = Logger.getLogger(GlobalAggrStratergy.class);
>     int counter = 0;
>
>     @Override
>     public Exchange aggregate(Exchange exchange1, Exchange exchange2) {
>         try {
>             StringBuilder builder;
>             if (exchange1 == null || null == exchange1.getIn().getBody()) {
>                 builder = new StringBuilder();
>                 exchange1 = new DefaultExchange(new DefaultCamelContext());
>                 exchange1.getIn().setBody(builder);
>             }
>             builder = exchange1.getIn().getBody(StringBuilder.class);
>             builder.append(exchange2.getIn().getBody() + "\n");
>             exchange1.getIn().setBody(builder);
>             exchange1.getIn().setHeader(Exchange.FILE_NAME_ONLY,
>                     exchange2.getProperty(Exchange.FILE_NAME_ONLY));
>             counter++;
>         } catch (Exception ex) {
>             log.error("Error aggregating", ex);
>         }
>         exchange1.setProperty(Exchange.BATCH_SIZE, counter);
>         if (counter >= 50000)
>             counter = 0;
>         return exchange1;
>     }
> }
>
>
> Route configuration:
>
>     public void configure() throws Exception {
>         from("direct:producerQueue").log("File name: ${in.header.fileName}")
>             .setProperty(Exchange.FILE_NAME_ONLY, simple("${file:onlyname.noext}"))
>             .split().tokenizeXML("IPDR").streaming()
>             .aggregate(header("messageId"), new GlobalAggrStratergy())
>                 .completionSize(50000).completionTimeout(20000)
>             .process(new IPDRHeaderFooterProcessor())
>             .to(IPDRUtil.getInstance().getProperty("IPDROutputDir"));
>     }
>
> Thanks & regards,
> Ebe
>
> --
> View this message in context: 
> http://camel.465427.n5.nabble.com/Java-heap-space-issue-with-Aggregation-tp5670608p5670608.html
> Sent from the Camel - Users mailing list archive at Nabble.com.



-- 
Claus Ibsen
-----------------
CamelOne 2012 Conference, May 15-16, 2012: http://camelone.com
FuseSource
Email: cib...@fusesource.com
Web: http://fusesource.com
Twitter: davsclaus, fusenews
Blog: http://davsclaus.blogspot.com/
Author of Camel in Action: http://www.manning.com/ibsen/
