Re: Large file processing with Apache Camel

2013-02-28 Thread Christian Müller
You have to make sure your processor is thread-safe, because it is shared between all threads executed in parallel. Best, Christian On Thu, Feb 28, 2013 at 1:03 PM, cristisor wrote: > After digging more into my problem I found that the slow db access was the > main issue, maybe you heard before of …
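
A minimal illustration of Christian's point, assuming a Java DSL processor whose single instance is shared by a parallel splitter; the class, field, and header names are invented for the sketch:

```java
import java.util.concurrent.atomic.AtomicLong;

import org.apache.camel.Exchange;
import org.apache.camel.Processor;

// Sketch of a processor that stays correct when one instance is shared by
// many splitter threads: no unsynchronized mutable fields, work is done on
// local variables and the exchange only.
public class LineCountingProcessor implements Processor {

    // a plain "private long processed" mutated in process() would race
    // between threads; an AtomicLong (or no shared state at all) is safe
    private final AtomicLong processed = new AtomicLong();

    @Override
    public void process(Exchange exchange) throws Exception {
        String line = exchange.getIn().getBody(String.class);
        exchange.getIn().setHeader("lineLength", line == null ? 0 : line.length());
        processed.incrementAndGet();
    }
}
```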

Re: Large file processing with Apache Camel

2013-02-28 Thread Claus Ibsen
On Thu, Feb 28, 2013 at 1:03 PM, cristisor wrote: > After digging more into my problem I found that the slow db access was the > main issue, maybe you heard before of setting > sendStringParametersAsUnicode=false in the jdbc driver to dramatically > increase the performance. > > Since the last time …

Re: Large file processing with Apache Camel

2013-02-28 Thread cristisor
After digging more into my problem I found that the slow db access was the main issue; maybe you heard before of setting sendStringParametersAsUnicode=false in the jdbc driver to dramatically increase the performance. Since the last time I posted here I learned a lot about Apache Camel and I implemented …
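
For readers wondering where that flag lives: sendStringParametersAsUnicode is a property of SQL Server JDBC drivers, set on the connection URL or datasource. The thread does not show the actual driver, pool, or connection details, so everything below (the commons-dbcp pool, host, database name, credentials) is an assumption:

```java
import org.apache.commons.dbcp.BasicDataSource;

// Illustrative datasource only; driver class, URL, and credentials are
// placeholders. With the flag left at its default of true, string parameters
// are sent as Unicode and can defeat indexes on non-Unicode (VARCHAR)
// columns, which is why turning it off can speed up lookups dramatically.
public class ImportDataSource {

    public static BasicDataSource create() {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
        ds.setUrl("jdbc:sqlserver://dbhost:1433;databaseName=importdb;"
                + "sendStringParametersAsUnicode=false");
        ds.setUsername("import_user");
        ds.setPassword("secret");
        return ds;
    }
}
```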

Re: Large file processing with Apache Camel

2013-02-22 Thread Hadrian Zbarcea
Cristisor, Take a look at the demo I mentioned. Run it (as easy as `mvn install`), look at the logs to see the thread allocation. That should answer your #1. For #2, best is to try both and measure. For #3, take a look at the recipient list pattern [1]. I hope this helps, Hadrian [1] http://…
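
The recipient list pattern Hadrian points to, as a minimal sketch: the destination endpoint(s) are computed per message, here from a header that an earlier step is assumed to have set. Endpoint and header names are made up:

```java
import org.apache.camel.builder.RouteBuilder;

public class DispatchRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("direct:dispatch")
            // the header may hold a single endpoint URI or a comma-separated list
            .recipientList(header("targetQueues"));
    }
}
```

A processor or choice() block upstream would set targetQueues to, for example, "activemq:queue:orders.import" based on the file type.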

Re: Large file processing with Apache Camel

2013-02-22 Thread cristisor
Thank you everybody for your help. After reading and trying, I found how to implement splitters and aggregators and I managed to achieve my goal. Here is the new status: 1. read 500 lines from the file and send them in an exchange to the next service unit, to a certain endpoint according to the file type …
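
A rough sketch of step 1 as described above: batch 500 lines into one exchange and send it to a queue picked by file type. The group argument to tokenize needs Camel 2.10+; on the 2.4.x release mentioned later in the thread a custom grouping expression (sketched further down) does the same job. File path, regex, and queue names are invented:

```java
import org.apache.camel.builder.RouteBuilder;

public class BatchedImportRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("file:/data/inbox?noop=true")
            .split().tokenize("\n", 500).streaming()   // one exchange per 500 lines
                .choice()
                    .when(header("CamelFileName").regex(".*\\.orders"))
                        .to("activemq:queue:orders.import")
                    .otherwise()
                        .to("activemq:queue:generic.import")
                .end()
            .end();
    }
}
```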

Re: Large file processing with Apache Camel

2013-02-22 Thread Hadrian Zbarcea
Last year at ApacheCon, I showed a demo [1] related to processing a file in parallel in multiple threads (in 'splits' - term borrowed from hdfs - of a configurable size). I used a relatively small csv file for my demo, not xml, but it works exactly the same with xml. Take a look at it, I believe …
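
In the spirit of the demo Hadrian describes (the demo itself is not reproduced here): the file is cut into splits of a configurable number of lines and each split is handed to the splitter's thread pool, so the log shows which thread picked up which split. The split size and endpoints are placeholders, and the group form of tokenize assumes Camel 2.10+:

```java
import org.apache.camel.builder.RouteBuilder;

public class ParallelSplitsRoute extends RouteBuilder {

    private static final int SPLIT_SIZE = 1000;  // configurable split size

    @Override
    public void configure() throws Exception {
        from("file:/data/inbox?noop=true")
            .split().tokenize("\n", SPLIT_SIZE).streaming().parallelProcessing()
                .log("processing a split on ${threadName}")  // thread allocation shows up here
                .to("direct:handleSplit")
            .end();

        from("direct:handleSplit")
            .to("log:split.done");  // stand-in for the real per-split work
    }
}
```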

Re: Large file processing with Apache Camel

2013-02-22 Thread Claus Ibsen
On Fri, Feb 22, 2013 at 5:35 PM, Claus Ibsen wrote: > Hi > > Have you seen the splitter with group N lines together section at > http://camel.apache.org/splitter.html > Ah yeah, you use an older Camel release. You can implement a custom expression that does what this functionality in Camel 2.10 offers …
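
A sketch of the kind of custom expression Claus means for pre-2.10 releases that lack the group option on tokenize: it evaluates to an Iterator whose next() returns N lines joined into one String, so the splitter emits one exchange per group. This is illustrative code, not the implementation that shipped in Camel 2.10:

```java
import java.io.InputStream;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Scanner;

import org.apache.camel.Exchange;
import org.apache.camel.Expression;

public class GroupLinesExpression implements Expression {

    private final int groupSize;

    public GroupLinesExpression(int groupSize) {
        this.groupSize = groupSize;
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> T evaluate(Exchange exchange, Class<T> type) {
        // read the body as a stream so the whole file never sits in memory
        InputStream body = exchange.getIn().getBody(InputStream.class);
        final Scanner scanner = new Scanner(body);

        Iterator<String> groups = new Iterator<String>() {
            public boolean hasNext() {
                return scanner.hasNextLine();
            }

            public String next() {
                if (!scanner.hasNextLine()) {
                    throw new NoSuchElementException();
                }
                StringBuilder group = new StringBuilder();
                int count = 0;
                while (scanner.hasNextLine() && count < groupSize) {
                    if (count > 0) {
                        group.append('\n');
                    }
                    group.append(scanner.nextLine());
                    count++;
                }
                return group.toString();
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }
        };

        // the splitter asks for Object, so handing back the iterator is enough
        return (T) groups;
    }
}
```

Used from a route as `.split(new GroupLinesExpression(500)).streaming().to("activemq:queue:lines")`.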

Re: Large file processing with Apache Camel

2013-02-22 Thread Claus Ibsen
Hi Have you seen the splitter with group N lines together section at http://camel.apache.org/splitter.html On Thu, Feb 21, 2013 at 10:10 PM, cristisor wrote: > Hello everybody, > > I'm using Apache Fuse ESB with Apache Camel 2.4.0 (I think) to process some > large files. Until now a service unit …
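
The "group N lines" splitter from the page Claus links to, roughly as that page documents it for Camel 2.10+ (file path and queue name are placeholders):

```java
import org.apache.camel.builder.RouteBuilder;

public class GroupedLinesRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("file:/data/inbox?noop=true")
            .split().tokenize("\n", 500).streaming()   // 500 lines per exchange
            .to("activemq:queue:lines");
    }
}
```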

Re: Large file processing with Apache Camel

2013-02-22 Thread Maruan Sahyoun
ok. For the StringBuilder you might be able to avoid that. As you have a ByteArrayOutputStream already you can either get the exchange body as a byte array, an input stream, an output stream … (http://camel.apache.org/type-converter.html) and write/append the bytes directly, copy the byte arrays …
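
One possible shape of that suggestion, assuming the accumulation happens in a processor and the ByteArrayOutputStream travels as an exchange property; the property name and processor are invented for the sketch. The point is that Camel's type converters hand the body over as a byte[] so it can be appended directly, with no String detour:

```java
import java.io.ByteArrayOutputStream;

import org.apache.camel.Exchange;
import org.apache.camel.Processor;

public class AppendBytesProcessor implements Processor {

    @Override
    public void process(Exchange exchange) throws Exception {
        ByteArrayOutputStream buffer =
                exchange.getProperty("resultBuffer", ByteArrayOutputStream.class);
        if (buffer == null) {
            buffer = new ByteArrayOutputStream();
            exchange.setProperty("resultBuffer", buffer);
        }

        // the type converter produces the byte[] view of whatever the body is
        byte[] bytes = exchange.getIn().getBody(byte[].class);
        if (bytes != null) {
            buffer.write(bytes);   // append directly, no StringBuilder involved
        }
    }
}
```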

Re: Large file processing with Apache Camel

2013-02-22 Thread cristisor
I will try to provide the steps that are in the current version: 1. read one line from the file, set it as the outbound message's body of an exchange, and, according to the file type, send the exchange to an activemq queue 2. the exchange will arrive on another service unit that has a processor which …
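
For contrast with the batched version above, a sketch of the consuming side of step 2: the second service unit listens on the queue and runs its processing per exchange. The queue name, the consumer count, and the two beans are placeholders for the XML mapping and DB work described in the thread:

```java
import org.apache.camel.builder.RouteBuilder;

public class TypeAConsumerRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("activemq:queue:typeA.in?concurrentConsumers=5")
            .to("bean:xmlMapper")   // hypothetical bean turning line(s) into xml
            .to("bean:dbWriter");   // hypothetical bean doing the inserts
    }
}
```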

Re: Large file processing with Apache Camel

2013-02-21 Thread Maruan Sahyoun
could you elaborate a little bit on the xml mapper? how do you get from data lines to xml? E.g. instead of generating strings for concatenation you could create StAX or SAX events and stream to your output. Why do you need to append the xml as strings? could you append to a file, write to a stream …
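
A small sketch of the StAX alternative Maruan mentions: XML events are written straight to an output stream, so nothing is concatenated as Strings along the way. The element names and record layout are invented; the real mapper's structure isn't shown in the thread:

```java
import java.io.OutputStream;

import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxRecordWriter {

    private final XMLStreamWriter writer;

    public StaxRecordWriter(OutputStream out) throws XMLStreamException {
        this.writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
    }

    /** Streams one parsed line as a record element; nothing is buffered as a String. */
    public void writeRecord(String[] fields) throws XMLStreamException {
        writer.writeStartElement("record");
        for (String field : fields) {
            writer.writeStartElement("field");
            writer.writeCharacters(field);
            writer.writeEndElement();
        }
        writer.writeEndElement();
    }

    public void close() throws XMLStreamException {
        writer.flush();
        writer.close();
    }
}
```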

Re: Large file processing with Apache Camel

2013-02-21 Thread Maruan Sahyoun
well - let's assume for a moment that reading in the file is not an issue; then you could aggregate to a larger message and send that across. Why you get an OOME when creating the xml can't be seen from the information provided. if you want to keep a number of lines together you can use .split( …
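
One way to read the "aggregate to a larger message" suggestion, sketched with Camel's aggregator EIP: per-line exchanges are merged back into one body of up to 500 lines before crossing the queue. The correlation (everything into one group), batch size, timeout, and endpoints are all assumptions:

```java
import org.apache.camel.Exchange;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.processor.aggregate.AggregationStrategy;

public class BatchLinesRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("direct:lines")
            .aggregate(constant(true), new AggregationStrategy() {
                public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
                    if (oldExchange == null) {
                        return newExchange;   // first line in the batch
                    }
                    String merged = oldExchange.getIn().getBody(String.class)
                            + "\n" + newExchange.getIn().getBody(String.class);
                    oldExchange.getIn().setBody(merged);
                    return oldExchange;
                }
            })
            .completionSize(500)        // send after 500 lines...
            .completionTimeout(5000)    // ...or after 5s of silence for the tail
            .to("activemq:queue:lines.batch");
    }
}
```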

Re: Large file processing with Apache Camel

2013-02-21 Thread Willem jiang
Hi, You can take a look at the VM component[1] or NMR component[2] to leverage multiple threads to deal with the XML processing. [1]http://camel.apache.org/vm.html [2]http://camel.apache.org/nmr.html -- Willem Jiang Red Hat, Inc. FuseSource is now part of Red Hat Web: http://www.fusesource.com
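
A sketch of the VM hand-off Willem suggests: the cheap read/split stays on one route, the expensive XML work runs on a second in-JVM route with several concurrent consumers. Endpoint names, the consumer count, and the mapping bean are placeholders:

```java
import org.apache.camel.builder.RouteBuilder;

public class VmHandoffRoutes extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        // producer side: read and split, then hand the lines off without waiting
        from("file:/data/inbox?noop=true")
            .split(body().tokenize("\n")).streaming()
            .to("vm:xml.work?waitForTaskToComplete=Never");

        // consumer side: five threads doing the heavy XML mapping
        from("vm:xml.work?concurrentConsumers=5")
            .to("bean:xmlMapper");   // hypothetical mapping bean
    }
}
```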

Re: Large file processing with Apache Camel

2013-02-21 Thread Maruan Sahyoun
Hi, if you suspect it's file I/O, what about testing reading and splitting the file only, without further processing? That should tell you whether the time is spent reading and splitting the file or in the XML processing at stage 1 and stage 2. If it's I/O, where is the file located? Is it a local file, a network share …
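
A tiny harness along those lines: read and split the file and do nothing else, so the elapsed time (visible from the log timestamps) shows whether the cost sits in file I/O and splitting or in the later XML and DB stages. The path and log category are made up:

```java
import org.apache.camel.builder.RouteBuilder;

public class ReadAndSplitOnlyRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("file:/data/inbox?noop=true")
            .split(body().tokenize("\n")).streaming()
                .to("log:readonly?level=DEBUG")   // no XML mapping, no DB work
            .end()
            .log("finished splitting ${header.CamelFileName}");
    }
}
```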

Re: Large file processing with Apache Camel

2013-02-21 Thread cristisor
Many thanks for your reply. When I read from the file I read simple lines, not XML. It takes more than one hour to read, process, and insert into the db 20,000 lines, so I took out the service unit that does the db operations and I was left with 26 minutes for reading from the file line by line, con…

Re: Large file processing with Apache Camel

2013-02-21 Thread Willem jiang
I just want to ask some questions about your performance enhancement. First, what made you think that reading multiple lines of XML would improve the performance? I just read the route you showed; you just send the exchange into a queue after reading a line of the file. I don't think reading …