Using Hypersonic, Hadoop or Mongo for such a use case is "over engineering" the requirement and will end up in a much more complicated solution, IMO.
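As a rough sketch of what a simpler approach could look like: stream the file line by line through a buffered reader and writer, so that order is preserved, the whole file never sits in memory, and the output is flushed in large blocks rather than once per line. This is plain Java outside of Camel. The class name, the method and the '*' marker are made up for illustration, and it has not been measured against the route quoted below.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Camel or of the original route.
final class BufferedLineCopy {

    // Copies 'in' to 'out' line by line, appending 'marker' to every line.
    // Input is decoded as Windows-1252 and output encoded as UTF-8, mirroring
    // the charsets used in the route. Returns the number of lines written.
    static long process(Path in, Path out, char marker) throws IOException {
        long lines = 0;
        try (BufferedReader reader = Files.newBufferedReader(in, Charset.forName("windows-1252"));
             BufferedWriter writer = Files.newBufferedWriter(out, Charset.forName("UTF-8"))) {
            String line;
            // readLine() strips the line terminator, so one is written back explicitly.
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.write(marker);
                writer.write('\n');
                lines++;
            }
        } // closing the writer flushes whatever is still buffered
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: BufferedLineCopy <input> <output>");
            return;
        }
        long n = process(Paths.get(args[0]), Paths.get(args[1]), '*');
        System.out.println(n + " lines written");
    }
}
```

Because lines are read and written sequentially by a single thread, the output order matches the input order, which was the reason given for keeping parallel processing off.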
Best,
Christian

On Fri, Nov 9, 2012 at 6:57 PM, <[email protected]> wrote:

> You may also want to check out Hadoop and map reduce
>
> http://camel.apache.org/hdfs.html
>
> with respect to points a and b.
>
> You can have an index on the record and the “reduce” job can serialize on
> the index.
>
> *From:* Gonzalo Vasquez [mailto:[email protected]]
> *Sent:* Friday, November 09, 2012 10:16 PM
> *To:* [email protected]
> *Subject:* Re: Camel performance tuning
>
> Thanks for your answer, my comments:
>
> a) A 5M file could be loaded into memory, but I have streaming enabled as
> file size could be in the range of GB. Notwithstanding, I'll check what
> Hypersonic & Mongo are, as I'm not aware of them.
>
> b) Parallel processing is set to false, because records must preserve
> order on the output file
>
> c) Don't see the point here
>
> d) See a)
>
> e) What about async processing? There's no "long running process" here
>
> Thanks again.
>
> *Gonzalo Vásquez Sáez*
> *Research and Development (R&D) Manager*
> *Altiuz* Soluciones Tecnológicas de Negocios Ltda.
> Av. Nueva Tajamar 555 Of. 802, Las Condes
> (56-2) 335 2461
> *[email protected]*
> *http://www.altiuz.cl*
>
> On 09-11-2012, at 13:12, <[email protected]> wrote:
>
> I am really new to Camel but here are some options you can try
>
> a) Can you load the 5 MB file into memory before splitting it? That
> way IO will not be a problem. Probably put it in something like Hypersonic
> or Mongo
>
> b) Why is parallel processing false? Are the records related to
> each other? If true you can take advantage of multicore
>
> c) Is it possible to first split the files into chunks and then
> process the chunks independently?
>
> d) Can you write into memory and flush at once?
>
> e) Sync/Async: http://camel.apache.org/async.html
>
> *From:* Gonzalo Vasquez [mailto:[email protected]]
> *Sent:* Friday, November 09, 2012 8:32 PM
> *To:* [email protected]
> *Subject:* Camel performance tuning
>
> I'm running a route that basically adds a character per line to a plain
> text file, but it's taking too long, and it seems that it's due to some kind
> of buffering issue when reading/writing from disk.
>
> I'm processing a 5MB file (attached as DC_FACCL132_0000
> MORA_1075_16-10-2012_19-09-47_15.txt.zip), with the corresponding XSL
> template (also attached).
>
> It's taking forever to process such a file. I understand I'm tokenizing
> on line breaks, which could be the source of the problem as there are many
> lines in the file (48198 exactly), but when running jvisualvm (see attached
> images/snapshot) I can see the write operation is invoked 20386 times, which
> seems unrelated to the line count. Is there an output buffer size that I can
> configure? Or something like that?
>
> This is the route:
>
> <camel:route id="pager" autoStartup="true">
>   <camel:from
>     uri="file:///tmp/in?charset=Windows-1252&move=${file:parent}/../paged/${file:name.noext}.paged.ack&preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}" />
>   <camel:split streaming="true" parallelProcessing="false">
>     <camel:tokenize token="\n" />
>     <camel:to uri="bean:pager" />
>     <camel:to
>       uri="file:///tmp/paged?charset=utf8&fileName=${file:name.noext}.paged&fileExist=Append" />
>   </camel:split>
> </camel:route>
>
> This is the referenced bean:
>
> <bean id="pager" class="cl.altiuz.reports.etl.TextProcessor">
>   <property name="xsltPath"
>     value="/Users/gonzalovasquez/Documents/workspace/altiuz-reports/reports-etl/xsl/pager.xsl" />
>   <property name="param" value="C.*PAG.* 1" />
> </bean>
>
> Camel version is 2.10.1, and this happens on both OS X and MS Windows, so I
> think it isn't a platform-dependent problem, but a configuration one.
>
> Any ideas? Anything else that I should send?
>
> Thanks!
>
> *Gonzalo Vásquez Sáez*
> *Research and Development (R&D) Manager*
> *Altiuz* Soluciones Tecnológicas de Negocios Ltda.
> Av. Nueva Tajamar 555 Of. 802, Las Condes
> (56-2) 335 2461
> *[email protected]*
> *http://www.altiuz.cl*
>
> This e-mail and any files transmitted with it are for the sole use
> of the intended recipient(s) and may contain confidential and privileged
> information. If you are not the intended recipient(s), please reply to the
> sender and destroy all copies of the original message. Any unauthorized
> review, use, disclosure, dissemination, forwarding, printing or copying of
> this email, and/or any action taken in reliance on the contents of this
> e-mail is strictly prohibited and may be unlawful.
