You may also want to look at Hadoop and MapReduce with respect to
points a) and b):
http://camel.apache.org/hdfs.html
You can attach an index to each record, and the “reduce” job can then
serialize the output in index order.
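The index idea can also be tried without Hadoop. What follows is only a
rough sketch in plain Java, assuming each line can be transformed
independently of the others. The class name IndexedReorder, the
transform method and the appended "X" are all hypothetical placeholders
for whatever the real bean does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class IndexedReorder {

    // Hypothetical per-line work: append one character to the line.
    static String transform(String line) {
        return line + "X";
    }

    // "Map": process lines in parallel, each task knowing its record index.
    // "Reduce": every result is stored at its own index, so the output
    // order matches the input order no matter which task finishes first.
    static List<String> processInOrder(List<String> lines, int threads)
            throws InterruptedException, ExecutionException {
        String[] results = new String[lines.size()];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < lines.size(); i++) {
                final int index = i;
                pending.add(pool.submit(() -> {
                    results[index] = transform(lines.get(index));
                }));
            }
            // Wait for every task; get() also makes the array writes visible.
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return Arrays.asList(results);
    }

    public static void main(String[] args) throws Exception {
        List<String> out = processInOrder(Arrays.asList("first", "second", "third"), 4);
        for (String line : out) {
            System.out.println(line);
        }
    }
}
```

This holds all results in memory, so for files in the GB range the same
idea would have to be applied per chunk rather than to the whole file.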
*From:* Gonzalo Vasquez [mailto:[email protected]]
*Sent:* Friday, November 09, 2012 10:16 PM
*To:* [email protected]
*Subject:* Re: Camel performance tuning
Thanks for your answer. My comments:
a) A 5 MB file could be loaded into memory, but I have streaming enabled
because file sizes could be in the GB range. Nevertheless, I'll check what
Hypersonic and Mongo are, as I'm not familiar with them.
b) Parallel processing is set to false because records must preserve
their order in the output file.
c) I don't see the point here.
d) See a).
e) What about async processing? There's no "long running process" here.
Thanks again.
*Gonzalo Vásquez Sáez*
*Gerente Investigación y Desarrollo (R&D)*
*Altiuz* Soluciones Tecnológicas de Negocios Ltda.
Av. Nueva Tajamar 555 Of. 802, Las Condes
(56-2) 335 2461
*[email protected] <[email protected]>*
*http://www.altiuz.cl*
On 09-11-2012, at 13:12, <[email protected]> wrote:
I am really new to Camel, but here are some options you can try:
a) Can you load the 5 MB file into memory before splitting it? That
way IO will not be a problem. You could probably put it in something like
Hypersonic or Mongo.
b) Why is parallel processing set to false? Are the records related to
each other? If they are not, you can take advantage of multiple cores.
c) Is it possible to first split the file into chunks and then
process the chunks independently?
d) Can you write into memory and flush everything at once?
e) Sync/Async: http://camel.apache.org/async.html
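Options c) and d) can be combined. Here is a minimal sketch in plain
Java, not Camel code: consecutive lines are grouped into chunks, and each
chunk is built in memory so that it can be handed to the writer in one
call. LineChunker, the chunk size and the appended 'X' are hypothetical
names and values chosen for illustration. I believe the Camel tokenizer
also has a "group" option for grouping several tokens per exchange, but
please check the Splitter documentation for your version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LineChunker {

    // Groups consecutive lines into chunks of at most chunkSize lines,
    // keeping the original order inside and across chunks.
    static List<List<String>> chunk(List<String> lines, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, lines.size());
            chunks.add(new ArrayList<>(lines.subList(start, end)));
        }
        return chunks;
    }

    // Processes one chunk in memory and returns it as a single string,
    // so the caller needs only one write per chunk instead of one per line.
    static String processChunk(List<String> chunk) {
        StringBuilder sb = new StringBuilder();
        for (String line : chunk) {
            sb.append(line).append('X').append('\n'); // 'X' is a placeholder
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e");
        for (List<String> c : chunk(lines, 2)) {
            System.out.print(processChunk(c));
        }
    }
}
```

Because chunks are produced and consumed in input order, this does not
conflict with the requirement that records keep their order in the
output file.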
*From:* Gonzalo Vasquez [mailto:[email protected]]
*Sent:* Friday, November 09, 2012 8:32 PM
*To:* [email protected]
*Subject:* Camel performance tuning
I'm running a route that basically adds a character per line to a plain
text file, but it's taking too long, and it seems to be due to some kind
of buffering issue when reading from or writing to disk.
I'm processing a 5 MB file (attached as DC_FACCL132_0000
MORA_1075_16-10-2012_19-09-47_15.txt.zip), with the corresponding XSL
template (also attached).
It's taking forever to process such a file. I understand I'm tokenizing
on line breaks, which could be the source of the problem as there are many
lines in the file (48198 exactly), but when running jvisualvm (see attached
images/snapshot) I can see the write operation is invoked 20386 times,
which does not seem related to the line count. Is there an output buffer
size that I can configure? Or something like that?
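To illustrate why the number of write calls depends on buffering and
flushing rather than on the line count alone, here is a small plain-Java
sketch. It is not Camel code and makes no claim about what the file
component does internally: WriteCounter, CountingOutputStream and the
64 KB buffer size are hypothetical, and "flush after every line" is only
an assumed stand-in for a per-line append.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class WriteCounter {

    // Wraps a stream and counts how many write calls actually reach it.
    static class CountingOutputStream extends OutputStream {
        private final OutputStream delegate;
        int writes = 0;

        CountingOutputStream(OutputStream delegate) {
            this.delegate = delegate;
        }

        @Override
        public void write(int b) throws IOException {
            writes++;
            delegate.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            writes++;
            delegate.write(b, off, len);
        }
    }

    // Writes lineCount short lines through a 64 KB BufferedWriter and
    // returns the number of write calls that reached the underlying sink.
    static int countWrites(int lineCount, boolean flushEveryLine) throws IOException {
        CountingOutputStream sink = new CountingOutputStream(new ByteArrayOutputStream());
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(sink, StandardCharsets.UTF_8), 64 * 1024)) {
            for (int i = 0; i < lineCount; i++) {
                out.write("line " + i + "X\n");
                if (flushEveryLine) {
                    out.flush(); // forces the buffered data down to the sink
                }
            }
        }
        return sink.writes;
    }

    public static void main(String[] args) throws IOException {
        int lines = 48198; // line count quoted in the message
        System.out.println("flush per line: " + countWrites(lines, true));
        System.out.println("buffered only:  " + countWrites(lines, false));
    }
}
```

Running it shows how many fewer calls reach the sink when lines are only
buffered, which is the effect a larger output buffer would be expected
to have.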
This is the route:
<camel:route id="pager" autoStartup="true">
  <camel:from uri="file:///tmp/in?charset=Windows-1252&amp;move=${file:parent}/../paged/${file:name.noext}.paged.ack&amp;preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}" />
  <camel:split streaming="true" parallelProcessing="false">
    <camel:tokenize token="\n" />
    <camel:to uri="bean:pager" />
    <camel:to uri="file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
  </camel:split>
</camel:route>
This is the referenced bean:
<bean id="pager" class="cl.altiuz.reports.etl.TextProcessor">
  <property name="xsltPath" value="/Users/gonzalovasquez/Documents/workspace/altiuz-reports/reports-etl/xsl/pager.xsl" />
  <property name="param" value="C.*PAG.* 1" />
</bean>
Camel version is 2.10.1, and this happens on both OS X and MS Windows, so
I think it isn't a platform-dependent problem but a configuration one.
Any ideas? Anything else that I should send?
Thanks!
This e-mail and any files transmitted with it are for the sole use
of the intended recipient(s) and may contain confidential and privileged
information. If you are not the intended recipient(s), please reply to the
sender and destroy all copies of the original message. Any unauthorized
review, use, disclosure, dissemination, forwarding, printing or copying of
this email, and/or any action taken in reliance on the contents of this
e-mail is strictly prohibited and may be unlawful.