Ok, I've included an aggregator in the splitter, as follows:
<camel:route id="pager" autoStartup="true">
  <camel:from
    uri="file:///tmp/in?charset=Windows-1252&amp;move=${file:parent}/../paged/${file:name.noext}.paged.ack&amp;preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}" />
  <camel:log message="Starting paging" />
  <camel:setHeader headerName="start">
    <camel:simple>${date:now:mm}:${date:now:ss}.${date:now:SSS}</camel:simple>
  </camel:setHeader>
  <camel:split streaming="true" parallelProcessing="false">
    <camel:tokenize token="\n" />
    <!-- <camel:log message="${property.CamelSplitIndex}" /> -->
    <camel:to uri="bean:pager" />
    <camel:aggregate strategyRef="aggregatorStrategy">
      <camel:correlationExpression>
        <camel:simple>${file:name}</camel:simple>
      </camel:correlationExpression>
      <camel:completionSize>
        <camel:constant>250</camel:constant>
      </camel:completionSize>
      <camel:to
        uri="file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
    </camel:aggregate>
  </camel:split>
  <camel:log
    message="Elapsed: ${header.start} - ${date:now:mm}:${date:now:ss}.${date:now:SSS}" />
</camel:route>
And the AggregationStrategy:
<bean id="aggregatorStrategy"
class="cl.altiuz.reports.etl.ConcatAggregationStrategy" />
I've also added some headers and logging to measure elapsed time.
Before adding the aggregator, the elapsed time was about 30 seconds (for the
5MB test file); now it is about half that (15 seconds). The improvement is
clear, but smaller than I expected.
Any extra tips? I've included the custom AggregationStrategy I had to create,
since all I needed was to append/concatenate the body contents.
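
For readers without the attachment, the effect of batching 250 lines per write can be sketched in plain Java, independent of Camel. This is only an illustration under stated assumptions: LineBatcher and countWrites are hypothetical names, not classes from the attached code or from Camel, and the "writes" are collected in a list instead of being appended to a file.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the batching step; this is not Camel code. */
final class LineBatcher {
    private final int batchSize;
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> writes = new ArrayList<>();
    private int pending = 0;

    LineBatcher(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
    }

    /** Concatenates one line into the buffer; emits a write once the batch is full. */
    void add(String line) {
        buffer.append(line).append('\n');
        pending++;
        if (pending == batchSize) {
            flush();
        }
    }

    /** Emits whatever is buffered, including a final partial batch. */
    void flush() {
        if (pending == 0) {
            return;
        }
        writes.add(buffer.toString()); // one simulated file append
        buffer.setLength(0);
        pending = 0;
    }

    List<String> writes() {
        return writes;
    }

    /** Counts how many appends are needed for lineCount lines. */
    static int countWrites(int lineCount, int batchSize) {
        LineBatcher batcher = new LineBatcher(batchSize);
        for (int i = 0; i < lineCount; i++) {
            batcher.add("line " + i);
        }
        batcher.flush(); // do not lose the last partial batch
        return batcher.writes().size();
    }

    public static void main(String[] args) {
        System.out.println(countWrites(48198, 250)); // prints 193
        System.out.println(countWrites(48198, 1));   // prints 48198
    }
}
```

For the 48198-line test file this reduces the number of appends from 48198 to 193. Note the explicit final flush(): 48198 is not a multiple of 250, so the last batch holds only 198 lines. In the route, a size-only completion condition would presumably leave that partial batch unwritten unless a second completion condition (a timeout, for example) is also configured.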
Gonzalo Vásquez Sáez
Gerente Investigación y Desarrollo (R&D)
Altiuz Soluciones Tecnológicas de Negocios Ltda.
Av. Nueva Tajamar 555 Of. 802, Las Condes
(56-2) 335 2461
[email protected]
http://www.altiuz.cl
On 09-11-2012, at 15:09, Christian Müller <[email protected]>
wrote:
> Using Hypersonic, Hadoop or Mongo for such a use case is over-engineering
> the requirement and will end up in a much more complicated solution, IMO.
>
> Best,
> Christian
>
> On Fri, Nov 9, 2012 at 6:57 PM, <[email protected]> wrote:
>
>> You may also want to check out Hadoop and MapReduce
>> (http://camel.apache.org/hdfs.html) with respect to points a and b.
>>
>> You can have an index on the record, and the "reduce" job can serialize on
>> the index.
>>
>> *From:* Gonzalo Vasquez [mailto:[email protected]]
>> *Sent:* Friday, November 09, 2012 10:16 PM
>> *To:* [email protected]
>> *Subject:* Re: Camel performance tuning
>>
>> Thanks for your answer, my comments:
>>
>> a) A 5MB file could be loaded into memory, but I have streaming enabled
>> because file sizes could be in the GB range. Nevertheless, I'll look into
>> what Hypersonic and Mongo are, as I'm not familiar with them.
>> b) Parallel processing is set to false because records must preserve
>> their order in the output file.
>> c) I don't see the point here.
>> d) See a).
>> e) What about async processing? There's no "long running process" here.
>>
>> Thanks again.
>>
>> On 09-11-2012, at 13:12, <[email protected]> wrote:
>>
>> I am really new to Camel, but here are some options you can try:
>>
>> a) Can you load the 5 MB file into memory before splitting it? That
>> way I/O will not be a problem. You could probably put it in something like
>> Hypersonic or Mongo.
>> b) Why is parallel processing false? Are the records related to
>> each other? With it set to true you can take advantage of multiple cores.
>> c) Is it possible to first split the file into chunks and then
>> process the chunks independently?
>> d) Can you write into memory and flush at once?
>> e) Sync/async: http://camel.apache.org/async.html
>>
>> *From:* Gonzalo Vasquez [mailto:[email protected]]
>> *Sent:* Friday, November 09, 2012 8:32 PM
>> *To:* [email protected]
>> *Subject:* Camel performance tuning
>>
>> I'm running a route that basically adds a character per line to a plain
>> text file, but it's taking too long, and it seems to be due to some kind
>> of buffering issue when reading from or writing to disk.
>>
>> I'm processing a 5MB file (attached as DC_FACCL132_0000
>> MORA_1075_16-10-2012_19-09-47_15.txt.zip), with the corresponding XSL
>> template (also attached).
>>
>> It's taking forever to process such a file. I understand I'm tokenizing
>> on line breaks, which could be the source of the problem as there are many
>> lines in the file (48198 exactly), but when running jvisualvm (see attached
>> images/snapshot) I can see the write operation is invoked 20386 times, which
>> does not seem related to the line count. Is there an output buffer size that
>> I can configure? Or something like that?
>>
>> This is the route:
>>
>> <camel:route id="pager" autoStartup="true">
>>   <camel:from
>>     uri="file:///tmp/in?charset=Windows-1252&amp;move=${file:parent}/../paged/${file:name.noext}.paged.ack&amp;preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}" />
>>   <camel:split streaming="true" parallelProcessing="false">
>>     <camel:tokenize token="\n" />
>>     <camel:to uri="bean:pager" />
>>     <camel:to
>>       uri="file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
>>   </camel:split>
>> </camel:route>
>>
>> This is the referenced bean:
>>
>> <bean id="pager" class="cl.altiuz.reports.etl.TextProcessor">
>>   <property name="xsltPath"
>>     value="/Users/gonzalovasquez/Documents/workspace/altiuz-reports/reports-etl/xsl/pager.xsl" />
>>   <property name="param" value="C.*PAG.* 1" />
>> </bean>
>>
>> The Camel version is 2.10.1, and this happens on both OS X and MS Windows,
>> so I think it isn't a platform-dependent problem but a configuration one.
>>
>> Any ideas? Is there anything else that I should send?
>>
>> Thanks!
>>
>> This e-mail and any files transmitted with it are for the sole use
>> of the intended recipient(s) and may contain confidential and privileged
>> information. If you are not the intended recipient(s), please reply to the
>> sender and destroy all copies of the original message. Any unauthorized
>> review, use, disclosure, dissemination, forwarding, printing or copying of
>> this email, and/or any action taken in reliance on the contents of this
>> e-mail is strictly prohibited and may be unlawful.
>>
>>